
Chapter 5

SQL — The Relational Language

5 SQL — The Relational Language
5.1 Introduction
5.2 Tabular Variables in SQL
5.2.1 Creation of Tables
5.3 Referential Integrity in SQL
5.4 Basic Data Types
5.4.1 String Domains
5.4.2 Numeric Domains
5.4.3 Special Domains
5.4.4 Basic Domains Supported by ORACLE
5.5 SELECT Phrases
5.6 The WHERE Option
5.7 Union, Intersection, and Difference in SQL
5.8 Table Product in SQL
5.9 Join in SQL
5.10 Sets and subqueries
5.11 Parametrized subqueries
5.12 Subqueries and division
5.13 Relational Completeness of SQL
5.14 Scalar Functions of SQL
5.14.1 Numerical Functions
5.14.2 String Functions
5.14.3 Date functions
5.15 Aggregate Functions in SQL
5.16 Sorting Results
5.17 The Group-by Option
5.17.1 The decode and case Functions


5.17.2 The rollup and cube Extensions of group by
5.18 Analytical Capabilities of SQL Plus
5.18.1 Ranking Functions
5.18.2 Top-n Queries
5.18.3 Windowing functions in SQL Plus
5.19 Statistics in SQL
5.19.1 Variance and Correlation
5.19.2 Linear Regression
5.20 Graphs and SQL in SQL Plus
5.21 Updates
5.22 Access Rights
5.23 Views in SQL
5.24 Accessing metadata in SQL Plus
5.25 Exercises
5.26 Bibliographical Comments

5.1 Introduction

SQL is an acronym for Structured Query Language and is the name of the most important tool for defining and manipulating relational databases. The development of SQL began in the mid-1970s at the IBM San Jose Research Laboratory. The success of an experimental IBM database system (known as System R) that incorporated SQL compelled a number of software manufacturers to join IBM in developing relational database systems that incorporated SQL. In 1982, the American National Standards Institute (ANSI) initiated the development of a standard for a query language for relational database systems; it opted for SQL as its prototype. The resulting ANSI standard, issued in 1986, was adopted as an International Standard by the International Organization for Standardization (ISO) in 1987.

In the late 1980s, embedded SQL was standardized by ANSI, and work on expanding SQL continues. A much extended version of the original standard, known as SQL92, was adopted by ISO/IEC at the end of 1992. To reflect current trends in the database field towards object-relational technology, a new standard, ISO/IEC 9075-1, known as SQL99, was published in July 1999. As we shall see, SQL99 is a superset of SQL92. New features incorporated by this standard include object-relational extensions (user-defined data types, reference types, collections, large object support, table hierarchies), active database features (triggers), stored procedures and functions, on-line analytic processing extensions, etc. More recently, in 2003, a new standard was issued. This new edition of the standard includes a new part that deals with the interaction between SQL and XML (which we discuss in Chapter 10), corrections to SQL99, and several new features.

Our presentation concentrates initially on common SQL features, applicableto a wide range of SQL implementations.


SQL is a nonprocedural language. This means that a query formulated in SQL need not specify how a problem is to be solved or how the data should be accessed by the computing system; instead, an SQL query states what the query is, i.e., what data are sought.

This leaves the user free to focus on the logic of the query. Because the DBMS makes use of its internal knowledge of the database, in most cases it generates retrieval procedures that are faster than equivalent retrieval procedures built directly by the user.

The SQL language consists of three components: the data definition language (DDL), the data manipulation language (DML), and the data control language (DCL). The first component allows the user to define the structure of the tables of the database. The second contains retrieval and update directives. The last component allows the database administrator to define the access rights to the database for various categories of users.

SQL syntax is format-free: tabs, carriage returns, and spaces can be included anywhere a space occurs in the definition of an SQL construct. Also, case is insignificant in table names, attribute names, reserved words, and keywords. However, case is significant in character string literals.
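These rules can be checked with a small sketch that uses Python's standard sqlite3 module and an in-memory SQLite database as a stand-in for ORACLE (for unquoted names, keywords, and string literals, SQLite follows the same rules). The two-column STUDENTS table is a reduced, hypothetical version of the college table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS (stno varchar(10), city varchar(20))")
conn.execute("insert into STUDENTS values ('1011', 'Boston')")

# Keywords, table name, and column names in mixed case, spread over several lines.
mixed_case = conn.execute("""SELECT   STNO
                               FROM students
                              Where City = 'Boston'""").fetchall()

# The literal 'BOSTON' is a different string from 'Boston', so no row matches.
wrong_case = conn.execute(
    "select stno from STUDENTS where city = 'BOSTON'").fetchall()

print(mixed_case)  # [('1011',)]
print(wrong_case)  # []
```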

5.2 Tabular Variables in SQL

When we introduced tables in Chapter 3, we assumed that the content of a table is a relation, that is, a set of tuples. To conform to the reality of databases, we need to define the content of a table as a sequence of tuples. Thus, a table may contain several copies of the same tuple. If a table is allowed to contain duplicates, then even if we know all components of a tuple, we may be unable to identify the corresponding row in the table uniquely. As a consequence, not every table has a key.
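A table without a declared key really can hold two identical rows. The sketch below, again using Python's sqlite3 module as a stand-in DBMS, inserts the same tuple twice into a hypothetical table T; no constraint prevents it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key or unique constraint is declared for T.
conn.execute("create table T (a integer, b varchar(10))")
conn.execute("insert into T values (1, 'x')")
conn.execute("insert into T values (1, 'x')")  # an identical second row

copies = conn.execute(
    "select count(*) from T where a = 1 and b = 'x'").fetchone()[0]
print(copies)  # 2: the two rows cannot be told apart by their values
```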

In this section we present a topic that we refer to informally as “table creation”. In reality, we create an object similar to a variable in a programming language, which we call a tabular variable. The values of a tabular variable are tables, and these values change in time. Tabular variables are created using the create table construction.

Example 5.2.1 To create a tabular variable called PATRONS having the heading

name addr city zip telno date_of_birth

we write:

create table PATRONS (name varchar(35) not null,

addr varchar(50),

city varchar(25),

zip char(9),

telno char(12),

date_of_birth date);

As we shall see, each attribute is followed by a description of its domain. The effect of this command is to create a tabular variable whose initial value is a table whose content is the empty set of tuples:

PATRONS
name          addr            city        zip    telno         date_of_birth

After inserting a first row, the next value of the tabular variable PATRONS is the table:

PATRONS
name          addr            city        zip    telno         date_of_birth
Ann Richards  56 Green Ln     Natick      02170  508-561-0987  02/15/78

A second insertion yields a new table as the value of the tabular variable:

PATRONS
name          addr            city        zip    telno         date_of_birth
Ann Richards  56 Green Ln     Natick      02170  508-561-0987  02/15/78
Ron Scott     50 Cider Hill   Framingham  01160  608-663-0211  11/04/80

If the first patron moves to a new address, the first row is modified and the tabular variable assumes a third value:

PATRONS
name          addr            city        zip    telno         date_of_birth
Ann Richards  77 Lake St.     Milton      02186  617-364-0606  02/15/78
Ron Scott     50 Cider Hill   Framingham  01160  608-663-0211  11/04/80

The values that the tabular variable PATRONS may assume are tables that have the name and the heading specified at the creation of the tabular variable. In addition, we can specify several types of constraints that any value of the tabular variable must satisfy.
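The succession of values taken by PATRONS can be replayed in a short sketch. It assumes Python's sqlite3 module with an in-memory SQLite database rather than ORACLE; because SQLite has no native date type, the birth dates are written as ISO strings, and the update statement (discussed later in this chapter) models the change of address. Only three columns are shown in each value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table PATRONS(name varchar(35) not null,
                                     addr varchar(50),
                                     city varchar(25),
                                     zip char(9),
                                     telno char(12),
                                     date_of_birth date)""")

def current_value():
    """Current value of the tabular variable, restricted to three columns."""
    return conn.execute("select name, city, zip from PATRONS order by name").fetchall()

values = [current_value()]  # initial value: the empty table
conn.execute("insert into PATRONS values (?, ?, ?, ?, ?, ?)",
             ("Ann Richards", "56 Green Ln", "Natick", "02170", "508-561-0987", "1978-02-15"))
values.append(current_value())
conn.execute("insert into PATRONS values (?, ?, ?, ?, ?, ?)",
             ("Ron Scott", "50 Cider Hill", "Framingham", "01160", "608-663-0211", "1980-11-04"))
values.append(current_value())
conn.execute("""update PATRONS
                set addr = '77 Lake St.', city = 'Milton', zip = '02186', telno = '617-364-0606'
                where name = 'Ann Richards'""")
values.append(current_value())

for v in values:
    print(v)
```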

Before it is possible to create tabular variables and form queries, it is necessary to create an empty database in which to work. In practice, this is generally done at the level of the operating system, usually with a command that is provided by the vendor of the DBMS.

To start, we assume that we have created an empty database. In this section we begin to discuss a part of the data definition component of SQL, namely, the creation of tabular variables or, informally, the creation of database tables.

5.2.1 Table Creation

We refer to the components of the Data Definition Language (DDL) as directives. The SQL directive for adding tables to a database is create table.

At a minimum, as we saw in Example 5.2.1, creating a tabular variable in SQL requires that we specify its name and its attributes along with their domains. The syntax for this is:

create table table name

[(attr def {,attr def })],

where the attribute definition attr def has the syntax:

attribute name domain


In a slightly more general form (which still ignores certain details related to the physical design of databases), the create table directive has the following syntax:

create table [schema.]table name

[(〈attr def | table constraint | table ref clause 〉{,〈attr def | table constraint | table ref clause 〉})],

where the attribute definition attr def has the syntax:

attribute name domain [default expr] [column ref clause]{column constraint}

As a result of the execution of this directive, an initial amount of space is reserved in secondary memory to accommodate future values of the tabular variable, and the metadata are modified to reflect the addition of the new tabular variable. Specialized SQL constructions discussed later (insert, delete, and update) can be used to modify the value of this variable.
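The change to the metadata can be observed directly. In the sketch below, which uses Python's sqlite3 module as a stand-in DBMS, SQLite's catalog table sqlite_master plays the role that the data dictionary plays in ORACLE (for instance, the USER_TABLES view; access to metadata in SQL Plus is discussed in Section 5.24).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names():
    """Names of the tables currently recorded in SQLite's catalog."""
    return [row[0] for row in conn.execute(
        "select name from sqlite_master where type = 'table' order by name")]

before = table_names()
conn.execute("""create table COURSES(cno varchar2(5) not null,
                                     cname varchar2(30),
                                     cr smallint, primary key(cno))""")
after = table_names()
print(before, after)  # [] ['COURSES']
```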

Creation of tabular variables permits placing restrictions, called constraints, on the contents of any value that the tabular variable may assume. The constraints that follow have a global character (which means that they apply to the contents of a table in its entirety) and apply to any value that the tabular variable may assume.

Definition 5.2.2 A primary key constraint has the form

[constraint constraint name] primary key(list of attributes)

when the primary key consists of the attributes of the list.

Alternate keys of tables can be specified using unique constraints. The syntax of this type of constraint is:

[constraint constraint name] unique(list of attributes)

This indicates that no two rows of a table that is a value of the tabular variable may have the same values for the attributes specified in the list.
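The sketch below shows a DBMS enforcing both a primary key and an alternate key. It uses Python's sqlite3 module; the table STAFF and its rows are hypothetical, with empno as the primary key and telno as an alternate key declared through a unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: empno is the primary key, telno an alternate key.
conn.execute("""create table STAFF(empno varchar(11) not null,
                                   name varchar(35),
                                   telno varchar(4),
                                   constraint pk_staff primary key(empno),
                                   constraint uq_telno unique(telno))""")
conn.execute("insert into STAFF values ('019', 'Adams Ann', '7122')")

def try_insert(row):
    """Attempt an insertion and report whether the DBMS accepted it."""
    try:
        conn.execute("insert into STAFF values (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:  # raised when a key constraint is violated
        return "rejected"

results = [try_insert(("019", "Brown Ted", "9001")),   # repeats the primary key
           try_insert(("023", "Brown Ted", "7122")),   # repeats the alternate key
           try_insert(("023", "Brown Ted", "9001"))]   # violates neither
print(results)  # ['rejected', 'rejected', 'accepted']
```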

A constraint requiring that every tuple satisfy a condition C, where C is a Boolean combination of conditions involving only components of tuples and constants, is written as:

[constraint constraint name] check(C)

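A constraint that involves more than one attribute must be specified as a table constraint, that is, as a separate element of the list in the create table directive; a constraint that involves a single attribute may also be attached to the definition of that attribute, in which case it is a column constraint. Referential integrity can be imposed by using the column constraint references in the definition of an attribute. To prevent certain components of tuples from assuming a null value, we can impose the column constraint not null.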

Example 5.2.3 To create the tabular variable INSTRUCTORS of the college database we use the following create table directive:

create table INSTRUCTORS(empno varchar(11) not null,

name varchar(35),

rank varchar(25),

roomno integer,

telno varchar(4), primary key(empno));


The domain of empno is defined to be the set of strings of length at most 11. In addition, we have the column constraint not null, which means that null cannot be used as a value of the attribute empno. The domains of the other attributes have similar, obvious definitions that are discussed below. Note that in the definition of INSTRUCTORS we impose a table constraint, namely primary key(empno).

Similarly, the tabular variables STUDENTS and COURSES are created by:

create table STUDENTS(stno varchar2(10) not null,

name varchar2(35) not null,

addr varchar2(35),

city varchar2(20),

state varchar2(2),

zip varchar2(10), primary key(stno));

create table COURSES(cno varchar2(5) not null,

cname varchar2(30),

cr smallint, primary key(cno));

A script that creates all tabular variables of the college database is contained in Appendix A.

Example 5.2.4 To express that the primary key of the table GRADES consists of the attributes stno cno sem year, we impose on this table the primary key constraint:

constraint pkg primary key (stno, cno, sem, year)

Example 5.2.5 For the table EMPHIST, introduced in Example 3.3.5, we could introduce the tuple conditions:

constraint pos_sal check(salary > 0)

and

constraint suf_sal check(position != 'Programmer' or salary > 65000).

They express, respectively, that the salary must be a positive number and that somebody who is a programmer must be paid more than 65000 dollars.

Thus, the creation of the table EMPHIST can be achieved by:

create table EMPHIST(empno integer not null references PERSINFO(empno),

position varchar2(30),

dept varchar2(20),

appt_date date,

term_date date,

salary float,

check(position != 'Programmer' or salary > 65000),

constraint pos_sal check(salary > 0));

A script that creates the tables PERSINFO, EMPHIST, and REPORTING is contained in Appendix C.
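Check constraints can be tried out on a few sample rows. This sketch uses Python's sqlite3 module as a stand-in for ORACLE (SQLite accepts the type name varchar2 and enforces check constraints); the references clause is dropped so that the example does not need PERSINFO, and the positions and salaries are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EMPHIST as above, minus the references clause (PERSINFO is not created here).
conn.execute("""create table EMPHIST(empno integer not null,
                                     position varchar2(30),
                                     dept varchar2(20),
                                     appt_date date,
                                     term_date date,
                                     salary float,
                                     check(position != 'Programmer' or salary > 65000),
                                     constraint pos_sal check(salary > 0))""")

def accepted(position, salary):
    """Try to insert a row; return False if a check constraint rejects it."""
    try:
        conn.execute("insert into EMPHIST(empno, position, salary) values (?, ?, ?)",
                     (1, position, salary))
        return True
    except sqlite3.IntegrityError:  # raised by SQLite when a check constraint fails
        return False

results = [accepted("Programmer", 70000),  # satisfies both conditions
           accepted("Programmer", 60000),  # violates the programmer condition
           accepted("Analyst", 60000),     # the programmer condition does not apply
           accepted("Analyst", -10)]       # violates pos_sal
print(results)  # [True, False, True, False]
```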


Example 5.2.6 In the directives that follow, we state that stno is both a foreign key for ADVISING and its primary key. In addition, empno is a foreign key for this table (being the primary key of the table INSTRUCTORS).

create table ADVISING(stno varchar2(10) not null

references STUDENTS(stno),

empno varchar2(11)

references INSTRUCTORS(empno),

primary key(stno));

create table GRADES(stno varchar2(10)

not null references STUDENTS(stno),

empno varchar2(11)

not null references INSTRUCTORS(empno),

cno varchar2(5)

not null references COURSES(cno),

sem varchar2(6) not null,

year smallint not null,

grade integer,

primary key(stno,cno,sem,year),

check (grade <= 100));

The definition of the tabular variable GRADES specifies referential integrity constraints for each of the attributes stno, empno, and cno. In addition, it designates the set of attributes stno, cno, sem, year as the primary key of GRADES and imposes the constraint grade <= 100.

To remove the tabular variable T we use the construct

drop table T;

Rows can be inserted in a table individually, as we show below, or as they are produced by a select phrase (as we shall see later). To insert a row in a table T whose heading is A1 · · · An, we write in SQL a directive of the form:

insert into T (A1, . . . , An) values (a1, . . . , an);

For example, to insert the row ('1011', 'Edwards P. David', '10 Red Rd.', 'Newton', 'MA', '02159')

into the table STUDENTS we write:

insert into STUDENTS(stno,name,addr,city,state,zip)

values ('1011','Edwards P. David','10 Red Rd.','Newton','MA','02159');

It is possible to insert tuples into the database from text files by using a special utility of ORACLE known as SQL*Loader. Details are provided in Appendix D.

To delete the rows that satisfy a certain condition, we use the delete construct. For example, to remove the row of the table STUDENTS that corresponds to the student having student number '1011' we write:

delete from STUDENTS

where stno = '1011';
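The insertion and deletion above can be run as follows. The sketch uses Python's sqlite3 module rather than SQL Plus; the ? placeholders, a feature of that module, pass the values separately so that they need no manual quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table STUDENTS(stno varchar2(10) not null,
                                      name varchar2(35) not null,
                                      addr varchar2(35),
                                      city varchar2(20),
                                      state varchar2(2),
                                      zip varchar2(10), primary key(stno))""")

conn.execute("""insert into STUDENTS(stno, name, addr, city, state, zip)
                values (?, ?, ?, ?, ?, ?)""",
             ("1011", "Edwards P. David", "10 Red Rd.", "Newton", "MA", "02159"))
rows_after_insert = conn.execute("select count(*) from STUDENTS").fetchone()[0]

cur = conn.execute("delete from STUDENTS where stno = '1011'")
rows_deleted = cur.rowcount  # number of rows that satisfied the condition
rows_after_delete = conn.execute("select count(*) from STUDENTS").fetchone()[0]
print(rows_after_insert, rows_deleted, rows_after_delete)  # 1 1 0
```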


If you wish to examine the headings of the tables you created, you can issue, for example, the SQL Plus directive

describe INSTRUCTORS;

SQL Plus then prints:

Name Null? Type

-------------------------- -------- ------------

EMPNO NOT NULL VARCHAR2(11)

NAME VARCHAR2(35)

RANK VARCHAR2(25)

ROOMNO NUMBER(38)

TELNO VARCHAR2(4)
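describe is a SQL Plus command rather than part of SQL. As an analogue, SQLite reports column metadata through PRAGMA table_info; the sketch below, which assumes Python's sqlite3 module, prints a comparable summary for INSTRUCTORS. Unlike ORACLE, SQLite keeps the declared type names instead of translating integer into NUMBER(38).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table INSTRUCTORS(empno varchar(11) not null,
                                         name varchar(35),
                                         rank varchar(25),
                                         roomno integer,
                                         telno varchar(4),
                                         primary key(empno))""")

# Each row returned by table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [(name, decl_type.lower(), bool(notnull))
           for _, name, decl_type, notnull, _, _ in
           conn.execute("pragma table_info(INSTRUCTORS)")]

print(f"{'Name':<8} {'Null?':<9} Type")
for name, decl_type, notnull in columns:
    print(f"{name.upper():<8} {'NOT NULL' if notnull else '':<9} {decl_type.upper()}")
```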

The directive alter table is used for modifying the structure of an existing table. Columns may be added or dropped, the names of the columns or their data types can be modified, etc. A simplified syntax of this directive is:

alter table table_name modification_specification

In turn, the modification specification depends on the particular change we need to impose on the table. Examples of such modification specifications include

add column_name column_type,
drop column column_name,
modify column_name column_type,
rename column column_name to new_column_name,

as well as many other choices.

Example 5.2.7 To add a new year column to the table ADVISING we use the directive:

alter table advising add year varchar2(4);

Initially, the entries of the new column year are null. Column types can be modified using the modify option. For instance, to increase the maximum length of the values of stno to 12 characters we write:

alter table advising modify stno varchar(12);

Column renaming is executed using the option rename column. Below we rename the column stno to studentno:

alter table advising rename column stno to studentno;

Finally, to drop the column year that we just added we write:

alter table advising drop column year;
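Part of this sequence can be tried in SQLite through Python's sqlite3 module, with one caveat: SQLite implements only some forms of alter table. It supports add and rename column (and drop column from version 3.35 on), but it has no modify option, so the sketch below covers only adding and renaming. ADVISING is reduced to two columns without references clauses, and the row inserted is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table ADVISING(stno varchar2(10) not null,
                                      empno varchar2(11),
                                      primary key(stno))""")
conn.execute("insert into ADVISING values ('1011', '019')")  # hypothetical row

def column_names():
    """Column names of ADVISING in declaration order."""
    return [row[1] for row in conn.execute("pragma table_info(ADVISING)")]

conn.execute("alter table ADVISING add year varchar2(4)")
after_add = column_names()
new_entries = conn.execute("select year from ADVISING").fetchall()  # existing rows get null

conn.execute("alter table ADVISING rename column stno to studentno")
after_rename = column_names()
print(after_add, new_entries, after_rename)
```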

5.3 Referential Integrity in SQL

We saw that referential integrity can be imposed in SQL using the column constraint references. An alternative method is to impose the table constraint foreign key. Its syntax is:


foreign key(attribute name {, attribute name}) references table name (attribute name {, attribute name}) [on delete cascade]

The foreign key construction contains the option on delete cascade. The role of this option is to define the behavior of the tables when deletions occur in the table where the primary key occurs. Namely, when a row is removed from the table containing the primary key and the clause on delete cascade is specified, then all rows of the table containing the corresponding foreign key that match the removed row are also removed.

Example 5.3.1 Suppose that the tabular variable CITIES is created by:

create table CITIES (city varchar(40),

state char(2),

primary key (city,state));

A second tabular variable, STORES, records the stores that a retailer has in the covered territory, and is created by

create table STORES (storeno integer not null,

addr varchar(40) not null,

city varchar(40),

state char(2),

tel char(12),

primary key (storeno),

foreign key(city,state) references CITIES(city,state)

on delete cascade);

To populate the tables we execute the following directives:

insert into CITIES(city, state) values('Boston','MA');

insert into CITIES(city, state) values('Springfield','MA');

insert into CITIES(city, state) values('Providence','RI');

insert into CITIES(city, state) values('Hartford','CT');

insert into CITIES(city, state) values('Bayonne','NJ');

insert into STORES(storeno, addr, city, state, tel)
values(1,'125 Harvard St.','Boston','MA','617-287-0991');

insert into STORES(storeno, addr, city, state, tel)
values(2,'50 Storrow Drive','Boston','MA','617-566-7629');

insert into STORES(storeno, addr, city, state, tel)
values(3,'85 Manton Av.','Providence','RI','401-453-1234');

insert into STORES(storeno, addr, city, state, tel)
values(4,'40 West Street','Hartford','CT','860-232-4484');

insert into STORES(storeno, addr, city, state, tel)
values(5,'5 Finley Av.','Bayonne','NJ','908-221-0094');

insert into STORES(storeno, addr, city, state, tel)
values(6,'10 Linton Plaza','Hartford','CT','860-660-2220');

insert into STORES(storeno, addr, city, state, tel)
values(7,'30 Stilson Rd.','Providence','RI','401-861-5249');

The values of the tabular variables CITIES and STORES are

CITY ST

---------------

Boston MA

Springfield MA

Providence RI

Hartford CT

Bayonne NJ

and

STORENO ADDR CITY ST TEL

------------------------------------------------------

1 125 Harvard St. Boston MA 617-287-0991

2 50 Storrow Drive Boston MA 617-566-7629

3 85 Manton Av. Providence RI 401-453-1234

4 40 West Street Hartford CT 860-232-4484

5 5 Finley Av. Bayonne NJ 908-221-0094

6 10 Linton Plaza Hartford CT 860-660-2220

7 30 Stilson Rd. Providence RI 401-861-5249

Since referential integrity was imposed between the tabular variables CITIES and STORES, we need to insert the tuples of CITIES before we can insert the tuples of STORES. Otherwise, the cities mentioned in the values of STORES could not reference a city in a value of CITIES, and the insertion into STORES would be rejected.

The presence of on delete cascade means that if a row is removed from the table CITIES, then the rows of STORES corresponding to that city are also removed. For example, if the company closes its business in Hartford and we execute

delete from CITIES where

city = 'Hartford' and state = 'CT';

then the rows of STORES corresponding to the stores in Hartford, CT will be deleted automatically.
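The behavior of STORES and CITIES can be reproduced with Python's sqlite3 module. One assumption matters here: SQLite enforces foreign keys only after pragma foreign_keys = on has been issued for the connection, whereas ORACLE always enforces them. The Albany store is a hypothetical row whose city is missing from CITIES.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when asked
conn.execute("""create table CITIES(city varchar(40), state char(2),
                                    primary key(city, state))""")
conn.execute("""create table STORES(storeno integer not null,
                                    addr varchar(40) not null,
                                    city varchar(40), state char(2), tel char(12),
                                    primary key(storeno),
                                    foreign key(city, state) references CITIES(city, state)
                                        on delete cascade)""")
conn.executemany("insert into CITIES values (?, ?)",
                 [("Boston", "MA"), ("Providence", "RI"), ("Hartford", "CT")])
conn.executemany("insert into STORES values (?, ?, ?, ?, ?)",
                 [(1, "125 Harvard St.", "Boston", "MA", "617-287-0991"),
                  (4, "40 West Street", "Hartford", "CT", "860-232-4484"),
                  (6, "10 Linton Plaza", "Hartford", "CT", "860-660-2220")])

# A store in a city that is absent from CITIES violates referential integrity.
try:
    conn.execute("insert into STORES values (8, '1 Main St.', 'Albany', 'NY', null)")
    orphan = "accepted"
except sqlite3.IntegrityError:
    orphan = "rejected"

# Deleting Hartford, CT cascades to the two Hartford stores.
conn.execute("delete from CITIES where city = 'Hartford' and state = 'CT'")
remaining = [row[0] for row in conn.execute("select storeno from STORES order by storeno")]
print(orphan, remaining)  # rejected [1]
```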

Removal of the tabular variables is also constrained by referential integrity. It would be impossible to remove the tabular variable CITIES before we remove the table STORES, because STORES references CITIES. Thus, the correct order of removal is

drop table STORES;

drop table CITIES;

If the clause on delete cascade is absent, then a row of CITIES that is referenced by rows of STORES cannot be deleted unless we first delete the rows of STORES that correspond to that city.
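Without on delete cascade, the order of operations matters. The sketch below, again assuming Python's sqlite3 module with foreign keys switched on, defines the foreign key without the cascade clause and shows the deletion of Hartford, CT being refused until the referencing store is removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")
conn.execute("""create table CITIES(city varchar(40), state char(2),
                                    primary key(city, state))""")
# The same foreign key as before, but without on delete cascade.
conn.execute("""create table STORES(storeno integer not null,
                                    addr varchar(40) not null,
                                    city varchar(40), state char(2), tel char(12),
                                    primary key(storeno),
                                    foreign key(city, state) references CITIES(city, state))""")
conn.execute("insert into CITIES values ('Hartford', 'CT')")
conn.execute("insert into STORES values (4, '40 West Street', 'Hartford', 'CT', '860-232-4484')")

def delete_hartford():
    """Try to remove Hartford, CT from CITIES."""
    try:
        conn.execute("delete from CITIES where city = 'Hartford' and state = 'CT'")
        return "deleted"
    except sqlite3.IntegrityError:
        return "rejected"

first_attempt = delete_hartford()   # a store still references Hartford, CT
conn.execute("delete from STORES where city = 'Hartford' and state = 'CT'")
second_attempt = delete_hartford()  # no referencing rows remain
print(first_attempt, second_attempt)  # rejected deleted
```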


5.4 Basic Data Types

SQL makes use of a collection of domains that, in general, varies from one implementation to another. Not all domains of the standard exist in every implementation, and not all domains of implementations exist in the standard.

Basic domains supported by virtually all implementations of SQL can be classified as string domains, numerical domains, and special domains.

5.4.1 String Domains

String domains consist of fixed-length or variable-length sequences of characters. In this category, we have char(n), which represents the set of strings of characters (from a given basic set of characters) that have fixed length n. Similarly, varchar(n) represents the set of variable-length strings whose maximal length is n, where n > 0.

5.4.2 Numeric Domains

The SQL standard prescribes two kinds of numeric domains: exact numeric data types (numeric, decimal, integer, and smallint) and approximate numeric data types (float, double precision, and real). Their respective syntax is:

numeric [(p[, s])]
decimal [(p[, s])]
integer
smallint
float [(p)]
double precision
real

Here, p stands for precision and s stands for scale (both of which are nonnegative integers). The precision parameter refers to the total number of digits, while the scale indicates the number of digits to the right of the decimal point. The difference between numeric and decimal is that for numeric, p is the exact total number of digits, while for decimal, p is only a lower bound: the implementation may use a precision greater than p.

The domains smallint and integer have a number of digits dependent on the implementation; however, the precision of integer is required to be equal to or larger than the precision of smallint.

The float domain includes approximate representations of real numbers having precision at least p. Also, real and double precision have implementation-dependent precision, where the precision of double precision is never smaller than that of real.

5.4.3 Special Domains

Specific DBMSs have their own domains. For instance, ORACLE has the long domain, which contains variable-length strings of characters of up to 2 gigabytes.

To allow us to begin working with actual examples as quickly as possible, we introduce some basic domains for ORACLE. Other databases are quite similar, and the reader can obtain the relevant details by consulting product-specific manuals.

5.4.4 Basic Domains Supported by ORACLE

We review briefly a few of the more important domains supported by ORACLE:

• In ORACLE, char[(n)] represents fixed-length strings of n characters; shorter values are padded with blanks. The default value of n is 1, and the maximum size is 2000 bytes. The domain character is the same as char. The characters and their order are determined by the system during the installation of the DBMS.

The domain varchar(n) requires n to be specified and represents variable-length strings of characters. The varchar2 data type stores variable-length character strings and is currently synonymous with the varchar data type. However, in a future version of Oracle, varchar might store variable-length character strings compared with different comparison semantics. Currently there are two types of comparison semantics for strings in Oracle: blank-padded comparison semantics and non-padded comparison semantics.

When blank-padded comparison semantics is used and the two values have different lengths, Oracle first adds blanks to the end of the shorter one so that their lengths are equal. Oracle then compares the values character by character up to the first character that differs. The value with the greater character in the first differing position is considered greater. If two values have no differing characters, then they are considered equal. This rule means that two values are equal if they differ only in the number of trailing blanks. Oracle uses blank-padded comparison semantics only when both values in the comparison are either expressions of data type char, text literals, or values returned by the USER function.

In the case of non-padded comparison semantics, two values are compared character by character up to the first character that differs. The value with the greater character in that position is considered greater. If two values of different length are identical up to the end of the shorter one, the longer value is considered greater. If two values of equal length have no differing characters, then the values are considered equal. Oracle uses non-padded comparison semantics when one or both values in the comparison have the data type varchar or varchar2.

In either of the two comparison semantics we have 'ab' > 'aa' and 'ab' > 'a '. However, in the blank-padded comparison semantics we have 'a ' = 'a', while in the non-padded semantics we have 'a ' > 'a'.

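• The domain date represents dates, which are displayed by default in the format dd-mmm-yy (for example, 15-FEB-78); an ORACLE date value also records a time of day.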


• The domain long (also denoted by long varchar) represents variable-length strings of characters of up to 2 gigabytes. At most one attribute may have this domain in any table.

• The number domain in ORACLE can be used in several forms as specified by the following syntax:

number [(p[, s])],

where p is the precision and s is the scale. The maximum precision of number is 38. The scale can vary between −84 and 127. If the scale is negative, the number is rounded to the specified number of places to the left of the decimal point.

The following cases may occur when we insert a value in a column whose domain is number:

Data          Domain         Stored as
1,234,567.89  number         1234567.89
1,234,567.89  number(9)      1234568
1,234,567.89  number(9,2)    1234567.89
1,234,567.89  number(9,1)    1234567.9
1,234,567.8   number(6)      error: exceeds precision
1,234,567.89  number(10,1)   1234567.9
1,234,567.89  number(7,-2)   1234600
1,234,567.89  number(7,2)    error: exceeds precision

If s > p, then s specifies the maximum number of digits after the decimal point, and the first s − p digits after the decimal point must be zero. For instance, number(4,5) requires a zero as the first digit after the decimal point and rounds the digits after the fifth decimal digit. The number 0.012358 is stored as 0.01236.

Numbers may also be entered in exponential form, that is, including an exponent preceded by E. For example, 1234567 can be represented as 1.234567E+6, that is, as 1.234567 × 10^6.

• Floating point domains are supported as float, float(*), and float(b), where b is the binary precision, that is, the number of significant binary digits. The domains float and float(*) are equivalent, and they consist of floating point numbers that can be represented with 126 binary digits (or, equivalently, about 38 decimal digits).

• To provide compatibility with other systems, ORACLE supports such domains as decimal, integer, smallint, real, and double precision. However, their internal representation is defined by the format of the number domain.
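The rounding rules for number(p, s) and the two comparison semantics described above can be mimicked in a few lines of Python. This is a simplified model under stated assumptions, not ORACLE's implementation: it assumes rounding half away from zero (decimal.ROUND_HALF_UP) and treats a value as fitting when its rounded magnitude is below 10^(p - s); the string comparisons use Unicode code-point order instead of the database character set.

```python
from decimal import Decimal, ROUND_HALF_UP

def store_number(value: str, p: int, s: int = 0) -> Decimal:
    """Simplified model of storing `value` in a number(p, s) column."""
    d = Decimal(value)
    # Round to s digits after the decimal point; a negative s rounds to the left.
    rounded = d.quantize(Decimal(1).scaleb(-s), rounding=ROUND_HALF_UP)
    # p significant digits with scale s leave room for magnitudes below 10^(p - s).
    if abs(rounded) >= Decimal(10) ** (p - s):
        raise ValueError("exceeds precision")
    return rounded

def stored_as(value: str, p: int, s: int = 0) -> str:
    """Plain-notation result, or the error message, as in the table above."""
    try:
        return format(store_number(value, p, s), "f")
    except ValueError as err:
        return f"error: {err}"

def blank_padded_cmp(x: str, y: str) -> int:
    """Return -1, 0, or 1 after padding the shorter value with blanks."""
    n = max(len(x), len(y))
    x, y = x.ljust(n), y.ljust(n)
    return (x > y) - (x < y)

def non_padded_cmp(x: str, y: str) -> int:
    """Return -1, 0, or 1; a value that is a proper prefix of the other is smaller."""
    return (x > y) - (x < y)

for p, s in [(9, 0), (9, 2), (9, 1), (10, 1), (7, -2), (7, 2)]:
    print(f"number({p},{s})", stored_as("1234567.89", p, s))
print("number(6,0)", stored_as("1234567.8", 6))
print("number(4,5)", stored_as("0.012358", 4, 5))
print(blank_padded_cmp("a ", "a"), non_padded_cmp("a ", "a"))  # 0 1
```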

5.5 SELECT Phrases

Queries must be written based on the names and headings of the tabular variables and not on the tables that represent their values at any given moment. This is similar to writing programs: a program should work for all legal inputs and not just the ones on which it was tested. In both cases, it is important to focus on the abstract structure and not on specific examples. The way we write SQL constructs must be directed only by the logic of the query and not by the content of a particular database instance. Just because a query generated the right answer for a particular instance of the database does not mean that it is correct.

The main retrieval construction is the select phrase. Consider a query that we solved previously using relational algebra. Recall that in Example 4.1.25 we found the names of all instructors who have taught any student who lives in Brookline. The solution involved using product, selection, and projection:

T1 := (STUDENTS × GRADES × INSTRUCTORS)

T2 := T1 where STUDENTS.stno = GRADES.stno and

GRADES.empno = INSTRUCTORS.empno and

STUDENTS.city = 'Brookline'

ANS := T2[INSTRUCTORS.name].

In SQL the same problem can be resolved using a single select phrase, as in:

select INSTRUCTORS.name from STUDENTS, GRADES, INSTRUCTORS

where STUDENTS.stno = GRADES.stno and

GRADES.empno = INSTRUCTORS.empno and

STUDENTS.city = 'Brookline';

We can conceptualize the execution of this typical select using the operations of relational algebra as follows:

1. The execution begins by performing the product of the tables listed after the reserved word from. In our case, this involves computing the product

STUDENTS × GRADES × INSTRUCTORS

2. The selection specified after the reserved word where is executed next, if the where part is present (we shall see that the where part of a select is optional). In our case, this amounts to retaining the part of the table product that satisfies the condition:

STUDENTS.stno = GRADES.stno and GRADES.empno = INSTRUCTORS.empno

and STUDENTS.city = 'Brookline'

3. Finally, the result of the second phase is projected on the attributes listed between select and from, that is, in our case, on the attribute INSTRUCTORS.name.
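The three steps can be traced on a toy instance. The sketch below, in Python, computes the product, selection, and projection literally with itertools.product and then checks that an SQL engine (SQLite, via the standard sqlite3 module) returns the same names. The rows are hypothetical, apart from two student names taken from other examples in this chapter; both results are sorted because SQL does not guarantee row order. A real DBMS does not materialize the product; this is only the conceptual model.

```python
import sqlite3
from itertools import product

# Hypothetical, reduced tables: only the columns the query needs.
students = [("1011", "Edwards P. David", "Newton"),
            ("2661", "Mixon Leatha", "Brookline")]
grades = [("1011", "019", "cs110"),
          ("2661", "023", "cs110"),
          ("2661", "019", "cs210")]
instructors = [("019", "Adams Ann"), ("023", "Brown Ted")]

# Steps 1-3 written out: product, then selection, then projection.
step1 = list(product(students, grades, instructors))
step2 = [(s, g, i) for s, g, i in step1
         if s[0] == g[0] and g[1] == i[0] and s[2] == "Brookline"]
step3 = sorted(i[1] for s, g, i in step2)

# The same query evaluated by a DBMS (SQLite, in memory).
conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS (stno varchar(10), name varchar(35), city varchar(20))")
conn.execute("create table GRADES (stno varchar(10), empno varchar(11), cno varchar(5))")
conn.execute("create table INSTRUCTORS (empno varchar(11), name varchar(35))")
conn.executemany("insert into STUDENTS values (?, ?, ?)", students)
conn.executemany("insert into GRADES values (?, ?, ?)", grades)
conn.executemany("insert into INSTRUCTORS values (?, ?)", instructors)
by_sql = sorted(row[0] for row in conn.execute(
    """select INSTRUCTORS.name from STUDENTS, GRADES, INSTRUCTORS
       where STUDENTS.stno = GRADES.stno and
             GRADES.empno = INSTRUCTORS.empno and
             STUDENTS.city = 'Brookline'"""))

print(len(step1), step3 == by_sql)  # 12 True
```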

We use a string constant (also known as a literal) in the above select, namely 'Brookline'. String constants must begin and end with a single quote.

SQL is not case-sensitive. This means that you may or may not use capital letters in any place in an SQL construction (except inside string literals) without any effect on the value returned by the query.

As we mentioned above, the where part of a select (also known as the where clause) is optional. This allows us to compute table projections in SQL, as we show next.


Example 5.5.1 In Example 4.1.16, we obtained a list of instructors’ names and the room numbers of their offices by projecting the table INSTRUCTORS on name roomno.

In SQL this can be done by writing

select name, roomno from INSTRUCTORS;

The select construct used above requires the name of the table involved in the retrieval and the list of attributes that we need to extract.

In general, if we need to compute the projection of a table T on a set of attributes A1 . . . An of the heading of T, we use the construct:

select A1, . . . , An from T ;

Example 5.5.2 To find out the states where the students come from, we project the table STUDENTS on the attribute state. This is done by

select state from STUDENTS;

The system returns the result:

ST

--

MA

MA

MA

MA

NH

MA

MA

MA

RI

The value 'MA' is repeated seven times because there are seven students who live in Massachusetts.

Duplicate values can be eliminated from the result of a query by using the option distinct, as in

select distinct state from STUDENTS;

This will yield the answer:

ST

--

MA

NH

RI

where duplicate values have been dropped.
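The effect of distinct can be checked with Python's built-in sqlite3 module and an in-memory SQLite database. This sketch rests on our own assumptions. The table holds only the state column, filled with the nine values shown above. The order by clause (discussed later in this chapter) is added only to make the output order predictable.

```python
import sqlite3

# In-memory database holding only the state column of STUDENTS;
# the nine values are those returned in Example 5.5.2.
conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS (state char(2))")
states = ["MA", "MA", "MA", "MA", "NH", "MA", "MA", "MA", "RI"]
conn.executemany("insert into STUDENTS values (?)", [(s,) for s in states])

# A plain projection keeps duplicate values.
all_states = [r[0] for r in conn.execute("select state from STUDENTS")]

# distinct removes them.
distinct_states = [
    r[0] for r in conn.execute("select distinct state from STUDENTS order by state")
]

print(len(all_states), distinct_states)  # 9 ['MA', 'NH', 'RI']
```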


5.6 The WHERE Option

The where clause allows us to extract tuples that satisfy certain conditions; in other words, using the where clause we can perform selections.

Example 5.6.1 To find students who live in Boston we write:

select stno, name, addr, city, state, zip

from STUDENTS

where city = 'Boston';

This select will return the result:

STNO NAME ADDR CITY ST ZIP

---------------------------------------------------------------

2890 McLane Sandy 30 Cass Rd. Boston MA 02122

4022 Prior Lorraine 8 Beacon St. Boston MA 02125

5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115

If we want to extract all columns of a table instance, we can use the “wildcard” character, *, instead of listing all columns. Thus, we can write the equivalent select:

select * from STUDENTS

where city = 'Boston';

Here the symbol * replaces the full attribute list.

Starting from simple conditions (which we called atomic conditions in Chapter 4) we can write queries involving more complicated conditions built by using and, or, and not.

Example 5.6.2 In Example 4.1.14 we retrieved the students who live in Boston or Brookline. In SQL this can be done by:

select * from STUDENTS

where city = 'Boston' or city = 'Brookline';

This yields the result:

STNO NAME ADDR CITY ST ZIP

---------------------------------------------------------------

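2661 Mixon Leatha 100 School St. Brookline MA 02146

2890 McLane Sandy 30 Cass Rd. Boston MA 02122

3566 Pierce Richard 70 Park St. Brookline MA 02146

4022 Prior Lorraine 8 Beacon St. Boston MA 02125

5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115
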
Example 5.6.3 To retrieve the grade records obtained in cs110 during the Spring of 2000 we can write in SQL:

select * from GRADES

where cno = 'cs110' and sem = 'SPRING'

and year = 2000;

This returns the result:

STNO EMPNO CNO SEM YEAR GRADE

---------- ----------- ----- ------ ---------- ----------

1011 023 cs110 SPRING 2000 75

4022 023 cs110 SPRING 2000 60

Selections can be combined with projections in a single SQL phrase.

Example 5.6.4 In the select phrase:

select stno, empno from GRADES

where cno = 'cs110';

the selection specified by where cno = 'cs110' is followed by the projection on the attributes stno, empno that are listed after the word select. The result is:

STNO EMPNO

---------- -----

1011 019

2661 019

3566 019

5544 019

1011 023

4022 023

In SQL we can use conditions that implement limited pattern matching. Certain patterns can be specified using the symbol % to replace zero or more characters, and the underscore _ to replace exactly one character. As mentioned earlier, SQL is generally not case-sensitive; however, in ORACLE, comparisons involving strings are case-sensitive. Thus, 'Jerry' and 'JERRY' are distinct strings, and 'Jerry' > 'JERRY' (lowercase letters follow uppercase letters in the ASCII ordering). The comparison is realized using the operator like.

Example 5.6.5 If we need to find the names and the addresses of students whose name includes “Jerry”, we can use the following select construct:

select name, addr from STUDENTS

where name like '%Jerry%';

This returns the table:

NAME ADDR

--------------- ---------------

Rawlings Jerry 15 Pleasant Dr.

Lewis Jerry 1 Main Rd.

Example 5.6.6 Suppose the computer science course numbers were carefully assigned so that all fundamental programming courses have a “1” as their second digit. Then the following select construct lists all fundamental programming courses.

select * from COURSES

where cno like 'cs_1%';

The corresponding result is:

CNO CNAME CR CAP

----- ------------------------- -- ---

cs110 Introduction to Computing 4 120

cs210 Computer Programming 4 100

cs310 Data Structures 3 60

cs410 Software Engineering 3 40
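The meaning of % and _ can be spelled out by translating a like pattern into a regular expression. The helper sql_like below is a hypothetical illustration and is not part of any SQL system. It ignores the optional escape clause and compares case-sensitively, as ORACLE does. SQLite, by contrast, treats like as case-insensitive for ASCII letters by default.

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Case-sensitive approximation of SQL LIKE (no escape clause).

    % matches any sequence of zero or more characters,
    _ matches exactly one character; everything else is literal.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # fullmatch: the whole value must match the pattern;
    # DOTALL lets % and _ also match newline characters.
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


names = ["Rawlings Jerry", "Lewis Jerry", "Mixon Leatha"]
print([n for n in names if sql_like(n, "%Jerry%")])
# ['Rawlings Jerry', 'Lewis Jerry']

cnos = ["cs110", "cs210", "cs240", "cs310", "cs410"]
print([c for c in cnos if sql_like(c, "cs_1%")])
# ['cs110', 'cs210', 'cs310', 'cs410']
```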

Using the reserved word between, we can ensure that certain values are limited to prescribed intervals (including the endpoints of these intervals).

Example 5.6.7 To find the students who obtained some grade between 65 and 85 in 2003, we apply the following query:

select distinct stno from GRADES

where year = 2003 and

grade between 65 and 85;

This select construct returns the table:

STNO

----

1011

2661

5571

The previous select is simply a shorthand for

select distinct stno from GRADES

where year = 2003 and

grade >= 65 and

grade <= 85;
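The claim that between includes both endpoints is easy to verify on an in-memory SQLite database through Python's sqlite3 module. The GRADES rows below are hypothetical. They were chosen so that the grades 65 and 85 sit exactly on the endpoints, while 64 and 86 fall just outside.

```python
import sqlite3

# Hypothetical GRADES rows: (stno, year, grade).
conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES (stno text, year integer, grade integer)")
conn.executemany(
    "insert into GRADES values (?, ?, ?)",
    [("1001", 2003, 65), ("1002", 2003, 85), ("1003", 2003, 64),
     ("1004", 2003, 86), ("1005", 2002, 70)],
)

q_between = """select distinct stno from GRADES
               where year = 2003 and grade between 65 and 85
               order by stno"""
q_explicit = """select distinct stno from GRADES
                where year = 2003 and grade >= 65 and grade <= 85
                order by stno"""

r1 = conn.execute(q_between).fetchall()
r2 = conn.execute(q_explicit).fetchall()
print(r1)        # [('1001',), ('1002',)]: both endpoints are included
print(r1 == r2)  # True
```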

Example 5.6.8 A select construct, similar to the one used in Example 5.6.7, can be used to retrieve the students who have some grade that does not satisfy the previous condition, that is, the students who have some grade not between 65 and 85:

select distinct stno from GRADES where year = 2003

and grade not between 65 and 85;

This construct generates the answer:

STNO

----

1011

2415

3442


3566

4022

5571

We can test if certain components of tuples belong to a certain list of values by using a condition of the form:

A in (v1, ..., vn)

This condition is satisfied by those tuples t such that t[A] has one of the values v1, ..., vn.

Example 5.6.9 Let us find the names of students who live in Boston or Brookline, a query that we already discussed in Example 5.6.2. Using the previous condition we write:

select name from STUDENTS

where city in ('Boston','Brookline');

Then, the desired list is:

NAME

--------------

Mixon Leatha

McLane Sandy

Pierce Richard

Prior Lorraine

Rawlings Jerry

On the other hand, we can test the negation of a condition using not. To list the names of students who live outside those two cities, we write:

select name from STUDENTS

where not(city in ('Boston','Brookline'));

which has the same effect as:

select name from STUDENTS

where city not in ('Boston','Brookline');

We can insert strings of characters in the list of fields of a select phrase to improve the presentation of the results.

Example 5.6.10 To insert the string 'Student name: ' in front of a student name we write:

select 'Student name: ', name from STUDENTS;


’STUDENTNAME: ’ NAME

--------------- -----------------

Student name: Edwards P. David

Student name: Grogan A. Mary

Student name: Mixon Leatha


Student name: McLane Sandy

Student name: Novak Roland

Student name: Pierce Richard

Student name: Prior Lorraine

Student name: Rawlings Jerry

Student name: Lewis Jerry

In SQL Plus, concatenation of strings can be achieved with the concatenation operator ||.

Example 5.6.11 In the next select phrase we concatenate the string 'Student ' with a student’s name, then with the string ' lives in ' and the student’s state:

select 'Student ' || name || ' lives in ' || state

from STUDENTS;

returns the result:

’STUDENT’||NAME||’LIVESIN’||STATE

-------------------------------------

Student Edwards P. David lives in MA

Student Grogan A. Mary lives in MA

Student Mixon Leatha lives in MA

Student McLane Sandy lives in MA

Student Novak Roland lives in NH

Student Pierce Richard lives in MA

Student Prior Lorraine lives in MA

Student Rawlings Jerry lives in MA

Student Lewis Jerry lives in RI
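SQLite also uses the standard || operator, so the query of Example 5.6.11 can be tried through Python's sqlite3 module. The two rows below are taken from the output above. The order by clause is our own addition, used only to fix the order of the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS (name text, state char(2))")
conn.executemany(
    "insert into STUDENTS values (?, ?)",
    [("Novak Roland", "NH"), ("Lewis Jerry", "RI")],
)

rows = conn.execute(
    "select 'Student ' || name || ' lives in ' || state"
    " from STUDENTS order by name"
).fetchall()

for (line,) in rows:
    print(line)
# Student Lewis Jerry lives in RI
# Student Novak Roland lives in NH
```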

In Microsoft SQL Server concatenation is obtained using the “+” operator.

Example 5.6.12 The query shown in Example 5.6.11 can be executed in Microsoft SQL Server by

select 'Student ' + name + ' lives in ' + state

from STUDENTS;

5.7 Union, Intersection, and Difference in SQL

Recall that union, intersection, and difference as defined in relational algebra may occur only between tables that have identical headings. To execute these operations in SQL, we need to use compound select phrases. Compound selects are constructed from simple select phrases using the reserved words union, intersect, and minus (the SQL standard and many dialects other than ORACLE spell the last one except). As we shall see, SQL treats union, intersection, and difference as operations between sets of tuples, and therefore it removes duplicate values from the results of the queries.

Example 5.7.1 To determine the student numbers of students who took cs210 we write:

select stno from GRADES

where cno = 'cs210';

This returns the result:

STNO

----

1011

2661

3566

5571

4022

Similarly, we find the student numbers of students who took cs240:

select stno from GRADES

where cno = 'cs240';

In turn, this yields:

STNO

----

3566

5571

2415

5544

1011

4022

To find the students who took both cs210 and cs240 we use intersect to link the two previous select phrases into a compound select:

select stno from grades where cno = 'cs210'

intersect

select stno from grades where cno = 'cs240';

This gives:

STNO

----

1011

3566

4022

5571

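Older versions of SQL Server and MySQL do not support the intersect operation (SQL Server added it in SQL Server 2005, MySQL in version 8.0.31).

The union of the two sets is computed by the following compound select:

select stno from grades where cno = 'cs210'

union

select stno from grades where cno = 'cs240';

Note that in ORACLE the tuples of the result appear sorted. This is a side effect of duplicate elimination; SQL does not guarantee any particular order unless one is requested explicitly.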


STNO

----

1011

2415

2661

3566

4022

5544

5571

If we wish to retain all values in the result, then we need to use union all to link the select phrases, as in:

select stno from grades where cno = 'cs210'

union all

select stno from grades where cno = 'cs240';

The result now contains all values retrieved by the individual selects:

STNO

----

1011

2661

3566

5571

4022

3566

5571

2415

5544

1011

4022

The set difference is computed in ORACLE’s SQL Plus using minus. To find the students who took cs210 but did not take cs240 we write:

select stno from grades where cno = 'cs210'

minus

select stno from grades where cno = 'cs240';

which returns the result:

STNO

----

2661

The reverse difference allows us to find students who took cs240 but did not take cs210:

select stno from grades where cno = 'cs240'

minus

select stno from grades where cno = 'cs210';

Now we obtain:

STNO

----

2415

5544

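Neither SQL Server nor MySQL supports the keyword minus; both provide the standard except instead (SQL Server from SQL Server 2005 on, MySQL from version 8.0.31 on).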
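The compound selects of this section can be replayed in SQLite through Python's sqlite3 module, using the (stno, cno) pairs that appear in the results above. One dialect difference matters here: SQLite, like the SQL standard, writes difference as except rather than ORACLE's minus. The order by clauses are added only to make the output order predictable.

```python
import sqlite3

# The (stno, cno) pairs taken from the results of Example 5.7.1.
took = {
    "cs210": ["1011", "2661", "3566", "5571", "4022"],
    "cs240": ["3566", "5571", "2415", "5544", "1011", "4022"],
}
conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES (stno text, cno text)")
conn.executemany(
    "insert into GRADES values (?, ?)",
    [(s, c) for c, studs in took.items() for s in studs],
)

A = "select stno from GRADES where cno = 'cs210'"
B = "select stno from GRADES where cno = 'cs240'"


def run(sql):
    """Execute a query and return its single column as a list."""
    return [r[0] for r in conn.execute(sql)]


both = run(f"{A} intersect {B} order by stno")
either = run(f"{A} union {B} order by stno")     # duplicates removed
with_dups = run(f"{A} union all {B}")            # 5 + 6 values kept
only_210 = run(f"{A} except {B}")                # ORACLE: minus
only_240 = run(f"{B} except {A} order by stno")

print(both)      # ['1011', '3566', '4022', '5571']
print(only_210)  # ['2661']
print(only_240)  # ['2415', '5544']
```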

5.8 Table Product in SQL

A select phrase that lists several distinct table names after the reserved word from computes the product of these tables.

Example 5.8.1 To examine all possible pairs of students/instructors we could write the following select:

select STUDENTS.name, INSTRUCTORS.name

from STUDENTS, INSTRUCTORS;

Since our database is in a state that contains nine students and five instructors, this query retrieves 9 × 5 = 45 rows:

NAME NAME

---------------------------------

Edwards P. David Evans Robert

Grogan A. Mary Evans Robert

Mixon Leatha Evans Robert

.

.

.

Pierce Richard Will Samuel

Prior Lorraine Will Samuel

Rawlings Jerry Will Samuel

Lewis Jerry Will Samuel

Observe that the tables are not “linked” by any where condition; as expected in the definition of the product, all combinations of rows are considered. After computing the product, a projection eliminates all attributes except STUDENTS.name and INSTRUCTORS.name. Also, note that we use qualified attributes as required by the definition of table product (see Definition 4.1.7).

The result produced by the query shown in Example 5.8.1 does not differentiate between the attributes STUDENTS.name and INSTRUCTORS.name, and this may confuse the user. Therefore, it is preferable to rename the columns of the result using the option as:

select STUDENTS.name as stname, INSTRUCTORS.name as instname

from STUDENTS, INSTRUCTORS;

This will generate:


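STNAME INSTNAME

---------------------------------

Edwards P. David Evans Robert

Grogan A. Mary Evans Robert

Mixon Leatha Evans Robert

.

.

.

Pierce Richard Will Samuel

Prior Lorraine Will Samuel

Rawlings Jerry Will Samuel

Lewis Jerry Will Samuel
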
SQL allows for computations of products of several copies of the same table through the creation of aliases; the solution proceeds using the logic discussed in Example 4.1.18. To create an alias S of a table named T we write the name of the alias after the name of the table in the list of tables, making sure that at least one space (and no comma) separates the name of the table from its alias. For example, in the select phrase of Example 5.8.2 we create the alias I by writing

INSTRUCTORS I

Table aliases are also known as correlation names of tables.

Example 5.8.2 Let us solve the query shown in Example 4.1.18: finding all pairs of instructors’ names for instructors who share the same office. This can be done by writing:

select I.name as firstname, INSTRUCTORS.name as secname

from INSTRUCTORS I, INSTRUCTORS

where I.roomno = INSTRUCTORS.roomno and

I.empno < INSTRUCTORS.empno;

The result of this query is:

FIRSTNAME SECNAME

------------------------------

Exxon George Will Samuel

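Conceptually, we create an alias I of the table INSTRUCTORS, compute the product between this alias and INSTRUCTORS, and retain those pairs that share the same room and in which the first employee number is smaller than the second. This last condition ensures that the two instructors are distinct and that each pair is listed only once.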
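The alias technique of Example 5.8.2 carries over unchanged to SQLite. The sketch below runs through Python's sqlite3 module and loads the empno, name, and roomno values of the INSTRUCTORS tuples shown in Figure 5.1. The other columns are omitted because the query does not use them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table INSTRUCTORS (empno text, name text, roomno integer)")
conn.executemany(
    "insert into INSTRUCTORS values (?, ?, ?)",
    [("019", "Evans Robert", 82), ("023", "Exxon George", 90),
     ("056", "Sawyer Kathy", 91), ("126", "Davis William", 72),
     ("234", "Will Samuel", 90), ("323", "Campbell Kenneth", 102)],
)

# I is an alias of INSTRUCTORS; the unaliased copy is referenced by its name.
pairs = conn.execute("""
    select I.name as firstname, INSTRUCTORS.name as secname
    from INSTRUCTORS I, INSTRUCTORS
    where I.roomno = INSTRUCTORS.roomno and I.empno < INSTRUCTORS.empno
""").fetchall()

print(pairs)  # [('Exxon George', 'Will Samuel')]
```

Dropping the condition I.empno < INSTRUCTORS.empno would also return each instructor paired with himself or herself, and each genuine pair twice.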

Example 5.8.3 Suppose that we need to find all triples of student names for students who live in the same city and state. Now we need to operate with three distinct copies of the table STUDENTS. This is accomplished by:

select S1.name as name1, S2.name as name2,

S3.name as name3

from STUDENTS S1, STUDENTS S2,

STUDENTS S3

where S1.state = S2.state and

S2.state = S3.state and

S1.city = S2.city and

S2.city = S3.city and

S1.stno < S2.stno and

S2.stno < S3.stno;

which gives the result:

NAME1 NAME2 NAME3

----------------------------------------------------

McLane Sandy Prior Lorraine Rawlings Jerry

5.9 Join in SQL

Earlier versions of SQL (at the level of SQL 1) dealt with the join operation indirectly, using operations like product, selection, and projection, which are already available in SQL. The blueprint of this treatment of the join operation was outlined in Section 4.2.

Example 5.9.1 The query considered in Example 4.2.2, in which we seek the names of instructors who have taught any four-credit course, is solved in SQL by writing:

select distinct INSTRUCTORS.name

from COURSES, GRADES, INSTRUCTORS

where COURSES.cr = 4

and COURSES.cno = GRADES.cno

and GRADES.empno = INSTRUCTORS.empno;

The steps that we applied in relational algebra can be easily reconstituted in SQL. The first step, which consists of computing the product

T1 = COURSES × GRADES × INSTRUCTORS

corresponds to the list of tables that follows the word from. Then, the selection specified by

T2 = (T1 where COURSES.cr = 4 and

COURSES.cno = GRADES.cno and

GRADES.empno = INSTRUCTORS.empno)

is executed using the condition of the where clause. Finally, the projection

T3(name) = T2[INSTRUCTORS.name]

corresponds to the list that follows select. In this case, this list consists of one attribute, INSTRUCTORS.name.

We give one more example that shows a typical query that uses a join.


Example 5.9.2 To list all pairs of student names and course names such that the student takes the course, the relational algebra solution would require that we join the tables STUDENTS, GRADES, and COURSES. In SQL we write:

select distinct STUDENTS.name, COURSES.cname

from STUDENTS, GRADES, COURSES

where STUDENTS.stno = GRADES.stno and

GRADES.cno = COURSES.cno;

This query will return:

NAME CNAME

--------------------------------------------------

Edwards P. David Computer Architecture

Edwards P. David Computer Programming

Edwards P. David Introduction to Computing

Grogan A. Mary Computer Architecture

.

.

.

Prior Lorraine Data Structures

Prior Lorraine Introduction to Computing

Rawlings Jerry Computer Architecture

Rawlings Jerry Introduction to Computing

SQL dialects that conform to the SQL-2 standard (e.g., SQL Plus of Oracle 9i and 10g, and Microsoft SQL Server) allow the use of the constructions inner join and on. For example, the query discussed in Example 5.9.1 has the alternate solution:

select distinct INSTRUCTORS.name

from INSTRUCTORS, COURSES INNER JOIN GRADES

on COURSES.cno = GRADES.cno

where INSTRUCTORS.empno = GRADES.empno

and COURSES.cr = 4;

This query should be viewed as computing the join of COURSES and GRADES based on the equality of their common attribute cno (as specified by the on clause). Then, the join of INSTRUCTORS with the result of the previous join is computed using the “simulation by product and selection” method.

In SQL Plus, queries involving natural joins among tables whose common attributes are identically named can be further simplified by applying the using clause, which lists the attributes involved in the join.

Example 5.9.3 To retrieve the names of instructors who taught cs110 we can execute in SQL Plus the query:

select distinct INSTRUCTORS.name

from INSTRUCTORS inner join GRADES

using(empno)

where cno = 'cs110';

The inner join can be used for joins that involve more than two tables.

Example 5.9.4 An alternative solution to the query of Example 5.9.1 that makes use of the inner join operation is:

select distinct INSTRUCTORS.name

from

INSTRUCTORS inner join GRADES

using(empno)

inner join COURSES

using(cno)

where COURSES.cr = 4;

It is possible to involve several attributes in an inner join, either explicitly, using the clause on, or implicitly, using the clause using.

Example 5.9.5 To find the pairs of names of students and instructors such that the student takes a course with the instructor who is also his or her advisor, we can write either:

select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname

from GRADES inner join ADVISING

on GRADES.stno = ADVISING.stno and

GRADES.empno = ADVISING.empno

inner join STUDENTS

on ADVISING.stno = STUDENTS.stno

inner join INSTRUCTORS

on ADVISING.empno = INSTRUCTORS.empno

or, equivalently,

select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname

from GRADES inner join ADVISING

using(stno,empno)

inner join STUDENTS

using(stno)

inner join INSTRUCTORS

using(empno)
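The difference between on and using can be observed in SQLite, which supports both forms. The sketch below uses Python's sqlite3 module and a few hypothetical GRADES rows. Both queries return the same instructors for cs110. However, select * shows that using merges the common attribute empno into a single column, whereas on keeps both copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table INSTRUCTORS (empno text, name text);
    create table GRADES (stno text, empno text, cno text);
    insert into INSTRUCTORS values ('019', 'Evans Robert'),
                                   ('023', 'Exxon George'),
                                   ('234', 'Will Samuel');
    insert into GRADES values ('1011', '019', 'cs110'),
                              ('4022', '023', 'cs110'),
                              ('2661', '234', 'cs310');
""")

# Explicit join condition with on.
with_on = conn.execute("""
    select distinct INSTRUCTORS.name
    from INSTRUCTORS inner join GRADES on INSTRUCTORS.empno = GRADES.empno
    where GRADES.cno = 'cs110'
    order by 1""").fetchall()

# Implicit equality on the common attribute with using.
with_using = conn.execute("""
    select distinct name
    from INSTRUCTORS inner join GRADES using(empno)
    where cno = 'cs110'
    order by 1""").fetchall()

# Column headings produced by select * under each form.
cols_on = [d[0] for d in conn.execute(
    "select * from INSTRUCTORS inner join GRADES"
    " on INSTRUCTORS.empno = GRADES.empno").description]
cols_using = [d[0] for d in conn.execute(
    "select * from INSTRUCTORS inner join GRADES using(empno)").description]

print(with_on)     # [('Evans Robert',), ('Exxon George',)]
print(cols_on)     # ['empno', 'name', 'stno', 'empno', 'cno']
print(cols_using)  # ['empno', 'name', 'stno', 'cno']
```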

The Cartesian product of two tables can also be computed using the cross join operation.

Example 5.9.6 The query that we wrote in Example 5.8.1, which generates all possible pairs of students/instructors, can also be written as:

select STUDENTS.name, INSTRUCTORS.name

from STUDENTS cross join INSTRUCTORS;

which is equivalent to

select STUDENTS.name, INSTRUCTORS.name

from STUDENTS, INSTRUCTORS;

We saw that when joining two tables not all tuples are joinable; tuples that belong to one table and are not joinable with any tuple of the other table leave no trace in the join, a situation that is often inconvenient. As we saw in Section 4.3, the outer join operation and its variants, the left outer join and the right outer join, can rectify this situation.

Let us assume that the tabular variables STUDENTS and INSTRUCTORS contain the tuples shown in Figure 5.1. The tabular variable ADVISING has the same content as the one shown in Figure 3.1.

Example 5.9.7 Oracle’s own syntax for the left outer join is to designate the component that may be null by '(+)', as in

select students.name, ADVISING.empno from STUDENTS, ADVISING

where STUDENTS.stno = ADVISING.stno(+)

This is equivalent to using the operator left outer join as specified by SQL2:

select STUDENTS.name, ADVISING.empno

from STUDENTS left outer join ADVISING

on STUDENTS.stno = ADVISING.stno

Either phrase will return:

name empno

-----------------------------------------

Edwards P. David 019

Grogan A. Mary 019

Mixon Leatha 023

McLane Sandy 023

Novak Roland 056

Pierce Richard 126

Prior Lorraine 234

Rawlings Jerry 023

Lewis Jerry 234

Davis Richard

Chu Martin
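SQLite does not accept ORACLE's (+) notation, but it does implement the standard left outer join. The sketch below uses three STUDENTS tuples from Figure 5.1 and a single ADVISING tuple that we assume for illustration. The missing empno values come back to Python as None, which is how the sqlite3 module represents null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table STUDENTS (stno text, name text);
    create table ADVISING (stno text, empno text);
    insert into STUDENTS values ('1011', 'Edwards P. David'),
                                ('6410', 'Davis Richard'),
                                ('7209', 'Chu Martin');
    insert into ADVISING values ('1011', '019');
""")

rows = conn.execute("""
    select STUDENTS.name, ADVISING.empno
    from STUDENTS left outer join ADVISING
    on STUDENTS.stno = ADVISING.stno
    order by STUDENTS.stno""").fetchall()

for name, empno in rows:
    # Students without an advisor come back with empno None (SQL null).
    print(name, empno)
```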

The computation of the right outer join is similar. We can use either Oracle’s syntax, as in

select ADVISING.stno, INSTRUCTORS.name from ADVISING, INSTRUCTORS

where ADVISING.empno(+) = INSTRUCTORS.empno;

or the standard syntax:

select ADVISING.stno, INSTRUCTORS.name

from ADVISING right outer join INSTRUCTORS

on ADVISING.empno = INSTRUCTORS.empno;

In either case we shall obtain:


STUDENTS

stno name addr city state zip

1011 Edwards P. David 10 Red Rd. Newton MA 02159

2415 Grogan A. Mary 8 Walnut St. Malden MA 02148

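2661 Mixon Leatha 100 School St. Brookline MA 02146

2890 McLane Sandy 30 Cass Rd. Boston MA 02122

3442 Novak Roland 42 Beacon St. Nashua NH 03060

3566 Pierce Richard 70 Park St. Brookline MA 02146

4022 Prior Lorraine 8 Beacon St. Boston MA 02125

5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115
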
5571 Lewis Jerry 1 Main Rd Providence RI 02904

6410 Davis Richard 45 Algonquin Rd. Natick MA 01760

7209 Chu Martin 90 Rye Dr. Ayer MA 01290

INSTRUCTORS

empno name rank roomno telno

019 Evans Robert Professor 82 7122

023 Exxon George Professor 90 9101

056 Sawyer Kathy Assoc. Prof. 91 5110

126 Davis William Assoc. Prof. 72 5411

234 Will Samuel Assist.Prof. 90 7024

323 Campbell Kenneth Professor 102 7077

Figure 5.1: Tables with tuples with null components


stno name

---------------------------

1011 Evans Robert

2415 Evans Robert

2661 Exxon George

2890 Exxon George

5544 Exxon George

3442 Sawyer Kathy

3566 Davis William

4022 Will Samuel

5571 Will Samuel

Campbell Kenneth

Finally, the full outer join itself can be computed using the operator full outer join:

select STUDENTS.name as sname, INSTRUCTORS.name as iname

from students full outer join advising

using(stno)

full outer join instructors

using(empno);

This will result in

sname iname

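-----------------------------------------------------

Grogan A. Mary Evans Robert

Edwards P. David Evans Robert

Rawlings Jerry Exxon George

McLane Sandy Exxon George

Mixon Leatha Exxon George

Novak Roland Sawyer Kathy

Pierce Richard Davis William

Lewis Jerry Will Samuel

Prior Lorraine Will Samuel
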
Chu Martin

Davis Richard

Campbell Kenneth

5.10 Sets and subqueries

Subqueries are select phrases that return sets rather than tables. Their main use is in conditions that involve sets. As we shall see, they are useful in implementing difference and division in SQL. Syntactically, a subquery is written by placing a select phrase between a pair of parentheses. For example,

(select empno from INSTRUCTORS where rank = 'Professor')

is a subquery that computes the employee numbers of full professors. To find the student numbers of students who take a course with a full professor, we need to select those GRADES tuples whose empno belongs to this set. This can be accomplished by writing:

select distinct stno from GRADES where

empno in (select empno from INSTRUCTORS

where rank = 'Professor');

This will return the result:

STNO

----

1011

2415

2661

3566

4022

5544

5571

We refer to the first select as the calling select, the main select, or the outer select; the select of the subquery is the inner select.

As we saw in the introductory example, membership can be tested using in. Here is another example.

Example 5.10.1 Let us find the names of students who took cs310. We determine the student numbers of those students using a subquery. Then, in the main select, we retrieve those students whose student number is in this set. This can be accomplished using the query:

select name from STUDENTS where

stno in (select stno from GRADES

where cno = 'cs310');

which returns the table:

NAME

--------------

Mixon Leatha

Prior Lorraine

It is possible to test membership of a tuple in a set of tuples computed by a subquery using a condition of the form

(x1, ..., xn) in (select A1, ..., An from ...)

This type of test is included in SQL99, but it is not implemented in many SQL dialects; ORACLE and DB2, however, do implement it.

Example 5.10.2 Let us find the pairs of names of students and instructors such that the student took some course with the instructor but no four-credit course with that instructor. This is computed by the following query:


select STUDENTS.name as sname,

INSTRUCTORS.name as iname

from STUDENTS, INSTRUCTORS where

(STUDENTS.stno, INSTRUCTORS.empno) in

(select stno, empno from grades

minus

select stno, empno from grades

where cno in (select cno

from courses

where cr=4));

This will return the following table:

SNAME INAME

------------------ -------------

Edwards P. David Sawyer Kathy


Mixon Leatha Will Samuel

Novak Roland Will Samuel

Prior Lorraine Sawyer Kathy


Rawlings Jerry Sawyer Kathy


If oper is one of the operators =, !=, <, >, <=, or >=, then we can use conditions of the form

v oper any (select ...)

or

v oper all (select ...)

in comparisons that involve some element of the set computed by the subquery (select ...) or all elements of the same set, respectively. Here “!=” stands for inequality.

Example 5.10.3 To find the names of the courses taken by the student whose student number is '1011', we can use the following query:

select cname from COURSES where

cno = any (select cno from GRADES where stno = '1011');

The construct = any is synonymous with in, and the same query could be written as:

select cname from COURSES

where cno in (select cno from GRADES where stno = '1011');

Also, instead of = any we could use = some, and so we have a third way of writing the same query:

select cname from COURSES where

cno = some (select cno from GRADES where stno = '1011');

All three queries result in the table:

CNAME

-------------------------

Introduction to Computing

Computer Programming

Computer Architecture

Example 5.10.4 Let us find the students who obtained the highest grade in cs110. Although there are methods, explained later, that yield much simpler solutions for this type of query, for the moment we want to illustrate the oper all condition. We operate on two copies of GRADES; the copy used in the inner select is intended for computing the grades obtained in cs110:

select stno from GRADES where cno = 'cs110'

and grade >= all(select grade from GRADES

where cno = 'cs110');

We obtain the table:

STNO

----

5544

Example 5.10.5 Let us find the students who obtained a grade at least as high as every grade given by a certain instructor, say Prof. Will. Using the all subquery we can write:

select distinct stno from GRADES

where grade >= all(select grade from GRADES

where empno in (select empno from INSTRUCTORS

where name like 'Will%'));

If we alter this query and replace the instructor with Prof. Davis, who teaches no courses, we obtain

select distinct stno from GRADES

where grade >= all(select grade from GRADES

where empno in (select empno from INSTRUCTORS

where name like 'Davis%'));

Now the set computed by the subquery that follows all is empty. Therefore, every grade satisfies the inequality, and we obtain the student numbers of all students who took any course!

5.11 Parametrized subqueries

Often the retrieval performed in a subquery depends on a value provided by the calling select. A typical situation is described in the following example.

Example 5.11.1 Suppose that we need to retrieve the course numbers of courses taken by the student whose student number is STUDENTS.stno. Ignore (for the moment) the origin of this piece of data. Then, the retrieval is done by the select construct:

select cno from GRADES

where stno = STUDENTS.stno;

Next, we transform this select into a subquery. The student number STUDENTS.stno is provided by the outer select of the following construct:

select name from STUDENTS where 'cs310' in

(select cno from GRADES

where stno = STUDENTS.stno);

Observe that this provides an alternate solution to the query discussed in Example 5.10.1. Namely, we use a subquery to compute the courses taken by each student. Then, we test if 'cs310' is one of these courses. We use the qualified attribute STUDENTS.stno inside the subquery to differentiate between this input parameter and the attribute stno of the table GRADES.

Sets of tuples produced by subqueries can be tested for emptiness using the exists condition. Namely, the condition

exists (select * from ...)

is true if the set returned by the subquery is not empty; similarly,

not exists (select * from ...)

is true if the set returned by the subquery is empty.

Example 5.11.2 Let us give yet another solution to the query we solved in Example 5.10.1. This time, to find the names of students who took cs310, we retain the students whose set of grades in cs310 is not empty. This can be done as follows:

select name from STUDENTS where

exists (select * from GRADES where

stno = STUDENTS.stno and

cno = 'cs310');

As a result, we have the table:

NAME

--------------

Mixon Leatha

Prior Lorraine
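Correlated subqueries such as the one in Example 5.11.2 also work in SQLite. The sketch below uses Python's sqlite3 module and a hypothetical subset of the GRADES tuples. The exists form and the 'cs310' in form of Example 5.11.1 return the same students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table STUDENTS (stno text, name text);
    create table GRADES (stno text, cno text);
    insert into STUDENTS values ('1011', 'Edwards P. David'),
                                ('2661', 'Mixon Leatha'),
                                ('4022', 'Prior Lorraine');
    insert into GRADES values ('1011', 'cs110'), ('2661', 'cs310'),
                              ('4022', 'cs110'), ('4022', 'cs310');
""")

# Example 5.11.2: conceptually, the subquery is evaluated once per STUDENTS
# tuple, with STUDENTS.stno acting as its input parameter.
via_exists = conn.execute("""
    select name from STUDENTS where
    exists (select * from GRADES
            where stno = STUDENTS.stno and cno = 'cs310')
    order by name""").fetchall()

# Example 5.11.1: test membership of 'cs310' in each student's course set.
via_in = conn.execute("""
    select name from STUDENTS where 'cs310' in
    (select cno from GRADES where stno = STUDENTS.stno)
    order by name""").fetchall()

print(via_exists)  # [('Mixon Leatha',), ('Prior Lorraine',)]
```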

Example 5.11.3 To find instructors who never taught cs110, we search for instructors for whom there is no GRADES record involving 'cs110' and these instructors. This can be done by

select name from INSTRUCTORS where

not exists(select * from GRADES where

empno = INSTRUCTORS.empno and

cno = 'cs110');

which results in the table:

NAME

-------------

Sawyer Kathy

Davis William

Will Samuel

If both the main query and the subquery deal with the same table and the subquery requires input parameters from the outer query, then we use an alias of the table in the outer query.

Example 5.11.4 Let us find the student numbers of students whose advisor is advising at least one other student. The information is contained in the ADVISING table, and the following select construct uses both ADVISING (in the subquery) and its alias A in the main query:

select distinct stno from ADVISING A

where exists (select * from ADVISING where

empno = A.empno and stno != A.stno);

This query returns the table:

STNO

----

1011

2415

2661

2890

4022

5544

5571

Subqueries can be used in the list that follows the reserved word from in exactly the same manner that tables are used. This is shown in the next example.

Example 5.11.5 To find the pairs of names of students and instructors such that the student took some course with the instructor we could write:

select STUDENTS.name as sname, INSTRUCTORS.name as iname

from STUDENTS, INSTRUCTORS,

(select stno, empno from GRADES) PN

where STUDENTS.stno = PN.stno and

INSTRUCTORS.empno = PN.empno;


The difference of the tables T and S (which, as required for difference, have the same heading A1, ..., An) can be computed by retaining each tuple of T for which there is no matching tuple in S. This can be done by:

select * from T where

not exists (select * from S where

A1 = T.A1 and ... and An = T.An);
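As a small check of this technique, the sketch below computes the difference of two hypothetical tables T and S with heading A1, A2. It does so once with not exists and once with SQLite's except, its spelling of minus. The two agree here. In general they can differ when T contains duplicate tuples, which not exists keeps and except removes, or null values, which never satisfy the equalities inside not exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table T (A1 text, A2 integer);
    create table S (A1 text, A2 integer);
    insert into T values ('a', 1), ('b', 2), ('c', 3);
    insert into S values ('b', 2), ('c', 4);
""")

# Keep each tuple of T that has no equal tuple in S; inside the subquery,
# the unqualified A1, A2 refer to S, while T.A1, T.A2 come from the outer select.
with_not_exists = conn.execute("""
    select * from T where
    not exists (select * from S where A1 = T.A1 and A2 = T.A2)
    order by A1""").fetchall()

# The same difference with SQLite's except (ORACLE: minus).
with_except = conn.execute(
    "select * from T except select * from S order by A1").fetchall()

print(with_not_exists)  # [('a', 1), ('c', 3)]
```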

Example 5.11.6 Courses offered by the continuing education program but notby the regular program can be found by writing:

select * from CED_COURSES where

not exists (select * from COURSES where

cno = CED_COURSES.cno)

which takes advantage of the fact that cno is a key for both COURSES and CED_COURSES.

5.12 Subqueries and division

SQL does not have a division operation. However, as we saw in Examples 4.1.27 and 4.2.3, we can perform division using product, projection, and difference. Of course, we could apply the prescription offered by relational algebra. This type of solution is discussed in the next example.

Example 5.12.1 To find the courses taught by every full professor, the solution envisioned here is

select cno from grades

minus

select GI.cno from (select grades.cno,

instructors.empno

from grades, instructors

where rank='Professor') GI

where (GI.cno, GI.empno) not in (select cno, empno from grades);

Note that the query

select grades.cno, instructors.empno

from grades, instructors

where rank='Professor'

computes all pairs of courses and employee numbers of full professors using the product of the tables GRADES and INSTRUCTORS. Then, the query

select GI.cno from (select grades.cno,

instructors.empno

from grades, instructors

where rank='Professor') GI

where (GI.cno, GI.empno) not in (select cno, empno from grades)

extracts the courses that are part of the pairs of the previous table that do not appear in the GRADES table, that is, the courses for which there exists a full professor who did not teach them. These are the courses that we need to exclude from the answer. Thus, the query presented at the beginning of this example yields the solution of the problem:

CNO

-----

cs110

The solution presented in Example 5.12.1 is not applicable in SQL dialects that do not have all the facilities of SQL Plus. Therefore, we need to examine an alternate way of solving this problem that is almost universally usable. To understand the technique used we examine the solution of the query formulated in the next example.

Example 5.12.2 Again, suppose that we need to determine the courses taught by every full professor. Let us formulate the same query in a way that is easier to translate into SQL. Namely, we find the courses for which there are no full professors who have not taught these courses. The reader should realize immediately that this is simply a new formulation of the same problem. We show the solution in steps, moving gradually from plain English to SQL:

Phase I:

select cno from GRADES G where

not exists (instructors who are full professors and

have not taught the course G.cno)

Phase II:

select cno from GRADES G where

not exists (select * from INSTRUCTORS

where rank = 'Professor' and

these instructors have not taught

the course G.cno)

Phase III:

select cno from GRADES G where

not exists (select * from INSTRUCTORS

where rank = 'Professor' and

not exists (select * from GRADES

where empno = INSTRUCTORS.empno

and cno = G.cno));

In Phase I we state in SQL that we want the course numbers for which no full professor exists who has not taught the course; the condition inside not exists is still written in English.

In Phase II we express the existence of such full professors as a subquery on INSTRUCTORS. Note that Phase II still contains an untranslated part.

Finally, in Phase III, we translate the part “these instructors have not taught the course G.cno” using not exists for the second time.
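The double not exists pattern of Phase III can be tried out in SQLite through Python's standard sqlite3 module. This is a sketch under assumptions: the tiny instance below (instructor numbers, ranks, grades) is hypothetical rather than the chapter's database, and SQLite stands in for ORACLE.

```python
import sqlite3

# Hypothetical miniature instance; only the columns used by the query are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table INSTRUCTORS(empno text primary key, rank text);
    create table GRADES(stno text, empno text, cno text, grade integer);
    insert into INSTRUCTORS values ('019', 'Professor'), ('023', 'Professor'),
                                   ('056', 'Assoc. Prof.');
    insert into GRADES values ('1011', '019', 'cs110', 40),
                              ('4022', '023', 'cs110', 60),
                              ('2661', '019', 'cs210', 70),
                              ('5544', '056', 'cs240', 70);
""")

# Courses for which no full professor exists who has not taught the course.
query = """
    select distinct G.cno from GRADES G
    where not exists (select * from INSTRUCTORS I
                      where I.rank = 'Professor' and
                      not exists (select * from GRADES
                                  where empno = I.empno and cno = G.cno))
    order by G.cno
"""
result = [row[0] for row in conn.execute(query)]
print(result)
```

Here cs210 is rejected because professor 023 never taught it, and cs240 because professor 019 never taught it; only cs110 survives both not exists tests.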

Example 5.12.3 Another query that requires division in relational algebra is: “Find names of instructors who have taught every 100-level course, that is, every course whose number has 1 as its first digit.” The formulation that is better suited to an SQL implementation is: “Find names of instructors for whom there is no 100-level course that they have not taught.” This is solved by the following select construct:

select name from INSTRUCTORS where
  not exists (select * from COURSES
              where cno like 'cs1__' and
              not exists (select * from GRADES where
                          empno = INSTRUCTORS.empno
                          and cno = COURSES.cno));

The pattern 'cs1__' matches every course number that consists of cs1 followed by exactly two characters.

The answer that results from our usual database instance is:

NAME

------------

Evans Robert

Exxon George

5.13 Relational Completeness of SQL

Chapter 4 and the current chapter together show that SQL is capable of performing all operations of relational algebra. This fact is known as the relational completeness of SQL. As we shall see in subsequent chapters, the capabilities of SQL go well beyond the standard definition of relational algebra.

5.14 Scalar Functions of SQL

We now present capabilities of SQL that go beyond relational algebra. We begin by discussing built-in functions of SQL that act on individual values (scalar functions), functions that act on sets of values (aggregate functions), and analytic functions that can be used for various statistical computations. Then we continue with the group by option of select, and we discuss several online analytical processing (OLAP) functions of SQL.

Scalar functions are built-in functions of SQL that work on individual values. They are highly dependent on the particular implementation of SQL, and we limit our discussion to functions implemented by ORACLE's SQL Plus. There are several types of scalar functions, depending on the types of their arguments.

5.14.1 Numerical Functions

Among the numerical functions, abs, sin, cos, power, sqrt, etc. have quite obvious definitions. For example, sqrt computes the square root of its argument, while power(x, y) computes $x^y$.


Example 5.14.1 To illustrate some of the numerical functions, we create a table POINTS whose rows represent labelled points in the plane:

create table POINTS(ptid varchar2(10), x integer, y integer,
                    primary key(ptid));

and populate this table using the commands:

insert into points(ptid, x, y) values ('a',0,0);
insert into points(ptid, x, y) values ('b',0,1);
insert into points(ptid, x, y) values ('c',0,2);
insert into points(ptid, x, y) values ('d',1,0);
insert into points(ptid, x, y) values ('e',1,1);
insert into points(ptid, x, y) values ('f',1,2);
insert into points(ptid, x, y) values ('g',2,0);
insert into points(ptid, x, y) values ('h',2,1);
insert into points(ptid, x, y) values ('i',2,2);
insert into points(ptid, x, y) values ('j',3,0);
insert into points(ptid, x, y) values ('k',3,1);
insert into points(ptid, x, y) values ('l',3,2);

To determine the distances from 'a' to every point (including 'a' itself), we write

select p.ptid,
       sqrt(power(a.x - p.x,2)+power(a.y - p.y,2)) as dist
from points a, points p
where a.ptid = 'a';

This returns:

PTID DIST

---------- ----------

a 0

b 1

c 2

d 1

e 1.41421356

f 2.23606798

g 2

h 2.23606798

i 2.82842712

j 3

k 3.16227766

l 3.60555128

To compute the distance between the point 'a', having the coordinates $(x_a, y_a)$, and a point 'p' with coordinates $(x_p, y_p)$, we use the formula $d(a, p) = \sqrt{(x_a - x_p)^2 + (y_a - y_p)^2}$. The formula appears in the target list of the select and is written with the numerical functions sqrt and power.
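The distance query can be reproduced in SQLite from Python. Because some SQLite builds do not include sqrt and power, the sketch registers Python's math.sqrt and math.pow under those names with create_function; this registration is an assumption made for portability and is not part of the ORACLE example.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Make sqrt and power available even if this SQLite build lacks math functions.
conn.create_function("sqrt", 1, math.sqrt)
conn.create_function("power", 2, math.pow)

conn.execute("create table POINTS(ptid text primary key, x integer, y integer)")
# Points 'a'..'l' on the grid x = 0..3, y = 0..2, labelled as in the text.
points = [(chr(ord("a") + 3 * x + y), x, y) for x in range(4) for y in range(3)]
conn.executemany("insert into POINTS(ptid, x, y) values (?, ?, ?)", points)

rows = conn.execute("""
    select p.ptid, sqrt(power(a.x - p.x, 2) + power(a.y - p.y, 2)) as dist
    from POINTS a, POINTS p
    where a.ptid = 'a'
    order by p.ptid
""").fetchall()
dist = dict(rows)
for ptid, d in rows:
    print(ptid, round(d, 8))
```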

In ORACLE we can perform computations unrelated to any table by using a fictitious one-row table named DUAL.


Example 5.14.2 To compute sin(30°), sin(45°), and sin(60°) in ORACLE, we write:

select sin(30*3.14159265359/180) as sin30,

sin(45*3.14159265359/180) as sin45,

sin(60*3.14159265359/180) as sin60

from dual;

Since sin expects its argument in radians, each angle is first converted by multiplying it by π/180 (approximated here by 3.14159265359/180). This will return:

SIN30 SIN45 SIN60

---------- ---------- ----------

.5 .707106781 .866025404

Microsoft SQL Server offers a simpler way of performing this type of computation, because it does not require the fictitious table.

Example 5.14.3 In SQL Server we can simply write:

select sin(30*3.14159265359/180) as sin30,

sin(45*3.14159265359/180) as sin45,

sin(60*3.14159265359/180) as sin60;

to obtain the same result as the one obtained in ORACLE.

5.14.2 String Functions

String functions can be used to transform strings, to extract parts of strings, and so on.

The functions upper and lower convert strings to uppercase and lowercase characters, respectively.

Example 5.14.4 To print the names of students in capital letters and course titles in small letters, we can write:

select distinct upper(STUDENTS.name) as STNAME,
       lower(COURSES.cname) as course
from STUDENTS, GRADES, COURSES
where STUDENTS.stno = GRADES.stno and
      GRADES.cno = COURSES.cno;

This generates the following result:

STNAME COURSE

-----------------------------------------------

EDWARDS P. DAVID computer architecture

EDWARDS P. DAVID computer programming

EDWARDS P. DAVID introduction to computing

GROGAN A. MARY computer architecture

.

.

.


PRIOR LORRAINE data structures

PRIOR LORRAINE introduction to computing

RAWLINGS JERRY computer architecture

RAWLINGS JERRY introduction to computing

These functions are particularly useful for comparing strings while ignoring case. Thus,

upper('stephany') like 'STE%'

is true.

Example 5.14.5 The string function replace substitutes its third argument for every occurrence of its second argument in the value(s) specified by its first argument. In the select written below, the string 'Computer' is replaced by the string 'Comp.':

select replace(cname,'Computer','Comp.') from COURSES;

This yields the following result:

REPLACE(CNAME,'COMPUTER','COMP.')

----------------------------------

Introduction to Computing

Comp. Programming

Comp. Architecture

Data Structures

Higher Level Languages

Software Engineering

Graphics
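SQLite also provides replace(X, Y, Z) with the same argument order, so the example can be checked from Python. The sketch uses four of the course titles listed above; replace matches the exact substring 'Computer', which is why 'Introduction to Computing' is left unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table COURSES(cname text)")
titles = ["Introduction to Computing", "Computer Programming",
          "Computer Architecture", "Data Structures"]
conn.executemany("insert into COURSES(cname) values (?)", [(t,) for t in titles])

# order by rowid keeps the insertion order, which SQL otherwise does not guarantee.
shortened = [row[0] for row in conn.execute(
    "select replace(cname, 'Computer', 'Comp.') from COURSES order by rowid")]
print(shortened)
```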

Example 5.14.6 The function concat computes the concatenation of the two strings that form its arguments. Its effect is identical to that of the concatenation operator || discussed in Example 5.6.11. The phrase below prints the state and zip code of each student as a single string:

select name, addr, concat(state,zip) as state_zip from STUDENTS;

This returns:

NAME ADDR STATE_ZIP

----------------------------------------------

Edwards P. David 10 Red Rd. MA02159

Grogan A. Mary Walnut St. MA02148

Mixon Leatha 100 School St. MA02146

McLane Sandy 30 Cass Rd. MA02122

Novak Roland 42 Beacon St. NH03060

Pierce Richard 70 Park St. MA02146

Prior Lorraine 8 Beacon St. MA02125

Rawlings Jerry 15 Pleasant Dr. MA02115

Lewis Jerry 1 Main Rd RI02904


Example 5.14.7 To extract substrings we can use the function substr, whose syntax is:

substr(string, integer [, integer])

A typical call such as substr(s, n, m) returns the substring of length m of the string s that starts with the nth character of s. If m is omitted, as in substr(s, n), then the function returns all characters of s from the nth character to the end of s. If n is negative, the starting position is counted backwards from the end of s.

The select phrase

select substr('Oracle',2,3) from dual;

will return:

SUB

---

rac

The next select, which omits the third argument of substr,

select substr('Oracle',2) from dual;

yields:

SUBST
-----
racle

which is the string that begins with the second character of 'Oracle' and ends with the last character of this string.

Since the second argument of the function call in

select substr('Oracle',-4,3) from dual;

is negative, the starting position of the substring is the 4th character counted from the end (that is, the character 'a'), and thus the query returns:

SUB

---

acl
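SQLite's built-in substr follows the same conventions for these three calls (positions start at 1, the length is optional, and a negative start is counted from the end), so the examples can be checked from Python. SQLite also accepts a select without a from clause, so no DUAL table is needed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
with_length, to_end, from_end = conn.execute(
    "select substr('Oracle', 2, 3), substr('Oracle', 2), substr('Oracle', -4, 3)"
).fetchone()
print(with_length, to_end, from_end)
```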

The functions lpad and rpad can be used to improve the presentation of query results. The syntax of lpad is:

lpad(s, integer [, string])

The effect is to pad s on the left with spaces, bringing the total length of the string to the length specified by the second argument of the function. If the third argument is present, this string is repeated on the left (instead of spaces) to fill up the padded string. If s is already longer than the specified length, it is truncated to that length.

The function rpad has a similar syntax; however, the padding is done on the right of s.

Example 5.14.8 To print a list of all employees and their salaries (using the tabular variables EMPHIST and PERSINFO), we can use the query:

select name, lpad(salary,7,'$') as ann_salary
from persinfo, emphist
where persinfo.empno = emphist.empno;

This returns:

NAME ANN_SAL

----------------------------------- -------

Natalia Martins $150000

Laura Schwartz $120000

John Soriano $120000

Kendall MacRae $100000

Rachel Anderson $$70000

Richard Laughlin $$70000

Danielle Craig $$90000

Abby Walsh $$75000

Bailey Burns $$70000
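SQLite has no lpad or rpad, so the following Python sketch only emulates the behaviour described above for ORACLE, including truncation when the value is already longer than the requested length. The function names mirror ORACLE's, but this is an illustration rather than ORACLE's implementation; for instance, it does not handle null arguments.

```python
def lpad(s, n, pad=" "):
    """Left-pad str(s) with repetitions of pad up to length n (ORACLE-style sketch)."""
    s = str(s)
    if len(s) >= n:
        return s[:n]  # too long: keep only the first n characters
    fill = n - len(s)
    return (pad * fill)[:fill] + s


def rpad(s, n, pad=" "):
    """Right-pad str(s) with repetitions of pad up to length n (ORACLE-style sketch)."""
    s = str(s)
    if len(s) >= n:
        return s[:n]
    fill = n - len(s)
    return s + (pad * fill)[:fill]


for salary in (150000, 70000):
    print(lpad(salary, 7, "$"))
```

Repeating the pad string and then cutting it to the missing length is what makes a multi-character pad such as 'xy' work: rpad('ab', 5, 'xy') gives 'abxyx'.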

5.14.3 Date functions

SQL Plus contains a class of functions that apply to the DATE type, such as extract and months_between.

Example 5.14.9 The function extract computes a part of a date value. It is called as extract(part from date), where part names the desired date part (year, month, day, etc.) and date is the date value. For instance, to obtain the year part of the appt_date attribute of the table EMPHIST we write:

select empno, extract(year from appt_date) as start_y

from emphist;

This returns:

EMPNO START_Y

---------- ----------

1000 1999

1005 1999

1010 2000

1015 1999

1020 1999

1025 2000

1030 2000

1035 2000

1040 2000

Similarly, we can obtain the “month” part of a date by writing

select empno, extract(month from appt_date) as start_m
from emphist;

which returns:



EMPNO START_M

---------- ----------

1000 10

1005 10

1010 1

1015 10

1020 11

1025 3

1030 1

1035 2

1040 3

Example 5.14.10 To compute the number of months an employee has worked, we can use the function months_between. The query below computes the number of months between the current date (designated by the system-provided constant SYSDATE) and the date of hire:

select empno, months_between(SYSDATE,appt_date) as month_served
from emphist;

The table returned by this query is:

EMPNO MONTH_SERVED

---------- ------------

1000 35.8877397

1005 35.532901

1010 32.8877397

1015 35.1135461

1020 34.8877397

1025 30.5974171

1030 32.5974171

1035 31.2748365

1040 30.8877397

Arithmetic computations can be performed in the target list of any select.

Example 5.14.11 Suppose that a bonus is to be paid to the current employees, that is, to those whose termination date is null. The bonus equals 10% of the weekly salary (salary/52) multiplied by the number of months employed. It is computed by

select empno, 0.1 * months_between(SYSDATE,appt_date) * salary/52 as bonus

from emphist

where term_date is null;

This query returns:


EMPNO BONUS

------------------

1000 10430.7253

1005 8262.69438

1010 7652.27254

1015 6804.93348

1020 4733.05642

1025 4155.51299

1030 5688.95627

1035 4550.04006

1040 4194.59488
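The following Python sketch mimics the rule that ORACLE documents for months_between: the result is a whole number when both dates fall on the same day of the month or are both last days of their months; otherwise the fractional part is computed as if every month had 31 days. It handles calendar dates only (no time of day), and the salary and dates in the usage lines are hypothetical.

```python
import calendar
from datetime import date


def is_last_day(d):
    """True if d is the last day of its month."""
    return d.day == calendar.monthrange(d.year, d.month)[1]


def months_between(d1, d2):
    """Sketch of ORACLE's months_between for dates without a time component."""
    whole = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    if d1.day == d2.day or (is_last_day(d1) and is_last_day(d2)):
        return float(whole)
    return whole + (d1.day - d2.day) / 31


def bonus(salary, appt_date, today):
    """10% of the weekly salary (salary/52) per month served, as in Example 5.14.11."""
    return 0.1 * months_between(today, appt_date) * salary / 52


# Hypothetical employee hired on 2001-01-15 with an annual salary of 52000.
print(months_between(date(2004, 1, 15), date(2001, 1, 15)))
print(round(bonus(52000, date(2001, 1, 15), date(2004, 1, 15)), 2))
```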

5.15 Aggregate Functions in SQL

Aggregate functions are functions that operate on sets of values. Typical examples include sum, avg, max, min, and count.

The first four functions operate on “columns” of tables and ignore null values. The function count returns the number of elements of the set that is its argument; as we shall see, count applied to an attribute also ignores null values.

Example 5.15.1 The following select construct determines the largest grade obtained by the student whose student number is 1011. The function max is applied to the set of grades of this student and returns the largest value in this set:

select max(grade) as highgr from GRADES
where stno = '1011';

HIGHGR

------

90

In general, sum(A) returns the sum of all nonnull A-components of the selected tuples, and avg(A) returns the average of the same values. The expressions max(A) and min(A) yield the largest and the smallest values in the set of A-components of the tuples selected by a query, respectively.

The functions sum and avg apply to attributes whose domains are numerical (such as integer or float); max and min apply to every kind of attribute.

If we wish to discard duplicate values before applying these functions, we need to use the word distinct. For instance, sum(distinct A) considers only the distinct nonnull values that occur in the sequence of components.

Example 5.15.2 We mentioned that the built-in functions max and min apply to string domains as well as to numerical domains. We use this feature of these functions to determine the first and the last student in alphabetical order:


select min(name) as first, max(name) as last

from STUDENTS;

This query yields the table:

FIRST LAST

---------------- --------------

Edwards P. David Rawlings Jerry

Next, we show a select construct where the same functions are applied to a numerical domain, again for student 1011:

select min(grade) as lowgr,
       max(grade) as highgr from GRADES
where stno = '1011';

This generates the answer:

LOWGR HIGHGR

-----------------

40 90

The query

select avg(grade) as avggr from GRADES
where stno = '1011'

returns the table

AVGGR

-----

73.75

If we discard duplicate values as in

select avg(distinct grade) as avggr from GRADES
where stno = '1011'

then the average grade is lower, because the grade 90, which this student obtained twice, is now counted only once:

AVGGR

-----

68.33
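The effect of distinct on avg can be checked in SQLite. In our usual instance, student 1011 has the grades 90, 90, 75, and 40; the sketch below stores these four rows plus one row of another student, keeping only the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno text, cno text, grade integer)")
conn.executemany("insert into GRADES values (?, ?, ?)", [
    ("1011", "cs210", 90), ("1011", "cs240", 90), ("1011", "cs110", 75),
    ("1011", "cs110", 40), ("2415", "cs240", 100)])

highgr, lowgr, avggr, avg_distinct = conn.execute("""
    select max(grade), min(grade), avg(grade), avg(distinct grade)
    from GRADES where stno = '1011'
""").fetchone()
print(highgr, lowgr, avggr, round(avg_distinct, 2))
```

The plain average is 295/4 = 73.75, while avg(distinct grade) averages only 90, 75, and 40, giving 205/3, or about 68.33.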

Built-in functions can be used in subqueries, as illustrated by the next example.

Example 5.15.3 To retrieve the students who obtained a grade in cs110 higher than the average grade in this course, we write:

select stno from GRADES where cno = 'cs110'
  and grade > all(select avg(grade) from GRADES
                  where cno = 'cs110');

This returns:


STNO

----

2661

3566

5544

The count function can be used in several ways:

• count(A) determines the number of non-null entries under the attribute A;
• count(distinct A) computes the number of distinct non-null values that occur under A;
• count(*) determines how many rows exist in a table.

Note that count(distinct *) cannot be used in SQL.
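The three forms of count differ only when nulls or duplicates are present. The SQLite sketch below uses a hypothetical table T in which one grade is null and the grade 90 occurs twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T(stno text, grade integer)")
# Hypothetical rows; one grade is null and 90 appears twice.
conn.executemany("insert into T values (?, ?)",
                 [("1011", 90), ("1011", None), ("2415", 100), ("3566", 90)])

n_rows, n_grades, n_distinct_grades, n_students = conn.execute("""
    select count(*), count(grade), count(distinct grade), count(distinct stno)
    from T
""").fetchone()
print(n_rows, n_grades, n_distinct_grades, n_students)
```

count(*) sees all four rows, count(grade) skips the null, and count(distinct grade) keeps only 90 and 100.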

Example 5.15.4 Here are several examples of the use of the count function. To find how many students took cs110 in the fall semester of 2003, we write:

select count(cno) from GRADES
where cno = 'cs110' and
      sem = 'FALL' and
      year = 2003;

Since GRADES contains no grades given in cs110 during that semester, we obtain the answer:

COUNT(CNO)

----------

0

Observe that this table has the system-supplied column name COUNT(CNO), because we did not provide a name using as.

Let us determine how many students have ever registered for any course. We have to retrieve this result from GRADES, and we must use distinct to avoid counting the same student several times (if the student took several courses):

select count(distinct stno) as nost

from GRADES;

This query returns the one-entry table:

NOST

----

8


Finally, let us determine the names of instructors who teach more than one course. For every instructor, we determine in a subquery the number of distinct courses taught; then we retain those instructors who taught more than one course:

select name from INSTRUCTORS where
  1 < any (select count(distinct cno) from GRADES
           where empno = INSTRUCTORS.empno);

We obtain the table:

NAME

------------

Evans Robert

Will Samuel

5.16 Sorting Results

Data obtained from a select construct may be sorted on one or several columns using the order by clause. This clause also lets the user choose an ascending or descending sorting order for each of the columns. By default, the ascending order is chosen.

Example 5.16.1 Suppose that we need to sort the GRADES tuples on the student number. For each student, we sort the grades in descending order. This can be done with the query:

select * from GRADES

order by stno, grade desc;

This results in the output shown next:

STNO EMPNO CNO SEM YEAR GRADE
---------- ----------- ----- ------ ---------- ----------

1011 019 cs210 FALL 2003 90

1011 056 cs240 SPRING 2004 90

1011 023 cs110 SPRING 2003 75

1011 019 cs110 FALL 2002 40

2415 019 cs240 SPRING 2003 100

2661 234 cs310 SPRING 2004 100

2661 019 cs110 FALL 2002 80

2661 019 cs210 FALL 2003 70

3442 234 cs410 SPRING 2003 60

3566 019 cs240 SPRING 2003 100

3566 019 cs110 FALL 2002 95

3566 019 cs210 FALL 2003 90

4022 056 cs240 SPRING 2004 80

4022 234 cs310 SPRING 2004 75


4022 019 cs210 SPRING 2004 70

4022 023 cs110 SPRING 2003 60

5544 019 cs110 FALL 2002 100

5544 056 cs240 SPRING 2004 70

5571 019 cs210 SPRING 2004 85

5571 234 cs410 SPRING 2003 80

5571 019 cs240 SPRING 2003 50

Instead of using the names of the columns, one could use their ordinal positions in the select phrase.

Example 5.16.2 An equivalent form of the query from Example 5.16.1 is

select stno, empno, cno, sem, year, grade

from GRADES

order by 1, 6 desc;

Ordering of the results can also be achieved by using expressions.

Example 5.16.3 To sort the grades on the second digit of the course number and then on its first digit (which are the fourth and the third characters of the course number, respectively), we write:

select * from grades
order by substr(cno,4,1), substr(cno,3,1);

This returns the following result; rows that agree on both digits appear in no particular order:

STNO EMPNO CNO SEM YEAR GRADE
---------- ----------- ----- ------ ---------- ----------

1011 019 cs110 FALL 2002 40

2661 019 cs110 FALL 2002 80

3566 019 cs110 FALL 2002 95

5544 019 cs110 FALL 2002 100

1011 023 cs110 SPRING 2003 75

4022 023 cs110 SPRING 2003 60

1011 019 cs210 FALL 2003 90

3566 019 cs210 FALL 2003 90

4022 019 cs210 SPRING 2004 70

5571 019 cs210 SPRING 2004 85

2661 019 cs210 FALL 2003 70

2661 234 cs310 SPRING 2004 100

4022 234 cs310 SPRING 2004 75

3442 234 cs410 SPRING 2003 60

5571 234 cs410 SPRING 2003 80

3566 019 cs240 SPRING 2003 100

4022 056 cs240 SPRING 2004 80

5571 019 cs240 SPRING 2003 50

5544 056 cs240 SPRING 2004 70

1011 056 cs240 SPRING 2004 90

2415 019 cs240 SPRING 2003 100
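Both forms of order by, ordinal positions and expressions, work the same way in SQLite. The sketch below keeps five rows of the usual GRADES instance (only stno, cno, and grade) and checks the resulting orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno text, cno text, grade integer)")
conn.executemany("insert into GRADES values (?, ?, ?)", [
    ("1011", "cs240", 90), ("1011", "cs110", 40), ("2661", "cs310", 100),
    ("3442", "cs410", 60), ("5571", "cs210", 85)])

# Ordinal positions: column 1 (stno) ascending, column 3 (grade) descending.
by_position = conn.execute(
    "select stno, cno, grade from GRADES order by 1, 3 desc").fetchall()

# Expressions: second digit of the course number, then its first digit.
by_digits = [row[0] for row in conn.execute(
    "select cno from GRADES order by substr(cno, 4, 1), substr(cno, 3, 1)")]
print(by_position)
print(by_digits)
```

Sorting on the second digit first is what moves cs240 behind cs410: its sort key is ('4', '2'), while the others all start with '1'.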


stno empno cno sem year grade

1011 019 cs110 FALL 2002 40

2661 019 cs110 FALL 2002 80

3566 019 cs110 FALL 2002 95

5544 019 cs110 FALL 2002 100

1011 023 cs110 SPRING 2003 75

4022 023 cs110 SPRING 2003 60

1011 019 cs210 FALL 2003 90

3566 019 cs210 FALL 2003 90

4022 019 cs210 SPRING 2004 70

5571 019 cs210 SPRING 2004 85

2661 019 cs210 FALL 2003 70

3566 019 cs240 SPRING 2003 100

5571 019 cs240 SPRING 2003 50

1011 056 cs240 SPRING 2004 90

4022 056 cs240 SPRING 2004 80

5544 056 cs240 SPRING 2004 70

2415 019 cs240 SPRING 2003 100

2661 234 cs310 SPRING 2004 100

4022 234 cs310 SPRING 2004 75

3442 234 cs410 SPRING 2003 60

5571 234 cs410 SPRING 2003 80

Figure 5.2: Table Partitioned in Groups Based on cno

5.17 The Group-by Option

The group by clause serves to group together tuples of tables based on the common value of an attribute or of a group of attributes. Suppose, for instance, that we wish to partition the table GRADES into groups based on the course number. This can be done by using a construct like

select ... from GRADES group by cno

Conceptually, we operate on the table shown in Figure 5.2. The reader should imagine that the table has been divided into five groups, each corresponding to one course. In the previous select, we left open the target list following select. Once a table has been partitioned into groups (using group by), the select construct that we use must return one or more atomic pieces of data for every group. The term atomic, in this context, refers to simple pieces of data (numbers, strings, etc.). By contrast, a set of values is not an atomic piece of data. For instance, the number of students enrolled in each course can be listed by:

select cno, count(stno) as totenr from GRADES

group by cno

This results in the table:

CNO TOTENR
----- ----------
cs110 6
cs210 5
cs240 6
cs310 2
cs410 2

stno empno cno sem year grade
1011 019 cs110 FALL 2002 40
2661 019 cs110 FALL 2002 80
3566 019 cs110 FALL 2002 95
5544 019 cs110 FALL 2002 100
1011 023 cs110 SPRING 2003 75
4022 023 cs110 SPRING 2003 60
1011 019 cs210 FALL 2003 90
3566 019 cs210 FALL 2003 90
2661 019 cs210 FALL 2003 70
5571 019 cs210 SPRING 2004 85
4022 019 cs210 SPRING 2004 70
3566 019 cs240 SPRING 2003 100
5571 019 cs240 SPRING 2003 50
2415 019 cs240 SPRING 2003 100
5544 056 cs240 SPRING 2004 70
1011 056 cs240 SPRING 2004 90
4022 056 cs240 SPRING 2004 80
2661 234 cs310 SPRING 2004 100
4022 234 cs310 SPRING 2004 75
3442 234 cs410 SPRING 2003 60
5571 234 cs410 SPRING 2003 80

Figure 5.3: Table Partitioned in Groups Based on cno, sem, year

It would be an error to write a select like:

select cno, stno from GRADES

group by cno

because more than one student is enrolled in a course, and therefore the entries of the result under the attribute stno would be sets of values rather than simple values. SQL enforces the atomicity of the data generated by a select with group by by demanding that every component of the target list of such a select be either one of the grouping attributes or an aggregate (built-in) function.

Example 5.17.1 Grouping can be done on more than one attribute. Suppose that we are now interested not in the total enrollment but, rather, in the enrollment numbers for each offering of the courses, that is, in the numbers during every semester of every year. This can be done using the select construct:

select cno, sem, year, count(stno) as enrol

from GRADES

group by cno, year, sem

order by cno, sem, year;

Conceptually, the grouping results in the groups shown in Figure 5.3. Then, the query generates the answer:


CNO SEM YEAR ENROL

----- ------ ---------- ----------

cs110 FALL 2002 4

cs110 SPRING 2003 2

cs210 FALL 2003 3

cs210 SPRING 2004 2

cs240 SPRING 2003 3

cs240 SPRING 2004 3

cs310 SPRING 2004 2

cs410 SPRING 2003 2

Example 5.17.2 The next select construct determines the average grade and the number of courses taken by every student and sorts the results in ascending order on the student number:

select stno, avg(grade) as average,

count(cno) as ncourses

from GRADES

group by stno

order by stno;

We obtain the result:

STNO AVERAGE NCOURSES

---------- ---------- ----------

1011 73.75 4

2415 100 1

2661 83.33 3

3442 60 1

3566 95 3

4022 71.25 4

5544 85 2

5571 71.66 3

Grouping can be applied in combination with selection. In such cases, selection is applied first and the resulting rows are grouped.

Example 5.17.3 The select construct that follows determines the average grade in cs110 during successive offerings of this course:

select sem, year, avg(grade) from GRADES

where cno = ’cs110’

group by sem, year

order by year, sem

The result of this query is:

SEM YEAR AVG(GRADE)

------ ---------- ----------

FALL 2002 78.75

SPRING 2003 67.5


The clause having performs a “selection” on groups rather than on rows. The condition that follows having must refer only to data that are atomic for every group, that is, to grouping attributes and to aggregate functions.

Example 5.17.4 Let us determine the average grade obtained in courses that are taken by more than two students. After grouping the tuples of GRADES on cno, we retain the groups that include more than two students by applying the clause having count(grade) > 2:

select cno, avg(grade) from GRADES

group by cno

having count(grade) > 2

order by cno;

This query returns the table:

CNO AVG(GRADE)

----- ----------

cs110 75

cs210 81

cs240 81.66
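The same grouping and group selection can be reproduced in SQLite. The sketch rebuilds only the cno and grade columns of the usual GRADES instance, listing the grades of each course, and runs both the plain group by of this section and the query with having.

```python
import sqlite3

# Grades per course in the chapter's usual GRADES instance (other columns omitted).
grades = {"cs110": [40, 80, 95, 100, 75, 60], "cs210": [90, 90, 70, 85, 70],
          "cs240": [100, 50, 100, 70, 90, 80], "cs310": [100, 75], "cs410": [60, 80]}

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(cno text, grade integer)")
conn.executemany("insert into GRADES values (?, ?)",
                 [(cno, g) for cno, gs in grades.items() for g in gs])

enrollment = dict(conn.execute(
    "select cno, count(grade) from GRADES group by cno order by cno").fetchall())
popular = conn.execute("""
    select cno, avg(grade) from GRADES
    group by cno
    having count(grade) > 2
    order by cno
""").fetchall()
print(enrollment)
print(popular)
```

SQLite prints the cs240 average as 81.666..., whereas the SQL Plus listing above shows it truncated to 81.66.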

The group by option offers another approach to computing division. To divide the tabular variable $T$, whose heading is $A_1 \cdots A_m B_1 \cdots B_n$, by the tabular variable $S$, whose heading is $B_1 \cdots B_n$, we compute the number $k$ of distinct rows in $S$. Then we retrieve those $m$-tuples $(a_1, \ldots, a_m)$ that occur in $T$ associated with $k$ distinct $n$-tuples belonging to $S$; this count can reach $k$ only if $(a_1, \ldots, a_m)$ is associated with every row of $S$.

Example 5.17.5 Recall that in Example 5.12.3 we solved the query “Find names of instructors who have taught every 100-level course, that is, every course whose number has 1 as its first digit” by implementing division in SQL.

Here we determine the number of 100-level courses and then seek the employee numbers that are associated with all these courses in the GRADES table:

select name
from INSTRUCTORS,
     (select empno from GRADES
      where cno like 'cs1__'
      group by empno
      having count(distinct cno) =
             all(select count(distinct cno)
                 from COURSES
                 where cno like 'cs1__')) E
where INSTRUCTORS.empno = E.empno;

As expected, this returns the same result as the query discussed in Example 5.12.3.
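A SQLite version of the counting approach is sketched below. SQLite does not support the quantified comparison = all(...), so a scalar subquery is used instead; this is equivalent here because the subquery returns exactly one row. The course cs120 and its grade row are hypothetical additions that make the division non-trivial: with them, only instructor 019 has taught every 100-level course. The method also assumes that every course number in GRADES occurs in COURSES, as a foreign key from GRADES to COURSES would guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table COURSES(cno text primary key);
    create table GRADES(stno text, empno text, cno text);
    insert into COURSES values ('cs110'), ('cs120'), ('cs210'), ('cs240');
    insert into GRADES values ('1011', '019', 'cs110'), ('4022', '023', 'cs110'),
                              ('2661', '019', 'cs120'), ('3566', '019', 'cs210'),
                              ('5544', '056', 'cs240');
""")

# Instructors associated with as many distinct 100-level courses as COURSES contains.
query = """
    select empno from GRADES
    where cno like 'cs1__'
    group by empno
    having count(distinct cno) = (select count(distinct cno) from COURSES
                                  where cno like 'cs1__')
    order by empno
"""
teaches_all = [row[0] for row in conn.execute(query)]
print(teaches_all)
```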


5.17.1 The decode and case Functions

The function decode is typically used with four arguments and has the syntax:

decode(value, search value, result, default value)

The value returned by this function is:

$$\operatorname{decode}(x, s, r, d) = \begin{cases} r & \text{if } x = s,\\ d & \text{otherwise.} \end{cases}$$

Example 5.17.6 A course is defined as introductory if the first digit of its number is 1. Using the decode function, we can print a list of students and the courses they took, followed by an indication of the level of each course, using the query:

select stno,cno,

decode(substr(cno,3,1),'1','Introductory course','Advanced course')

from grades;

Note that the first digit of the course number is the third character of the cno value; this digit is extracted by the function substr discussed previously. The query yields the following result:

STNO CNO DECODE(SUBSTR(CNO,3

---------- ----- -------------------

1011 cs110 Introductory course
1011 cs210 Advanced course
. . .
The decode function may also take several pairs of search values and results, as in

decode(value, search value, result [, search value, result]..., default value)

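This variant of decode is defined by:

$$\operatorname{decode}(x, s_1, r_1, \ldots, s_n, r_n, d) = \begin{cases} r_i & \text{if } x = s_i \text{ and } x \neq s_j \text{ for all } j < i,\\ d & \text{if } x \neq s_i \text{ for all } i \text{ with } 1 \le i \le n. \end{cases}$$

In other words, the search values are examined in order, and the result paired with the first matching one is returned.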
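To make the definition concrete, here is a small Python function that follows it: the search values are scanned in order, the result paired with the first match is returned, and otherwise the default. Two further details follow ORACLE's documented behaviour but should be read as assumptions of this sketch: when the default is omitted the result is null (None here), and two nulls are considered equal.

```python
def decode(x, *args):
    """Sketch of decode(x, s1, r1, ..., sn, rn [, d]) following the definition above."""
    if len(args) % 2 == 1:      # odd number of extra arguments: the last one is d
        pairs, default = args[:-1], args[-1]
    else:                       # no default given: the result is null
        pairs, default = args, None
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if search == x:         # None == None holds, so nulls match each other
            return result
    return default


def level(cno):
    # cno[2] is the third character, i.e. substr(cno, 3, 1) in SQL.
    return decode(cno[2], "1", "First year course", "2", "Second year course",
                  "3", "Third year course", "4", "Fourth year course",
                  "Special course")


print(level("cs110"), "|", level("cs410"), "|", level("cs510"))
```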


Example 5.17.7 The following variant of the previous query prints 'First year course', 'Second year course', etc., depending on the first digit of the course number:

select stno,cno,
       decode(substr(cno,3,1),'1','First year course',
                              '2','Second year course',
                              '3','Third year course',
                              '4','Fourth year course',
                              'Special course')
from grades;

The result returned by this query is:

STNO CNO DECODE(SUBSTR(CNO,

---------- ----- ------------------

1011 cs110 First year course
1011 cs210 Second year course
. . .
2661 cs310 Third year course
3442 cs410 Fourth year course
. . .
4022 cs310 Third year course
. . .
5571 cs410 Fourth year course

The ANSI-standard function case is a more powerful analogue of decode. It can be used in two formats, either as:

case value
  when search value then result
  [when search value then result]...
  [else default value]
end

or as:

case
  when condition then result
  [when condition then result]...
  [else default value]
end

In the first format, case returns the result that corresponds to the first search value equal to value; in the second format, it returns the result that corresponds to the first condition that is satisfied. If no branch applies and else is omitted, case returns null.

Example 5.17.8 Using case, we can give an alternate solution to the query solved in Example 5.17.7:

select stno,cno,
       case substr(cno,3,1)
         when '1' then 'First year course'
         when '2' then 'Second year course'
         when '3' then 'Third year course'
         when '4' then 'Fourth year course'
         else 'Special course'
       end
from grades;

Example 5.17.9 Suppose that the minimal passing grade is 60 for first and second year courses and 70 for third and fourth year courses. We wish to print a report that shows 'Passed' or 'Failed' depending on the grade and the level of the course. This can be done with the following query:

select stno,cno, grade,
       case when (substr(cno,3,1) in ('1','2') and grade >= 60) or
                 (substr(cno,3,1) in ('3','4') and grade >= 70)
            then 'Passed'
            else 'Failed'
       end
from grades;

The result returned by this query is:

STNO CNO GRADE CASEWH

---------- ----- ---------- ------

1011 cs110 40 Failed
2661 cs110 80 Passed
. . .
5544 cs110 100 Passed
. . .
3566 cs240 100 Passed
. . .
2415 cs240 100 Passed
. . .
2661 cs310 100 Passed
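The searched form of case is standard SQL, so the query of Example 5.17.9 runs essentially unchanged in SQLite. The sketch uses five rows of the usual GRADES instance (sem and year omitted) and adds the alias status, which the original query does not have. Student 3442's grade of 60 in cs410 fails because fourth year courses require 70.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno text, cno text, grade integer)")
conn.executemany("insert into GRADES values (?, ?, ?)", [
    ("1011", "cs110", 40), ("2661", "cs110", 80), ("3442", "cs410", 60),
    ("4022", "cs110", 60), ("4022", "cs310", 75)])

rows = conn.execute("""
    select stno, cno, grade,
           case when (substr(cno, 3, 1) in ('1', '2') and grade >= 60) or
                     (substr(cno, 3, 1) in ('3', '4') and grade >= 70)
                then 'Passed'
                else 'Failed'
           end as status
    from GRADES
    order by stno, cno
""").fetchall()
for row in rows:
    print(*row)
```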


5.17.2 The rollup and cube Extensions of group by

For analyzing complex data, we often wish to partition data into blocks and then calculate subtotals for these blocks. For example, we may wish to analyze sales data by geographical region, so we want to calculate values for New England, the Midwest, the South, etc. Such analyses are facilitated by ORACLE's rollup extension of group by.

Example 5.17.10 Suppose that we need to print a report summarizing the number of grades given in every course by every instructor. We wish to print subtotals for every course and then a general total for all courses. This can be done in SQL using three subqueries, the first two containing a group by clause, as follows:

select cno,empno,count(grade)
from grades
group by cno,empno
union
select cno,'',count(grade)
from grades
group by cno
union
select '','',count(grade)
from grades;

The result of this query is given below:

CNO EMPNO COUNT(GRADE)

----- ----------- ------------

cs110 019 4

cs110 023 2

cs110 6

cs210 019 5

cs210 5

cs240 019 3

cs240 056 3

cs240 6

cs310 234 2

cs310 2

cs410 234 2

cs410 2

21


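Executing this query entails three scans of the table GRADES followed by the computation of the unions. In the output above, the rows appear sorted as a side effect of the duplicate elimination performed by union; without an order by clause, however, SQL does not guarantee any particular order.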

In SQL Plus we can replace the cumbersome query used in Example 5.17.10 by:

select cno, empno, count(grade)
from grades
group by rollup(cno,empno);

which produces exactly the same result. Note that after the numbers of grades for the first two groups are reported in the first two detail rows, a blank is printed for the empno of the third row; this is the rollup way of indicating that this row contains the subtotal of the number of grades for the course cs110. A detail row for cs210 follows and, since this course is taught only by the employee 019, the next row contains the subtotal for this course, etc. Finally, the last row, with blanks in the first two columns, contains the total number of grades for all courses.

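We conclude that the rollup extension of group by generates subtotals in increasing order of aggregation until all expressions in the group by clause are “rolled up”: rollup($a_1, \ldots, a_n$) groups the rows on $a_1, \ldots, a_n$, then on $a_1, \ldots, a_{n-1}$, and so on down to $a_1$ alone, and finally produces a grand total over all rows.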
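SQLite has no rollup, but the groupings it produces are easy to enumerate. The Python sketch below uses a hypothetical helper rollup_counts (not an ORACLE or library API) that counts rows for every prefix of the grouping list, from all attributes down to the empty list, and uses None where ORACLE prints a blank. It assumes that the grouping columns themselves contain no nulls; otherwise a genuine null would be indistinguishable from a rolled-up column, which is the problem ORACLE's grouping function addresses. Counting rows matches count(grade) here because no grade is null.

```python
def rollup_counts(rows, keys):
    """Emulate select keys..., count(*) ... group by rollup(keys) (sketch).

    Returns a dict mapping each group, padded with None for the rolled-up
    columns, to its number of rows."""
    counts = {}
    for k in range(len(keys), -1, -1):  # (k1..kn), (k1..kn-1), ..., ()
        for row in rows:
            group = tuple(row[key] for key in keys[:k]) + (None,) * (len(keys) - k)
            counts[group] = counts.get(group, 0) + 1
    return counts


# (cno, empno) pairs of the 21 GRADES rows used in Example 5.17.10.
pairs = ([("cs110", "019")] * 4 + [("cs110", "023")] * 2 + [("cs210", "019")] * 5
         + [("cs240", "019")] * 3 + [("cs240", "056")] * 3
         + [("cs310", "234")] * 2 + [("cs410", "234")] * 2)
rows = [{"cno": cno, "empno": empno} for cno, empno in pairs]

by_course = rollup_counts(rows, ["cno", "empno"])
by_instructor = rollup_counts(rows, ["empno", "cno"])
print(by_course[("cs110", None)], by_instructor[("019", None)], by_course[(None, None)])
```

Swapping the key order changes which subtotals appear, exactly as rollup(empno, cno) does in Example 5.17.12.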

Example 5.17.11 The next example uses three grouping attributes: cno, empno, and stno:

select cno,empno,stno,count(grade)

from grades

group by rollup(cno,empno,stno)

This generates the following result:

CNO EMPNO STNO COUNT(GRADE)

----- ----------- ---------- ------------

cs110 019 1011 1

cs110 019 2661 1

cs110 019 3566 1

cs110 019 5544 1

cs110 019 4

cs110 023 1011 1

cs110 023 4022 1

cs110 023 2

cs110 6

cs210 019 1011 1

cs210 019 2661 1

cs210 019 3566 1

cs210 019 4022 1

cs210 019 5571 1

cs210 019 5

cs210 5

cs240 019 2415 1

cs240 019 3566 1

cs240 019 5571 1


cs240 019 3

cs240 056 1011 1

cs240 056 4022 1

cs240 056 5544 1

cs240 056 3

cs240 6

cs310 234 2661 1

cs310 234 4022 1

cs310 234 2

cs310 2

cs410 234 3442 1

cs410 234 5571 1

cs410 234 2

cs410 2

21

The order in which attributes are rolled up influences the result of the query, as the next example shows.

Example 5.17.12 Suppose that we invert the grouping attributes cno and empno as in

select empno,cno, count(grade)

from grades

group by rollup(empno,cno);

This will result in:

EMPNO CNO COUNT(GRADE)

----------- ----- ------------

019 cs110 4

019 cs210 5

019 cs240 3

019 12

023 cs110 2

023 2

056 cs240 3

056 3

234 cs310 2

234 cs410 2

234 4

21

Note that this time the subtotals are computed for every employee, and then for all employees.

Partial rollups, that is, rollups that involve only a subset of the grouping attributes, are also possible, as shown in the next example.

Example 5.17.13 Suppose that we need to count, for every student, the number of times the student took each course, as well as the total number of course offerings the student took. This can be achieved by:


select stno,cno,count(grade) from grades

group by stno,rollup(cno);

which generates the following result:

STNO CNO COUNT(GRADE)

---------- ----- ------------

1011 cs110 2

1011 cs210 1

1011 cs240 1

1011 4

2415 cs240 1

2415 1

2661 cs110 1

2661 cs210 1

2661 cs310 1

2661 3

3442 cs410 1

3442 1

3566 cs110 1

3566 cs210 1

3566 cs240 1

3566 3

4022 cs110 1

4022 cs210 1

4022 cs240 1

4022 cs310 1

4022 4

5544 cs110 1

5544 cs240 1

5544 2

5571 cs210 1

5571 cs240 1

5571 cs410 1

5571 3

This shows that the student whose number is 1011 took four course offerings and repeated cs110. Note that a partial rollup produces no grand total.

The rollup extension is especially useful when there exists a natural hierarchy among the attributes of a table, as in the next example.

Example 5.17.14 Suppose that we have the table SALES, which contains records of sales in a chain of department stores present in several regions of the country: the North East (NE), the South East (SE), and the Midwest (MW).

REGION ST CITY STORENO SALESVOL

---------- -- --------------- ---------- ----------

NE NY New York City 55 1000

NE NY New York City 67 800


NE NY Syracuse 90 600

NE MA Worcester 41 1000

NE MA Boston 83 750

SE FL Miami 62 450

SE FL Miami 74 900

SE GA Atlanta 60 500

SE GA Atlanta 52 1100

SE GA Augusta 95 300

MW OH Athens 48 590

MW KS Topeka 33 860

MW KS Lawrence 72 300

MW KS Lawrence 09 700

MW KS Wichita 38 900

Clearly, there is a geographical hierarchy among the attributes of this table: region, state (displayed under the truncated heading ST), city, and storeno. To study the total sales in each state we can use the following rollup query:

select region, state, sum(salesvol)

from sales

group by rollup(region,state);


REGION ST SUM(SALESVOL)

---------- -- -------------

MW KS 2760

MW OH 590

MW 3350

NE MA 1750

NE NY 2400

NE 4150

SE FL 1350

SE GA 1900

SE 3250

10750

We may want to “drill down” in the hierarchy of attributes to analyze the sales in each city. This is accomplished by:

select region, state, city, sum(salesvol)

from sales

group by rollup(region, state, city)


REGION ST CITY SUM(SALESVOL)

---------- -- --------------- -------------

MW KS Lawrence 1000

MW KS Topeka 860

MW KS Wichita 900


MW KS 2760

MW OH Athens 590

MW OH 590

MW 3350

NE MA Boston 750

NE MA Worcester 1000

NE MA 1750

NE NY New York City 1800

NE NY Syracuse 600

NE NY 2400

NE 4150

SE FL Miami 1350

SE FL 1350

SE GA Atlanta 1600

SE GA Augusta 300

SE GA 1900

SE 3250

10750
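The prefix structure of rollup can also be mimicked without a database. In the Python sketch below, the helper rollup_sum is hypothetical: it adds every row of the SALES table to one group for each prefix of its grouping attributes and fills the rolled-up positions with None, as the query fills them with nulls.

```python
from collections import defaultdict

# (region, state, city, storeno, salesvol) rows of the SALES table above
SALES = [
    ("NE", "NY", "New York City", "55", 1000), ("NE", "NY", "New York City", "67", 800),
    ("NE", "NY", "Syracuse", "90", 600), ("NE", "MA", "Worcester", "41", 1000),
    ("NE", "MA", "Boston", "83", 750), ("SE", "FL", "Miami", "62", 450),
    ("SE", "FL", "Miami", "74", 900), ("SE", "GA", "Atlanta", "60", 500),
    ("SE", "GA", "Atlanta", "52", 1100), ("SE", "GA", "Augusta", "95", 300),
    ("MW", "OH", "Athens", "48", 590), ("MW", "KS", "Topeka", "33", 860),
    ("MW", "KS", "Lawrence", "72", 300), ("MW", "KS", "Lawrence", "09", 700),
    ("MW", "KS", "Wichita", "38", 900),
]


def rollup_sum(rows, n_keys, measure):
    """Sum row[measure] over group by rollup(first n_keys attributes).

    Each row is added to n_keys + 1 groups, one per prefix of its grouping
    attributes (all of them, all but the last, ..., none). Rolled-up
    attributes are filled with None, as SQL fills them with nulls."""
    totals = defaultdict(int)
    for row in rows:
        for level in range(n_keys, -1, -1):
            key = tuple(row[:level]) + (None,) * (n_keys - level)
            totals[key] += row[measure]
    return dict(totals)


by_city = rollup_sum(SALES, n_keys=3, measure=4)
print(by_city[(None, None, None)])  # grand total: 10750
```

A genuine null in the data would also show up as None here, so rolled-up rows and data rows could be confused; the grouping function, discussed later in this section, addresses exactly this problem.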

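Another useful extension of group by is cube. The rollup extension summarizes at increasing levels of aggregation, dropping the grouping attributes one at a time from right to left; in contrast, cube summarizes at all possible levels of aggregation, that is, for every subset of its grouping attributes.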

Example 5.17.15 A full aggregation can be achieved by using cube, as in:

select cno,empno,count(grade)

from grades

group by cube(cno,empno);

This will produce the following results:

CNO EMPNO COUNT(GRADE)

----- ----------- ------------

cs110 019 4

cs110 023 2

cs110 6

cs210 019 5

cs210 5

cs240 019 3

cs240 056 3

cs240 6

cs310 234 2

cs310 2

cs410 234 2

cs410 2

019 12

023 2

056 3

234 4

21


The order of the attributes in cube influences the presentation of the result, though not the set of rows returned. For example, the query:

select empno,cno,count(grade)

from grades

group by cube(empno,cno);

will result in

EMPNO CNO COUNT(GRADE)

----------- ----- ------------

019 cs110 4

019 cs210 5

019 cs240 3

019 12

023 cs110 2

023 2

056 cs240 3

056 3

234 cs310 2

234 cs410 2

234 4

cs110 6

cs210 5

cs240 6

cs310 2

cs410 2

21

The totals computed by either of these “cubes” are shown in Figure 5.4.

Figure 5.4: Totals computed by the aggregate cube(cno,empno)

cno \ empno          019   023   056   234   Total for course

cs110                  4     2               6

cs210                  5                     5

cs240                  3           3         6

cs310                                    2   2

cs410                                    2   2

Total for employee    12     2     3     4   21

Partial cube aggregations include group by clauses of the form

group by A1, . . . , Ak, cube (B1, . . . , Bℓ)

and compute total values of an aggregate function for all groups that can be obtained from the values of A1, . . . , Ak together with the values of any subset of B1, . . . , Bℓ; the attributes A1, . . . , Ak are never rolled up.

Example 5.17.16 The partial cube aggregation:

select cno,empno,stno,count(grade) from grades

group by cno,cube(empno,stno)

yields the following result:

CNO EMPNO STNO COUNT(GRADE)

----- ----------- ---------- ------------

cs110 019 1011 1

cs110 019 2661 1

cs110 019 3566 1

cs110 019 5544 1

cs110 019 4

cs110 023 1011 1


cs110 023 4022 1

cs110 023 2

cs110 1011 2

cs110 2661 1

cs110 3566 1

cs110 4022 1

cs110 5544 1

cs110 6

cs210 019 1011 1

cs210 019 2661 1

cs210 019 3566 1

cs210 019 4022 1

cs210 019 5571 1

cs210 019 5

cs210 1011 1

cs210 2661 1

cs210 3566 1

cs210 4022 1

cs210 5571 1

cs210 5

cs240 019 2415 1

cs240 019 3566 1

cs240 019 5571 1

cs240 019 3

cs240 056 1011 1

cs240 056 4022 1

cs240 056 5544 1

cs240 056 3

cs240 1011 1

cs240 2415 1

cs240 3566 1

cs240 4022 1

cs240 5544 1

cs240 5571 1

cs240 6

cs310 234 2661 1

cs310 234 4022 1

cs310 234 2

cs310 2661 1

cs310 4022 1

cs310 2

cs410 234 3442 1

cs410 234 5571 1

cs410 234 2

cs410 3442 1

cs410 5571 1

cs410 2

53 rows selected.
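The 53 rows can be accounted for by listing the grouping sets involved. The Python sketch below assumes that group by cno, cube(empno, stno) stands for the four grouping sets (cno, empno, stno), (cno, empno), (cno, stno), and (cno); that is, cno is kept in every set while the cubed attributes range over all their subsets. The helper names are hypothetical, and the (cno, empno, stno) triples are read off the output of Example 5.17.11.

```python
from itertools import combinations

# (cno, empno, stno) triples read off the output of Example 5.17.11
GRADES = [
    ("cs110", "019", "1011"), ("cs110", "019", "2661"), ("cs110", "019", "3566"),
    ("cs110", "019", "5544"), ("cs110", "023", "1011"), ("cs110", "023", "4022"),
    ("cs210", "019", "1011"), ("cs210", "019", "2661"), ("cs210", "019", "3566"),
    ("cs210", "019", "4022"), ("cs210", "019", "5571"), ("cs240", "019", "2415"),
    ("cs240", "019", "3566"), ("cs240", "019", "5571"), ("cs240", "056", "1011"),
    ("cs240", "056", "4022"), ("cs240", "056", "5544"), ("cs310", "234", "2661"),
    ("cs310", "234", "4022"), ("cs410", "234", "3442"), ("cs410", "234", "5571"),
]


def partial_cube_sets(n_fixed, n_cubed):
    """Grouping sets of group by A1..Ak, cube(B1..Bl), as attribute positions:
    A1..Ak are in every set; B1..Bl contribute every one of their subsets."""
    fixed = tuple(range(n_fixed))
    cubed = range(n_fixed, n_fixed + n_cubed)
    return [fixed + kept
            for size in range(n_cubed, -1, -1)
            for kept in combinations(cubed, size)]


def count_grouping_sets(rows, sets, width):
    """count(*) for every group of every grouping set; attributes left out
    of a set are shown as None."""
    counts = {}
    for kept in sets:
        for row in rows:
            key = tuple(row[i] if i in kept else None for i in range(width))
            counts[key] = counts.get(key, 0) + 1
    return counts


sets = partial_cube_sets(1, 2)  # group by cno, cube(empno, stno)
result = count_grouping_sets(GRADES, sets, width=3)
print(len(sets), len(result))  # 4 grouping sets, 53 rows
```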


The grouping function allows us to identify the rows of a cube or rollup that summarize other rows and, therefore, contain “null” components produced by the summarization. Namely, grouping(A) returns 1 for the rows in which the null A-component was produced by the rollup or cube, and 0 otherwise; in particular, it returns 0 for a null that actually occurs in the data.

Example 5.17.17 Consider again the cube query discussed in Example 5.17.15. The summarization query supplemented by the use of the function grouping:

select cno, empno, count(grade) as nogr,

grouping(cno) as c, grouping(empno) as e

from grades

group by cube(cno,empno)

returns the following results:

CNO EMPNO NOGR C E

----- ----------- ---------- ---------- ----------

cs110 019 4 0 0

cs110 023 2 0 0

cs110 6 0 1

cs210 019 5 0 0

cs210 5 0 1

cs240 019 3 0 0

cs240 056 3 0 0

cs240 6 0 1

cs310 234 2 0 0

cs310 2 0 1

cs410 234 2 0 0

cs410 2 0 1

019 12 1 0

023 2 1 0

056 3 1 0

234 4 1 0

21 1 1

17 rows selected.

In turn, we can use the grouping values and the having clause to retain only certain summary rows, as in

select cno, empno, count(grade) as nogr,

grouping(cno) as c, grouping(empno) as e

from grades

group by cube(cno,empno)

having grouping(cno) = 1 or grouping(empno) = 1

This query returns:

CNO EMPNO NOGR C E

----- ----------- ---------- ---------- ----------

cs110 6 0 1

cs210 5 0 1

cs240 6 0 1

cs310 2 0 1

cs410 2 0 1

019 12 1 0

023 2 1 0

056 3 1 0

234 4 1 0

21 1 1
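The behavior of grouping and of the having condition above can be imitated by attaching to every output row one flag per attribute that records whether the attribute was rolled up. A minimal Python sketch under that assumption follows; cube_with_grouping is a hypothetical helper, and the number of grade records for each (cno, empno) pair is taken from the output of Example 5.17.15.

```python
from collections import Counter
from itertools import combinations

# number of grade records per (cno, empno) pair, from Example 5.17.15
PAIRS = {("cs110", "019"): 4, ("cs110", "023"): 2, ("cs210", "019"): 5,
         ("cs240", "019"): 3, ("cs240", "056"): 3, ("cs310", "234"): 2,
         ("cs410", "234"): 2}
ROWS = [pair for pair, n in PAIRS.items() for _ in range(n)]


def cube_with_grouping(rows, width):
    """Mimic group by cube(...) together with grouping(): every output key is
    (values, flags), where flags[i] is 1 if attribute i was rolled up."""
    out = Counter()
    for size in range(width, -1, -1):
        for kept in combinations(range(width), size):
            flags = tuple(0 if i in kept else 1 for i in range(width))
            for row in rows:
                values = tuple(row[i] if i in kept else None for i in range(width))
                out[(values, flags)] += 1
    return out


cube = cube_with_grouping(ROWS, 2)
# having grouping(cno) = 1 or grouping(empno) = 1
summaries = {key: n for key, n in cube.items() if 1 in key[1]}
print(len(cube), len(summaries))  # 17 10
```

Testing the flags, rather than testing for None, keeps the filter correct even if the data contained nulls, because a null read from the table carries the flag 0.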

5.18 Analytical Capabilities of SQL Plus

ORACLE includes enhancements to SQL called analytic functions that allow it to produce quite refined reports. These features reduce the need to use external reporting tools and simplify statistical data analysis.

Analytic functions compute a value for each row of a query. These values are, in turn, based on a set of rows that is computed for each row and may be considered to appear in a sliding window. This set of rows is known as a window, and it is specified by the analytic clause, which is the parenthesized expression that follows the reserved word over.

An example of the use of an analytic function (which we discuss in detail in Example 5.18.2) is the following query, which computes a list of students, the number of courses they took, their grade point average, and the rank of that average among the students who took the same number of courses.

select STUDENTS.name, GA.noc as no_of_c, GA.gpa as gpa,

rank() over (partition by GA.noc

order by GA.gpa desc) as rank

from (select stno,

count(distinct cno) as noc,

avg(grade) as gpa

from GRADES

group by stno) GA, STUDENTS

where STUDENTS.stno = GA.stno

The function rank that we use in this query computes, for each row, a numerical rank determined by the position of the row within its window.

The analytic clause used in the previous example indicates that the rows retrieved by the query are partitioned based on the number of courses (noc) and that, within each partition, the rows are ordered in descending order of the gpa attribute.

In general, the analytic clause is evaluated after the from, where, group by, and having clauses.

Analytic functions are classified as shown in the table below:

FUNCTION CLASS          USAGE

Ranking Functions       Calculating ranks, percentiles, and n-tiles

Windowing Functions     Cumulative and moving averages

Reporting Functions     Calculating shares

Lag/Lead Functions      Finding a value in a row located a specified number of rows from the current row

Statistical Functions   Linear regression and other statistics

Queries that use analytic functions are processed in three phases:

1. computation of products, selections, grouping, and having clauses;

2. application of the analytic functions to the resulting sets of rows;

3. processing of the final order by clause.

The results of the first phase can be partitioned. Partitions are created after the groups defined by the group by clause and, thus, may be based on aggregate functions such as sum, count, etc.

For each row in a partition, a sliding window of data may be defined. The window determines a sequence of rows that is used to perform calculations for the current row. Window sizes can be specified as numbers of rows or can be determined by intervals in a domain. Either end of a window, or both ends, can move, depending on the definition of the window.

Each computation involving an analytic function is based on a current row. This row serves as the reference for the ends of the window.

5.18.1 Ranking Functions

SQL Plus contains the ranking functions rank() and dense_rank(), which can be used to rank tuples in an order determined by certain attributes or expressions. Both functions generate ranks in either ascending or descending order, but dense_rank() does not leave gaps in rank numbers when a tie occurs. The default order is, as usual, ascending order.

Example 5.18.1 To rank the grade records based on the grade obtained in any course we may write:

select stno, grade,

rank() over (order by grade)

from grades;


STNO GRADE RANK

---------- ---------- ----

1011 40 1

5571 50 2

4022 60 3

3442 60 3

2661 70 5

5544 70 5

4022 70 5


1011 75 8

4022 75 8

2661 80 10

5571 80 10

4022 80 10

5571 85 13

1011 90 14

1011 90 14

3566 90 14

3566 95 17

5544 100 18

2415 100 18

2661 100 18

3566 100 18

where rank 1 is assigned to the grade record with the lowest grade. To reverse the ranking we write:

select stno, grade,

rank() over (order by grade desc)

from grades;

which yields:

STNO GRADE RANK

---------- ---------- ----

5544 100 1

3566 100 1

2415 100 1

2661 100 1

3566 95 5

1011 90 6

1011 90 6

3566 90 6

5571 85 9

2661 80 10

5571 80 10

4022 80 10

1011 75 13

4022 75 13

2661 70 15

5544 70 15

4022 70 15

4022 60 18

3442 60 18

5571 50 20

1011 40 21

Note that the first four grade records are tied for first place; therefore, the record that follows the tied records has rank 5. With dense_rank(), all four tied records have rank 1 and the record that follows has rank 2. This can be achieved by writing:

select stno, grade,

dense_rank() over (order by grade desc) as den_rank

from grades;

This query returns:

STNO GRADE DEN_RANK

---------- ---------- --------

5544 100 1

3566 100 1

2415 100 1

2661 100 1

3566 95 2

1011 90 3

1011 90 3

3566 90 3

5571 85 4

2661 80 5

5571 80 5

4022 80 5

1011 75 6

4022 75 6

2661 70 7

5544 70 7

4022 70 7

4022 60 8

3442 60 8

5571 50 9

1011 40 10
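Both ranking functions can be described by counting. For a descending order, rank() gives a row one plus the number of rows with a strictly larger value, while dense_rank() gives it one plus the number of distinct values strictly larger than its own; an ascending order swaps larger for smaller. The Python sketch below applies these definitions, through the hypothetical functions rank and dense_rank, to the 21 grades listed in Example 5.18.1. It takes quadratic time and is meant only to illustrate the definitions.

```python
# (stno, grade) records listed in Example 5.18.1
GRADES = [("1011", 40), ("5571", 50), ("4022", 60), ("3442", 60), ("2661", 70),
          ("5544", 70), ("4022", 70), ("1011", 75), ("4022", 75), ("2661", 80),
          ("5571", 80), ("4022", 80), ("5571", 85), ("1011", 90), ("1011", 90),
          ("3566", 90), ("3566", 95), ("5544", 100), ("2415", 100), ("2661", 100),
          ("3566", 100)]


def rank(values, descending=False):
    """rank(): 1 + the number of values strictly ahead of this one,
    so tied values share a rank and a gap follows them."""
    if descending:
        return [1 + sum(1 for w in values if w > v) for v in values]
    return [1 + sum(1 for w in values if w < v) for v in values]


def dense_rank(values, descending=False):
    """dense_rank(): 1 + the number of distinct values strictly ahead,
    so no gaps appear after ties."""
    distinct = set(values)
    if descending:
        return [1 + sum(1 for w in distinct if w > v) for v in values]
    return [1 + sum(1 for w in distinct if w < v) for v in values]


grades = [g for _, g in GRADES]
for (stno, g), r, d in zip(GRADES, rank(grades, True), dense_rank(grades, True)):
    print(stno, g, r, d)
```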

It is possible to use aggregate functions in computing rankings.

Example 5.18.2 To rank the students in order of the number of courses they have taken we could write:

select STUDENTS.name, GA.noc as no_of_courses,

dense_rank() over (order by GA.noc desc) as den_rank

from (select stno, count(distinct cno) as noc

from grades

group by stno) GA, STUDENTS

where STUDENTS.stno = GA.stno;

This generates the result:

NAME NO_OF_COURSES DEN_RANK

----------------------------------- --------

Prior Lorraine 4 1

Edwards P. David 3 2


Mixon Leatha 3 2

Pierce Richard 3 2

Lewis Jerry 3 2

Rawlings Jerry 2 3

Grogan A. Mary 1 4

Novak Roland 1 4

If we wish to rank the students based on the number of courses and then, among students with an equal number of courses, in the order of their grade point average, we could write the following query:

select STUDENTS.name, GA.noc as no_of_c, GA.gpa as gpa,

rank() over (partition by GA.noc

order by GA.gpa desc) as rank

from (select stno,

count(distinct cno) as noc,

avg(grade) as gpa

from GRADES

group by stno) GA, STUDENTS

where STUDENTS.stno = GA.stno

The partition by option establishes groups of rows with equal GA.noc values, and then it ranks the records in each such group using the order by clause. The result of this query is:

NAME NO_OF_C GPA RANK

------------------------ ---------- ----------

Grogan A. Mary 1 100 1

Novak Roland 1 60 2

Rawlings Jerry 2 85 1

Pierce Richard 3 95 1

Mixon Leatha 3 83.33 2

Edwards P. David 3 73.75 3

Lewis Jerry 3 71.66 4

Prior Lorraine 4 71.25 1

8 rows selected.

In general, the expression in the partition by clause divides the set of rows that results from the query into groups, and the rank() function operates within these groups; in other words, rank() is reset when the value of the partitioning expression changes. The order by clause attached to rank() specifies the ranking criterion and the order of the rows in each group.
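A minimal Python sketch of rank() over (partition by ... order by ... desc) follows. The helper rank_within_partitions is hypothetical; it groups the rows by the partitioning key and ranks each row only against the members of its own group, so the ranking restarts at 1 in every partition. The data are the rows printed in the result of Example 5.18.2.

```python
from collections import defaultdict

# (name, number of courses, grade point average) as printed in Example 5.18.2
STUDENTS = [("Grogan A. Mary", 1, 100), ("Novak Roland", 1, 60),
            ("Rawlings Jerry", 2, 85), ("Pierce Richard", 3, 95),
            ("Mixon Leatha", 3, 83.33), ("Edwards P. David", 3, 73.75),
            ("Lewis Jerry", 3, 71.66), ("Prior Lorraine", 4, 71.25)]


def rank_within_partitions(rows, part, key):
    """Mimic rank() over (partition by part order by key desc).
    Returns {name: rank}; names are assumed to be unique."""
    groups = defaultdict(list)
    for row in rows:
        groups[part(row)].append(row)
    ranks = {}
    for members in groups.values():
        for row in members:
            # 1 + number of members of the same partition with a larger key
            ranks[row[0]] = 1 + sum(1 for other in members if key(other) > key(row))
    return ranks


ranks = rank_within_partitions(STUDENTS, part=lambda r: r[1], key=lambda r: r[2])
print(ranks)
```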

5.18.2 Top-n Queries

Top-n queries ask for the n largest or smallest values of a column. Such queries are solved in ORACLE using the pseudo-attribute ROWNUM, which assigns a value, starting with 1, to each of the rows returned by a subquery. Thus, a top-n query in SQL Plus requires the following elements:

1. a subquery containing an order by clause, which ensures that the rows retrieved by the subquery are placed in the proper order;

2. the main query, which includes the ROWNUM pseudo-attribute and may include a where clause to specify the number of returned rows.

Example 5.18.3 To retrieve the top three students in the order of their grade point averages we write:

select ROWNUM as rank, name, avgg from

(select STUDENTS.stno, STUDENTS.name, avg(grade) as avgg

from STUDENTS, GRADES

where STUDENTS.stno = GRADES.stno

group by STUDENTS.stno, STUDENTS.name

order by avg(grade) desc)

where ROWNUM <= 3

This will return:

RANK NAME AVGG

------ --------------- -------

1 Grogan A. Mary 100

2 Pierce Richard 95

3 Rawlings Jerry 85

To retrieve the bottom three students, all we need to do is invert the ordering in the subquery. This can be achieved either by replacing desc with asc or by omitting desc altogether (since the default is asc). Thus, the phrase:

select ROWNUM as rank, name, avgg from

(select STUDENTS.stno, STUDENTS.name, avg(grade) as avgg

from STUDENTS, GRADES

where STUDENTS.stno = GRADES.stno

group by STUDENTS.stno, STUDENTS.name

order by avg(grade))

where ROWNUM <= 3;

will yield:

RANK NAME AVGG

---- --------------- -----

1 Novak Roland 60

2 Prior Lorraine 71.25

3 Lewis Jerry 71.67

Example 5.18.4 Ties between rows may eliminate rows that we would expect to see in the results of our queries. The next query


select STUDENTS.stno, STUDENTS.name,

count(distinct cno) as noc

from STUDENTS, GRADES

where STUDENTS.stno = GRADES.stno

group by STUDENTS.stno, STUDENTS.name

order by count(distinct cno) desc;

lists students in decreasing order of the number of courses they took:

STNO NAME NOC

---------- ---------------------

4022 Prior Lorraine 4

1011 Edwards P. David 3

2661 Mixon Leatha 3

3566 Pierce Richard 3

5571 Lewis Jerry 3

5544 Rawlings Jerry 2

2415 Grogan A. Mary 1

3442 Novak Roland 1

To retrieve the first four students among those who took the largest number of courses we write:

select ROWNUM as rank, name, noc

from (select STUDENTS.stno, STUDENTS.name,

count(distinct cno) as noc

from STUDENTS, GRADES

where STUDENTS.stno = GRADES.stno

group by STUDENTS.stno, STUDENTS.name

order by count(distinct cno) desc)

where ROWNUM <= 4

Note that the result returned by this query:

RANK NAME NOC

---------- ---------------

1 Prior Lorraine 4

2 Edwards P. David 3

3 Mixon Leatha 3

4 Pierce Richard 3

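eliminates the student 'Lewis Jerry', although he took as many courses as the students ranked 2 to 4. Which of the tied rows are retained is not determined by the query; it depends on the order in which the subquery happens to return them.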
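The Python sketch below contrasts the two behaviors on the ordered rows of Example 5.18.4: keeping the first n rows, as the condition ROWNUM <= n does, versus keeping every row whose rank() is at most n, so that ties are never split. The function names are hypothetical. In ORACLE, the second behavior can be obtained by computing rank() over (order by count(distinct cno) desc) in the subquery and filtering on that rank, rather than on ROWNUM, in the main query.

```python
# (name, number of courses) in the order returned by the subquery of Example 5.18.4
ORDERED = [("Prior Lorraine", 4), ("Edwards P. David", 3), ("Mixon Leatha", 3),
           ("Pierce Richard", 3), ("Lewis Jerry", 3), ("Rawlings Jerry", 2),
           ("Grogan A. Mary", 1), ("Novak Roland", 1)]


def first_n(rows, n):
    """What 'where ROWNUM <= n' keeps: the first n rows, even inside a tie."""
    return [name for name, _ in rows[:n]]


def top_n_with_ties(rows, n):
    """Keep every row whose rank() over (order by count desc) is at most n."""
    counts = [c for _, c in rows]
    return [name for name, c in rows
            if 1 + sum(1 for d in counts if d > c) <= n]


print(first_n(ORDERED, 4))
print(top_n_with_ties(ORDERED, 4))
```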

A more complicated example involves using two subquery rankings.

Example 5.18.5 Suppose that we need to find, as above, the top three students; in addition, we need to find, for each of these students, their ranking from the point of view of the number of courses they took. This can be done using the query:

select ROWNUM as gr_rank, name, c_rank from

(select name, avgg, ROWNUM as c_rank from

(select name, avg(grade) as avgg, count(distinct cno) as nc

from STUDENTS S, GRADES G

where S.stno = G.stno


group by S.stno,S.name

order by nc desc)

order by avgg desc)

where ROWNUM <= 3

which will return:

GR_RANK NAME C_RANK

------- -------------- ------

1 Grogan A. Mary 7

2 Pierce Richard 4

3 Rawlings Jerry 6

5.18.3 Windowing functions in SQL Plus

Windowing functions are used in SQL Plus to compute cumulative, moving, and other aggregate functions applied to a set of tuples called a window. The size and shape of the window are always defined relative to a row of a partition; this reference row is called the current row.

Aggregate functions that can be used include sum, avg, min, max, the statistical functions (discussed in Section 5.19), as well as two special functions, first_value and last_value, that return the first and last values in a window.

Example 5.18.6 To compute the evolution of the grade average of each student as he or she advances towards graduation, we can write a query that returns the cumulative average of each student for the sequence of semesters when the student is active:

select stno, year, sem,

avg(grade) over (partition by stno

order by year, sem desc

rows unbounded preceding) as ag

from grades

order by stno, year, sem desc;

This will return:

STNO YEAR SEM AG

---------- ---------- ------ ----------

1011 2002 FALL 40

1011 2003 SPRING 57.50

1011 2003 FALL 68.33

1011 2004 SPRING 73.75

2415 2003 SPRING 100

2661 2002 FALL 80

2661 2003 FALL 75

2661 2004 SPRING 83.33

3442 2003 SPRING 60

3566 2002 FALL 95

3566 2003 SPRING 97.5

3566 2003 FALL 95

4022 2003 SPRING 60

4022 2004 SPRING 65

4022 2004 SPRING 70

4022 2004 SPRING 71.25

5544 2002 FALL 100

5544 2004 SPRING 85

5571 2003 SPRING 50

5571 2003 SPRING 65

5571 2004 SPRING 71.67

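The words rows unbounded preceding mean that the window over which we compute the grade average consists of the current row together with all rows that involve the same student and precede it in the order given by order by year, sem desc. Rows with the same year and semester are placed in an arbitrary order relative to each other, which is why student 4022 has three different running averages for 2004 SPRING.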
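A window defined by rows unbounded preceding amounts to a running sum and a running count that restart with every partition. The Python sketch below assumes that the grades of student 1011, in chronological order, are 40, 75, 90, and 90; these values are recovered from the running averages 40, 57.50, 68.33, and 73.75 printed above. The function running_average is hypothetical.

```python
from itertools import groupby

# (stno, year, sem, grade) for student 1011; the grades are recovered from the
# running averages 40, 57.50, 68.33, 73.75 shown above
ROWS = [("1011", 2003, "FALL", 90), ("1011", 2002, "FALL", 40),
        ("1011", 2004, "SPRING", 90), ("1011", 2003, "SPRING", 75)]


def running_average(rows):
    """Mimic avg(grade) over (partition by stno order by year, sem desc
    rows unbounded preceding): each row's window is the row itself plus
    every earlier row of the same student."""
    # Python's sort is stable: sort by sem descending first, then by
    # (stno, year), so SPRING precedes FALL within a year, as 'sem desc' does.
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    ordered = sorted(ordered, key=lambda r: (r[0], r[1]))
    result = []
    for stno, group in groupby(ordered, key=lambda r: r[0]):
        total = 0
        for count, (_, year, sem, grade) in enumerate(group, start=1):
            total += grade
            result.append((stno, year, sem, total / count))
    return result


for row in running_average(ROWS):
    print(row)
```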

The syntax of the windowing functions is:

aggregate_function(value_expression | *) over (

[partition by value_expression {, value_expression}]

order by value_expression [collate_clause] [asc | desc] [nulls first | nulls last]

{, value_expression [collate_clause] [asc | desc] [nulls first | nulls last]}

[rows | range]

[[unbounded preceding | value_expression preceding]

| between [unbounded preceding | value_expression preceding]

and [current row | value_expression following]])

5.19 Statistics in SQL

SQL Plus is equipped with a large collection of statistical functions, which we discuss in this section. These functions are included in the SQL:2003 standard.

5.19.1 Variance and Correlation

Population and sample variance can be computed using the functions var_pop and var_samp, respectively. Both functions take an attribute as argument and apply to its non-null values. If the sequence of non-null values of an attribute A is (x1, . . . , xn), then the population variance is:

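$$\mathrm{var\_pop}(A) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n^2},$$

and the sample variance is:

$$\mathrm{var\_samp}(A) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)},$$

where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$. As is shown in statistics, the sample variance is an unbiased estimator of the theoretical variance.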
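Both expressions of each variance can be checked with exact rational arithmetic. The Python sketch below uses hypothetical helper names and the standard Fraction type to avoid rounding; its data are the 21 grades listed in Example 5.18.1, which, assuming they make up the whole GRADES table, yield the values 275.283447 and 289.047619 that ORACLE reports for the entire table. Like var_samp, the sample variance here returns None, Python's stand-in for null, when fewer than two values are given.

```python
from fractions import Fraction

# The 21 grades listed in Example 5.18.1
GRADES = [40, 50, 60, 60, 70, 70, 70, 75, 75, 80, 80, 80, 85,
          90, 90, 90, 95, 100, 100, 100, 100]


def var_pop(xs):
    """Population variance: sum of squared deviations from the mean over n.
    Assumes xs is non-empty."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n


def var_pop_shortcut(xs):
    """Right-hand form: (n * sum of squares - (sum)^2) / n^2."""
    n = len(xs)
    return Fraction(n * sum(x * x for x in xs) - sum(xs) ** 2, n * n)


def var_samp(xs):
    """Sample variance: divide by n - 1; None (null) for fewer than two values."""
    n = len(xs)
    if n < 2:
        return None
    return var_pop(xs) * n / (n - 1)


print(float(var_pop(GRADES)), float(var_samp(GRADES)))
```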

Example 5.19.1 To determine the population variance of the grades of each student, we group the records of GRADES on the student number and then compute the population variance of each group. This is done by the following select phrase:

select stno, var_pop(grade)

from GRADES

group by stno;

which returns:

STNO VAR_POP(GRADE)

---------- --------------

1011 417.18

2415 0

2661 155.55

3442 0

3566 16.66

4022 54.68

5544 225

5571 238.88

Similarly, the sample variance of the same populations is computed by:

select stno, var_samp(grade)

from GRADES

group by stno;

which yields:

STNO VAR_SAMP(GRADE)

---------- ---------------

1011 556.25

2415

2661 233.33

3442

3566 25

4022 72.91

5544 450

5571 358.33

8 rows selected.

To compute the population variance of the grades over the entire GRADES table we write:


select var_pop(grade)

from GRADES;

which gives:

VAR_POP(GRADE)

--------------

275.283447

A similar select

select var_samp(grade)

from GRADES;

produces the sample variance for the entire table:

VAR_SAMP(GRADE)

---------------

289.047619

If the sample contains a single value, then the function var_samp returns a null value. This is the case in the query:

select var_samp(grade)

from GRADES

where stno = '1011' and cno = 'cs110'

and year = 1999;

which yields:

VAR_SAMP(GRADE)

---------------

On the other hand, a similar function called variance returns 0 whenever the population contains a single value; otherwise, variance returns the sample variance. For instance, the query

select variance(grade)

from GRADES

where stno = '1011' and cno = 'cs110'

and year =1999;

returns:

VARIANCE(GRADE)

---------------

0
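Python's standard statistics module draws this line differently again: statistics.variance computes the sample variance but raises StatisticsError when given fewer than two values. The sketch below wraps it in a hypothetical helper that follows the behavior of ORACLE's variance described above; returning None for an empty input mirrors the null that SQL aggregates return for an empty set of rows.

```python
import statistics


def oracle_like_variance(xs):
    """Hypothetical helper mimicking ORACLE's variance(): None (null) for no
    values, 0 for a single value, and the sample variance otherwise."""
    if not xs:
        return None
    if len(xs) == 1:
        return 0
    return statistics.variance(xs)


print(oracle_like_variance([90]))       # 0
print(oracle_like_variance([70, 100]))  # the sample variance of 70 and 100, i.e. 450
```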

The population standard deviation and the sample standard deviation, which are the square roots of the population and the sample variance, respectively, can be computed using the functions stddev_pop and stddev_samp.

Example 5.19.2 To compute the population standard deviation of the set of grade values of each student we write:


select stno, stddev_pop(grade)

from GRADES

group by stno;

This yields the following answer:

STNO STDDEV_POP(GRADE)

---------- -----------------

1011 20.42

2415 0

2661 12.47

3442 0

3566 4.08

4022 7.39

5544 15

5571 15.45

8 rows selected.

Similarly, the sample standard deviation can be obtained by:

select stno, stddev_samp(grade)

from GRADES

group by stno;

which generates:

STNO STDDEV_SAMP(GRADE)

---------- ------------------

1011 23.58

2415

2661 15.27

3442

3566 5

4022 8.53

5544 21.21

5571 18.92

8 rows selected.

The population and the sample covariances between the values that appear under the attributes T.A and S.B are computed using the functions covar_pop and covar_samp, respectively, as in the following select phrases:

select covar_pop(T.A,S.B) from T,S where T.C = S.D;

select covar_samp(T.A,S.B) from T,S where T.C = S.D;
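The definitions behind covar_pop, covar_samp, and corr fit in a few lines of Python. The sketch below uses hypothetical function names and assumes two equally long lists without nulls (ORACLE first discards every pair in which either value is null). Since the SSTUDY data are not reproduced here, the check uses made-up numbers.

```python
from math import sqrt


def covar_pop(xs, ys):
    """Population covariance: mean product of the deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def covar_samp(xs, ys):
    """Sample covariance: the same sum of products divided by n - 1."""
    n = len(xs)
    return covar_pop(xs, ys) * n / (n - 1)


def corr(xs, ys):
    """Correlation coefficient: covariance divided by the product of the
    standard deviations (population versions, so the factors of n cancel)."""
    return covar_pop(xs, ys) / sqrt(covar_pop(xs, xs) * covar_pop(ys, ys))


# made-up data, not the SSTUDY table
hours = [1, 2, 3, 4]
grades = [2, 4, 6, 8]
print(covar_pop(hours, grades), covar_samp(hours, grades), corr(hours, grades))
```

For n pairs, the sample covariance is the population covariance multiplied by n/(n − 1). With the eight students involved in Example 5.19.3, 11.2673611 · 8/7 ≈ 12.8769841, in agreement with the two covariances reported there.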

Example 5.19.3 The table SSTUDY, whose creation is described in Appendix B, records the number of hours slept during three successive nights by a group of students. To determine the population covariance between the average number of hours slept and the grade point average of the students we write:


select covar_pop(g.avggrade, s.avghours)

from (select stno, avg(grade) as avggrade

from GRADES

group by stno) g,

(select stno, avg(no_hours) as avghours

from SSTUDY

group by stno) s

where g.stno = s.stno;

This will return the answer:

COVAR_POP(G.AVGGRADE,S.AVGHOURS)

--------------------------------

11.2673611

Similarly, to compute the sample covariance we use the query

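select covar_samp(g.avggrade, s.avghours)

from (select stno, avg(grade) as avggrade

from GRADES

group by stno) g,

(select stno, avg(no_hours) as avghours

from SSTUDY

group by stno) s

where g.stno = s.stno;
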
which produces the result:

COVAR_SAMP(G.AVGGRADE,S.AVGHOURS)

---------------------------------

12.8769841

Correlations are computed using the function corr.

Example 5.19.4 The correlation coefficient between the grade point average and the average number of hours slept is computed by:

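select corr(g.avggrade, s.avghours)

from (select stno, avg(grade) as avggrade

from GRADES

group by stno) g,

(select stno, avg(no_hours) as avghours

from SSTUDY

group by stno) s

where g.stno = s.stno;
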
The answer is:

CORR(G.AVGGRADE,S.AVGHOURS)

---------------------------

.961293724


5.19.2 Linear Regression

Regression is a supervised learning activity by which we seek to identify the link that exists between the input and the output data of an experiment, starting from a sequence of inputs and the corresponding observations of the outputs. If we attempt to find this link as a linear function, then we apply linear regression.

Suppose that the input data is x1, . . . , xn, the corresponding output sequence is y1, . . . , yn, and we seek to determine the linear function f(x) = ax + b such that the values yi are as close as possible to axi + b for 1 ≤ i ≤ n. This is achieved by minimizing the total square error given by:

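$$E = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 .$$

It is possible to show that the minimum of E is achieved when:

$$a = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2},$$

$$b = \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}.$$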

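Thus, we obtain the regression line y = ax + b, where a is the slope and b is the intercept. These numbers are computed by the functions regr_slope and regr_intercept, respectively. Both take two arguments: first the expression that gives the y-sequence (the dependent variable), then the expression that gives the x-sequence (the independent variable); pairs in which either value is null are ignored. The quality of the regression line can be evaluated using the goodness of fit (the coefficient of determination) computed by regr_r2, which takes the same arguments as the functions mentioned above. Example 5.19.5 also uses regr_count, which returns the number of pairs used, and regr_avgx and regr_avgy, which return the averages of the x-values and of the y-values, respectively.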
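The closed-form expressions for a and b translate directly into code. The Python sketch below, with the hypothetical function name regression, computes them together with the goodness of fit R² = 1 − (residual sum of squares)/(total sum of squares), which for a least-squares line with an intercept equals the square of the correlation coefficient. It assumes that neither the x-values nor the y-values are all equal, so that no denominator vanishes.

```python
def regression(xs, ys):
    """Least-squares slope a and intercept b from the closed-form formulas,
    plus the goodness of fit R^2 (the quantity regr_r2 reports)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx          # zero only if all x-values are equal
    a = (n * sxy - sx * sy) / denom
    b = (sy * sxx - sx * sxy) / denom
    mean_y = sy / n
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)   # zero only if all y-values are equal
    return a, b, 1 - ss_res / ss_tot


# made-up points lying exactly on y = 2x + 1
print(regression([1, 2, 3], [3, 5, 7]))  # (2.0, 1.0, 1.0)
```

The same relationships can be used to cross-check the ORACLE results of Examples 5.19.4 and 5.19.5: the intercept equals avgy − slope · avgx, that is, 80 − 12.7755906 · 7.08333333 ≈ −10.493766, and the goodness of fit is the square of the correlation coefficient, .961293724² ≈ .924085625.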

Example 5.19.5 To compute the regression parameters for the sequence of average grades and the sequence of average hours of nightly sleep of all students we write:

select regr_count(g.avggrade, s.avghours) as rc,

regr_avgx(g.avggrade, s.avghours) as avgx,

regr_avgy(g.avggrade, s.avghours) as avgy,

regr_slope(g.avggrade, s.avghours) as slope,

regr_intercept(g.avggrade, s.avghours) as interc,

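regr_r2(g.avggrade, s.avghours) as gof

from (select stno, avg(grade) as avggrade

from GRADES

group by stno) g,

(select stno, avg(no_hours) as avghours

from SSTUDY

group by stno) s

where g.stno = s.stno;
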
This query returns the following result:

RC AVGX AVGY SLOPE INTERC GOF

---------- ---------- ---------- ---------- ---------- ----------

8 7.08333333 80 12.7755906 -10.493766 .924085625

5.20 Graphs in SQL Plus

Graphs represent binary relations on sets, in the sense of the following definition. A graph is defined as a pair of sets G = (V, E), where V is the set of vertices of G and E ⊆ V × V is the set of edges of G. Clearly, E is a binary relation on V. If (u, v) ∈ E, we say that u is the origin of the edge (u, v) and v is its destination. A graph can be drawn by representing the vertices by points and the edges by arrows. Namely, if (u, v) is an edge, we draw an arrow that begins at u and ends at v.

Example 5.20.1 Consider the graph G = (V, E), where V = {0, 1, 2, 3, 4, 5, 6} and E = {(0, 1), (0, 3), (1, 2), (2, 5), (2, 6), (3, 4), (3, 6), (4, 5), (5, 6)}. This graph is drawn in Figure 5.5.

Figure 5.5: Drawing of the graph G

Graphs can be represented by tables that have the heading origin destination. Each edge (u, v) corresponds to a pair in the table. Clearly, for any graph the corresponding table contains the same information as the graph.

Example 5.20.2 The graph of Example 5.20.1 is represented by the table:

GRAPH

origin destination

0 1

0 3

1 2

2 5

2 6

3 4

3 6

4 5

5 6

To create this table use the script included in Appendix F.

A path in the graph G = (V, E) that joins v0 to vn is a sequence of vertices (v0, v1, . . . , vn) such that (vi, vi+1) is an edge in G for 0 ≤ i ≤ n − 1. We refer to v0 as the origin of the path and to vn as the destination of the path. The number n is the length of the path. A path of length at least 1 that begins and ends in the same vertex is a cycle or a loop. If a graph has no cycles, then we say that the graph is acyclic. Note that the graph defined in Example 5.20.1 is acyclic.

We write (u, v) ∈ E+ if there exists a path of length at least 1 that has u as its origin and v as its destination. The relation E+ is the transitive closure of the relation E.

Example 5.20.3 The transitive closure of the relation E defined by the graph of Example 5.20.1 consists of the following pairs:

(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 2), (1, 5), (1, 6), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)

as can be easily seen by inspecting Figure 5.5.

Of course, the transitive closure E+ of a relation E ⊆ V × V is itself a relation on the set V and, therefore, it can also be represented as a table. Namely, the tabular representation of E+ is:

GRAPHPLUS

origin destination

0 1

0 2

0 3

0 4

0 5

0 6

1 2

1 5

1 6

2 5

2 6

3 4

3 5

3 6

4 5

4 6

5 6

It is possible to prove that, in general, the transitive closure of a graph cannot be computed through the operations of relational algebra (see [Maier, 1983], for example); that is, no single relational algebra expression computes GRAPHPLUS from GRAPH for every graph. However, SQL Plus allows us to compute the table GRAPHPLUS when the underlying graph G is acyclic.

This is accomplished using the connect by clause of select. This clause establishes links between tuples of a table and can be used to retrieve the vertices of a graph that can be reached by paths that start from a certain vertex. Its syntax is defined by:

[start with condition ] connect by condition

For example, the chaining condition of the vertices in a path of a graph is described by connect by origin = prior destination. Thus, to obtain the set of vertices that are accessible from the vertex 4 in the graph shown in Figure 5.5 we write:


select distinct destination from graph

start with origin = 4

connect by origin = prior destination;

This will yield the result

DESTINATION

-----------

5

6

consistent with the structure of the graph.

The connect by clause cannot be applied to graphs that contain cycles. If this is the case, ORACLE detects the loop and returns an error message. (Starting with ORACLE 10g, writing connect by nocycle instead of connect by suppresses this error.)

Example 5.20.4 Suppose that we add an edge to the graph shown in Figure 5.5 that creates a loop, for example, the edge (5, 0), which creates the loop (0, 3, 4, 5, 0). This can be done by

insert into graph(origin, destination)

values(5,0);

Now, if we try the query

select distinct destination from graph

start with origin = 4

connect by origin = prior destination;

we obtain the error message:

ORA-01436: CONNECT BY loop in user data

In Chapter 8 we discuss an algorithm that can be used to compute the transitive closure of arbitrary graphs (with or without loops).
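For comparison, the SQL:1999 standard introduced the recursive with clause, which computes a transitive closure without connect by. The sketch below runs such a query on SQLite through Python's sqlite3 module; it is not ORACLE syntax, and the table graph(origin, destination) is assumed to hold the edges of Example 5.20.1. Because the recursive step is combined with union rather than union all, pairs already found are discarded, so the recursion also stops on graphs with loops. The function name transitive_closure is hypothetical.

```python
import sqlite3

# Edges of the graph of Example 5.20.1
EDGES = [(0, 1), (0, 3), (1, 2), (2, 5), (2, 6), (3, 4), (3, 6), (4, 5), (5, 6)]

# Recursive query: start from the edges, then extend every known path by one
# more edge; union (not union all) drops pairs that were already produced.
CLOSURE = """
    with recursive plus(origin, destination) as (
        select origin, destination from graph
        union
        select plus.origin, graph.destination
        from plus join graph on plus.destination = graph.origin
    )
    select origin, destination from plus order by origin, destination
"""


def transitive_closure(edges):
    con = sqlite3.connect(":memory:")
    con.execute("create table graph (origin integer, destination integer)")
    con.executemany("insert into graph values (?, ?)", edges)
    pairs = con.execute(CLOSURE).fetchall()
    con.close()
    return pairs


print(transitive_closure(EDGES))
```

On the graph of Example 5.20.1 the query returns 17 pairs; after adding the edge (5, 0) it returns 42 pairs instead of an error.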

If a data set has a hierarchical structure, then it can be described by a rooted tree, that is, by a special acyclic graph G = (V, E) that has a distinguished vertex v0, called the root, such that for every other vertex v of the graph there is a unique path that joins v0 to v. It is not difficult to show that for any two distinct vertices u, v of a rooted tree there exists at most one path that joins u to v. If such a path exists, then we say that v is a descendant of u.

Example 5.20.5 The connect by option of SQL Plus can be used to find the descendants of a vertex in a rooted tree. Consider, for example, the rooted tree shown in Figure 5.6.

Figure 5.6: Rooted Tree

The table that represents this tree is created by the SQL script included in Appendix F and has the form:

TREE

origin destination

0 1

0 2

0 3

1 4

1 5

2 6

2 7

2 8

3 9

3 10

7 11

7 12

To retrieve all the descendants of a vertex (in this case, of vertex 2) we write:

select distinct destination as DESCENDANTS from tree

start with origin = 2

connect by origin = prior destination;

This returns:

DESCENDANTS

-----------

6

7

8

11

12

On the other hand, to retrieve the ancestors of a vertex (in this case, of vertex 12), that is, all vertices that occur on the path between the root of the tree and the vertex, we write:

select distinct origin as ANCESTORS from tree

start with destination = 12

connect by destination = prior origin;

This will retrieve the table:


ANCESTORS

----------

0

2

7

The reserved word prior can be used on either side of the equality sign. For example, the last query of Example 5.20.5 can be written as:

select distinct origin ANCESTORS from tree

start with destination = 12

connect by prior origin = destination;

The pseudo-attribute LEVEL can be used to indicate the length of the path that begins at the starting vertex of the query and ends with the vertex currently retrieved.

Example 5.20.6 The following query adds the pseudo-attribute LEVEL to the query of Example 5.20.5 that retrieves the descendants of the vertex 2:

select distinct level, destination as DESCENDANTS from tree

start with origin = 2

connect by origin = prior destination;


LEVEL DESCENDANTS

---------- -----------

1 6

1 7

1 8

2 11

2 12

Observe that the immediate descendants are at level 1 and the next level of descendants at level 2.

If we retrieve the ancestors of a vertex, as in

select distinct level, origin as ANCESTORS from tree

start with destination = 12

connect by destination = prior origin;

the values of LEVEL reflect the distance (in number of edges) between the vertex and its various ancestors:

LEVEL ANCESTORS

---------- ----------

1 7

2 2

3 0


Example 5.20.7 Combining the string function lpad and the pseudo-attribute LEVEL allows us to display the entire tree using indentation. The query:

select level, lpad('*', 2 * level - 1) || destination as vertex from tree

start with origin = '0'

connect by prior destination = origin;

returns the following display of the tree structure:

LEVEL VERTEX
---------- ---------
1 *1
2   *4
2   *5
1 *2
2   *6
2   *7
3     *11
3     *12
2   *8
1 *3
2   *9
2   *10
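A similar indented display can be sketched in SQLite, which has neither connect by nor lpad. In the sketch below, each row carries a zero-padded path key; sorting by this key lists every vertex immediately after its parent, with siblings in increasing order, which is the order shown above (ORACLE guarantees a particular sibling order only when order siblings by is used). The indentation is produced in Python with the same width rule as lpad('*', 2 * level - 1). The five-digit key width is an assumption that suffices for this small tree.

```python
import sqlite3

# Edges (origin, destination) of the rooted tree of Figure 5.6.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 6),
         (2, 7), (2, 8), (3, 9), (3, 10), (7, 11), (7, 12)]

conn = sqlite3.connect(":memory:")
conn.execute("create table tree (origin integer, destination integer)")
conn.executemany("insert into tree values (?, ?)", EDGES)

# sort_key is the path from the root, e.g. '/00002/00007/00011'; ordering by it
# yields a depth-first (preorder) listing with siblings in increasing order.
rows = conn.execute(
    """
    with recursive t(level, vertex, sort_key) as (
        select 1, destination, printf('/%05d', destination)
        from tree where origin = 0
        union all
        select t.level + 1, e.destination,
               t.sort_key || printf('/%05d', e.destination)
        from tree e join t on e.origin = t.vertex
    )
    select level, vertex from t order by sort_key
    """
).fetchall()

# lpad('*', 2 * level - 1) left-pads '*' with blanks to width 2 * level - 1,
# that is, it puts 2 * level - 2 blanks in front of the asterisk.
display = [" " * (2 * level - 2) + "*" + str(vertex) for level, vertex in rows]
for (level, _), line in zip(rows, display):
    print(level, line)
```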

An alternative description of a tree, which shows the paths that can be used to reach its vertices, can be obtained using the pseudo-attribute CONNECT_BY_ISLEAF and the function SYS_CONNECT_BY_PATH. CONNECT_BY_ISLEAF returns 1 if the current vertex (in our case, the destination of the edge) is a leaf, and 0 otherwise. SYS_CONNECT_BY_PATH evaluates its first argument for every edge of the path that joins the starting vertex to the current vertex and concatenates the results, preceding each of them by the string given as its second argument.

Example 5.20.8 The query:

select level,destination,

CONNECT_BY_ISLEAF "IsLeaf?",

SYS_CONNECT_BY_PATH('(' || origin || ',' || destination || ')', '+') "Path"

from tree

start with origin = '0'

connect by prior destination = origin

order by level;

will return:

LEVEL DE IsLeaf? Path

1 1 0 +(0,1)

1 2 0 +(0,2)

1 3 0 +(0,3)

2 4 1 +(0,1)+(1,4)

2 7 0 +(0,2)+(2,7)

2 6 1 +(0,2)+(2,6)


2 10 1 +(0,3)+(3,10)

2 9 1 +(0,3)+(3,9)

2 8 1 +(0,2)+(2,8)

2 5 1 +(0,1)+(1,5)

3 11 1 +(0,2)+(2,7)+(7,11)

3 12 1 +(0,2)+(2,7)+(7,12)
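The paths and the leaf flag can also be imitated in the SQLite sketch: a path string is extended by one '+(origin,destination)' entry per edge, mimicking SYS_CONNECT_BY_PATH with the separator '+', and a not exists subquery plays the role of CONNECT_BY_ISLEAF. The column name is_leaf and the table setup are assumptions made for illustration.

```python
import sqlite3

# Edges (origin, destination) of the rooted tree of Figure 5.6.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 6),
         (2, 7), (2, 8), (3, 9), (3, 10), (7, 11), (7, 12)]

conn = sqlite3.connect(":memory:")
conn.execute("create table tree (origin integer, destination integer)")
conn.executemany("insert into tree values (?, ?)", EDGES)

rows = conn.execute(
    """
    with recursive p(level, vertex, path) as (
        select 1, destination, '+(' || origin || ',' || destination || ')'
        from tree where origin = 0
        union all
        select p.level + 1, e.destination,
               p.path || '+(' || e.origin || ',' || e.destination || ')'
        from tree e join p on e.origin = p.vertex
    )
    select level, vertex,
           -- 1 when no edge leaves the vertex, i.e. the vertex is a leaf
           not exists (select 1 from tree c where c.origin = p.vertex) as is_leaf,
           path
    from p
    order by level, vertex
    """
).fetchall()

for row in rows:
    print(*row)
```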

5.21 Updates in SQL

There are three constructs in SQL that allow us to update the tables of a relational database: update, insert, and delete.

The update construct modifies components of tuples. It applies to all tuples of the specified table unless limited by a where clause.

Example 5.21.1 Recall the table EMPHIST introduced in Example 3.3.5. The tables discussed in that example can be created and populated by the script ced.sql, available in Appendix C.

To give all current employees a 10% raise, we apply the following update phrase:

update EMPHIST

set salary = 1.1* salary

where term_date is null;
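The effect of this update can be tried on a toy table. The sketch below uses SQLite through Python's sqlite3 module; the three-column EMPHIST, its rows, and the employee numbers are invented for illustration and only stand in for the table of Example 3.3.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified EMPHIST with only the columns the update uses.
conn.execute("create table emphist (empno text, salary real, term_date text)")
conn.executemany("insert into emphist values (?, ?, ?)", [
    ("101", 50000.0, None),          # current employee (no termination date)
    ("102", 60000.0, "2003-06-30"),  # former employee
    ("103", 40000.0, None),          # current employee
])

cur = conn.execute("update emphist set salary = 1.1 * salary where term_date is null")
print(cur.rowcount)  # 2 rows updated

salaries = dict(conn.execute("select empno, salary from emphist"))
print(salaries)
```

Only the two rows with a null term_date are changed; the salaries are floating-point values, so they should be compared with a small tolerance.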

The general syntax of update is:

update table_name [corr_name]

set column = 〈expression|null〉 {, column = 〈expression|null〉}

[where condition]

The insert construct adds new rows to a table. It inserts either a single row (whose components must be specified by the user) or a set of rows that originate from a retrieval involving other tables.

The general syntax of insert is:

insert into table_name [(column {, column})]

〈values (expr {, expr}) | subselect〉

The values of the expressions in the values list must belong to the domains of the attributes specified in the list of columns in order for the insertion to take effect.

Example 5.21.2 To insert two rows containing registration records for student 2890 for the fall semester of 2004 into GRADES, we execute two insert statements:

insert into GRADES

values ('2890', '023', 'cs110', 'Fall', 2004, null);

insert into GRADES

values ('2890', '056', 'cs240', 'Fall', 2004, null);


The syntax of the insertion of a set of tuples obtained by a retrieval operation is:

insert into table_name [(column {, column})] select phrase

Here the select phrase must return tuples of values consistent with the domains of the attributes specified by the list of columns [(column {, column})].

Example 5.21.3 Suppose that we intend to have a separate table indicating the assignments of instructors. After creating such a table (called ASSIGN and equipped with the attributes empno, cno, sem, and year) by writing:

create table ASSIGN(empno varchar2(11) not null,

cno varchar2(5) not null,

sem varchar2(6) not null,

year smallint);

we can load this table using data from the existing table GRADES with the construct:

insert into ASSIGN(empno, cno, sem, year)

select distinct empno, cno, sem, year

from GRADES;

This results in the following table:

empno cno sem year

019 cs110 Fall 2001

019 cs210 Fall 2002

019 cs210 Spring 2003

019 cs240 Spring 2002

023 cs110 Spring 2002

056 cs240 Spring 2003

234 cs310 Spring 2003

234 cs410 Spring 2002
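The insert ... select pattern can be sketched in SQLite as follows. Apart from the 2890 registration of Example 5.21.2, the GRADES rows are illustrative and are not the book's data, and varchar2 is replaced by SQLite's text type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table grades (stno text, empno text, cno text,
                                    sem text, year integer, grade integer)""")
conn.executemany("insert into grades values (?, ?, ?, ?, ?, ?)", [
    ("1011", "019", "cs110", "Fall", 2001, 40),     # illustrative rows
    ("2661", "019", "cs110", "Fall", 2001, 80),
    ("3566", "019", "cs110", "Fall", 2001, 95),
    ("2661", "234", "cs310", "Spring", 2003, 100),
    ("2890", "023", "cs110", "Fall", 2004, None),   # from Example 5.21.2
])

conn.execute("""create table assign (empno text not null, cno text not null,
                                     sem text not null, year integer)""")
# One row per distinct (instructor, course, semester, year) combination.
conn.execute("""insert into assign (empno, cno, sem, year)
                select distinct empno, cno, sem, year from grades""")

assignments = conn.execute("select * from assign order by empno, cno").fetchall()
print(assignments)
```

The three 2001 registrations in cs110 with instructor 019 collapse into a single assignment because of distinct.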

If the components of the tuple to be inserted into a table violate the declaration of the table (e.g., a null value for a not null attribute, or a character string for a numerical attribute), the DBMS should reject the insertion.
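A minimal check of this rule in SQLite: an insertion that puts null into a not null column of ASSIGN raises sqlite3.IntegrityError, and the table stays unchanged. Note that SQLite's flexible typing accepts a character string in a numeric column of an ordinary table, so the second kind of violation mentioned above is not rejected there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table assign (empno text not null, cno text not null,
                                     sem text not null, year integer)""")

rejected = False
try:
    # cno is declared not null, so this insertion violates the declaration.
    conn.execute("insert into assign values ('019', null, 'Fall', 2001)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("insertion rejected:", exc)

print(conn.execute("select count(*) from assign").fetchone()[0])  # 0
```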

The delete construct removes rows from a table.

Example 5.21.4 To delete the rows of the table ASSIGN that correspond to courses taught by the instructor whose employee number is '234', we write:

delete from ASSIGN where empno = '234';

The directive:

delete from GRADES where grade is null;

eliminates the rows whose grade component is null.

The where clause of delete is optional; if it is omitted, then all rows are deleted. The tabular variable itself still exists, with no rows.


Example 5.21.5 The following delete eliminates all rows of the table ASSIGN:

delete from ASSIGN;

The syntax of delete is:

delete from table_name [where condition]
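Both forms of delete can be tried in SQLite on three rows of the ASSIGN table shown earlier. The sketch also checks SQLite's catalog table sqlite_master to confirm that the emptied table still exists; the helper count_rows is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table assign (empno text, cno text, sem text, year integer)")
conn.executemany("insert into assign values (?, ?, ?, ?)", [
    ("019", "cs110", "Fall", 2001),
    ("234", "cs310", "Spring", 2003),
    ("234", "cs410", "Spring", 2002),
])


def count_rows():
    return conn.execute("select count(*) from assign").fetchone()[0]


conn.execute("delete from assign where empno = '234'")  # as in Example 5.21.4
after_where = count_rows()

conn.execute("delete from assign")                       # as in Example 5.21.5
after_all = count_rows()

# The emptied table is still listed in SQLite's catalog table sqlite_master.
still_exists = conn.execute(
    "select count(*) from sqlite_master where type = 'table' and name = 'assign'"
).fetchone()[0] == 1
print(after_where, after_all, still_exists)  # 1 0 True
```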

5.22 Access Rights

The grant operation assigns access rights to users. To delegate access rights to other users, a user must “own” these rights. The set of access rights includes select, insert, update, and delete, and refers to the right of executing each of these operations on a table. Further, update can be restricted to specific columns.

All these access rights are granted automatically to the creator of a table. The creator, in turn, may grant access rights to other users or to all users (designated in SQL as public). The SQL standard envisions a mechanism that can limit the excessive proliferation of access rights. Namely, a user may receive a right such as select with or without the right to grant it to others by his or her own action.

Example 5.22.1 Suppose that the user alex owns the table COURSES and intends to grant the right to query it to the user whose name is peter. The user alex can accomplish this by

grant select on COURSES to peter

Now peter has the right to query the table COURSES, but he may not propagate this right to the user ellie. For this to be possible, alex would have to use the directive:

grant select on COURSES to peter

with grant option

Example 5.22.2 If peter owns the table STUDENTS, then he may delegate to ellie the right to query the table and the right to update the columns addr, city, and zip using the directive:

grant select, update(addr, city, zip) on

STUDENTS to ellie

The standard syntax of grant is:


grant {priv {, priv} | all [privileges]}

on [table] tablename {, tablename}

to 〈username {, username} | public〉

[with grant option]

Here priv has the syntax:

〈select|insert|delete|update[(attribute{, attribute})]〉

Privileges can be revoked using the revoke construct, which is also a feature of standard SQL. For instance, if peter wishes to revoke ellie’s privilege to update the table STUDENTS, he may write:

revoke update(addr,city,zip) on

STUDENTS from ellie

The standard syntax for this directive is:

revoke {priv {, priv} | all [privileges]}

on [table] tablename {, tablename}

from 〈username {, username} | public〉

5.23 Views in SQL

Views are virtual tabular variables. This means that in SQL a view is referenced for retrieval purposes in exactly the same way a tabular variable is referenced. The only difference is that a view does not have a physical existence: it exists only as a definition in the database catalog. We refer to “real” tabular variables (that is, the tabular variables that have a physical existence in the database) as base tabular variables.

Views are supported both in SQLPlus and in Transact SQL, but not in version 4.1 of MySQL, the current version when this text was written; views were added to MySQL in version 5.0.

To illustrate the notion of view, let us consider the following example.

Example 5.23.1 Suppose that we write:

create view STC as

select STUDENTS.name, GRADES.cno

from STUDENTS, GRADES

where STUDENTS.stno = GRADES.stno;

The select construct contained in this create view retrieves all pairs (s, c) of student names and course numbers such that the student whose name is s has registered for the course c.

When this directive is executed, no data retrieval takes place. The database system simply stores this definition in its catalog. The definition of the view STC becomes a persistent object, that is, an object that exists after our interaction with the DBMS has ceased. From a conceptual point of view, the user treats STC exactly like any other tabular variable. Suppose, for instance, that we wish to retrieve the names of students who took cs110. In this case it is sufficient to write the query:


select name from STC where cno = 'cs110';

In reality, SQL combines this select phrase with the query that defines STC and executes the modified query:

select STUDENTS.name from STUDENTS, GRADES

where STUDENTS.stno = GRADES.stno

and GRADES.cno = 'cs110';
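The equivalence between querying STC and running the expanded query can be checked on a small SQLite database. The student names and rows are invented for illustration; only the table and column names follow the college database. The last lines also show that the view stores no data of its own: a row added to GRADES appears in STC immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table students (stno text, name text, addr text,
                           city text, state text, zip text);
    create table grades (stno text, empno text, cno text,
                         sem text, year integer, grade integer);
    insert into students (stno, name) values ('1011', 'Student A'),
                                             ('2661', 'Student B');
    insert into grades values ('1011', '019', 'cs110', 'Fall', 2001, 40),
                              ('2661', '019', 'cs210', 'Fall', 2002, 80);
    create view stc as
        select students.name, grades.cno
        from students, grades
        where students.stno = grades.stno;
""")

via_view = conn.execute("select name from stc where cno = 'cs110'").fetchall()
expanded = conn.execute("""
    select students.name from students, grades
    where students.stno = grades.stno and grades.cno = 'cs110'
""").fetchall()
print(via_view == expanded, via_view)  # True [('Student A',)]

# The view holds no rows of its own: new base data shows up at once.
conn.execute("insert into grades values ('2661', '019', 'cs110', 'Fall', 2001, 80)")
now = conn.execute("select name from stc where cno = 'cs110' order by name").fetchall()
print(now)  # [('Student A',), ('Student B',)]
```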

The previous example shows that views in SQL play a role similar to the role played by macros in programming languages.

Views are important for data security. A user who needs access only to the list of names of students and the courses they are taking needs to be aware only of the existence of STC. If the user is authorized to use only select constructs, then the user need not know whether STC is a table or a view. Confidential data (such as grades obtained in specific courses) can be completely protected in this manner. Also, the queries that this limited-access user may write are simpler and easier to understand. The view STC occupies no space for data, and it is always current, reflecting the contents of the tabular variables STUDENTS and GRADES.

SQL treats views exactly as it treats tabular variables as far as retrieval is concerned. We can also delegate the select privilege on a view in exactly the same way as we did for a tabular variable. For instance, if the user george created the view STC, then he can give the select right to vanda by writing:

grant select on STC to vanda;

Consider now another example of a view.

Example 5.23.2 The view SNA that contains the student numbers and the names of students can be created by:

create view SNA as

select stno, name from STUDENTS;

The purpose of this view is to ensure the privacy of students. Any user who has access only to this view can retrieve the student number and name of a student, but not the address of the student.

There is a fundamental difference between the views introduced in Examples 5.23.1 and 5.23.2 in the way they behave with respect to updates.

Suppose that the user wishes to insert the pair (7799, ’Jane Jones’) in the view SNA. The user may ignore entirely the fact that SNA is not a base tabular variable. Moreover, the effect of this insertion on the base tabular variable is unequivocally determined: the system inserts in the tabular variable STUDENTS the tuple (7799, ’Jane Jones’, null, null, null, null). On the other hand, we cannot insert a tuple in a meaningful way in the view STC introduced in Example 5.23.1. Indeed, if we attempt to insert a pair (s, c) in STC, then we have to define the effect of this insertion on the base tabular variables. This is clearly impossible: we do not know what the student number is, what the identification of the instructor is, etc. SQL forbids users to update views based on more than one table (as STC is); this rule rejects such updates even when they would have an unambiguous effect on the base tabular variables. (Some systems, ORACLE among them, relax this rule for certain join views.) Only some views based on exactly one tabular variable can be updated. It is the responsibility of the database administrator to grant to the user the right to update a view only if that view can be updated.

If a view can be updated, then its behavior is somewhat different from that of the base tabular variable on which the view is built: an update made through a view may cause one or several tuples to vanish from the view, as we see the next time we retrieve the tuples of the view.

Example 5.23.3 Consider the view UPPERGR defined by:

create view UPPERGR as

select * from GRADES where grade > 75;

If we wish to examine the tuples that satisfy the definition of the view, we use the construction:

select * from UPPERGR;

that returns the result:

STNO EMPNO CNO SEM YEAR GRADE
---------- ----------- ----- ------ ---------- ----------

2661 019 cs110 FALL 1999 80

3566 019 cs110 FALL 1999 95

5544 019 cs110 FALL 1999 100

3566 019 cs240 SPRING 2000 100

2415 019 cs240 SPRING 2000 100

5571 234 cs410 SPRING 2000 80

1011 019 cs210 FALL 2000 90

3566 019 cs210 FALL 2000 90

5571 019 cs210 SPRING 2001 85

1011 056 cs240 SPRING 2001 90

4022 056 cs240 SPRING 2001 80

2661 234 cs310 SPRING 2001 100

The update construction:

update UPPERGR

set grade = 70

where stno = '2661' and empno = '019' and cno = 'cs110'

and sem = 'FALL' and year = 1999;

makes the first row disappear, since it no longer satisfies the definition of the view. Indeed, if we use again the same query on UPPERGR, we obtain:

STNO EMPNO CNO SEM YEAR GRADE
---------- ----------- ----- ------ ---------- ----------

3566 019 cs110 FALL 1999 95

5544 019 cs110 FALL 1999 100


3566 019 cs240 SPRING 2000 100

2415 019 cs240 SPRING 2000 100

5571 234 cs410 SPRING 2000 80

1011 019 cs210 FALL 2000 90

3566 019 cs210 FALL 2000 90

5571 019 cs210 SPRING 2001 85

1011 056 cs240 SPRING 2001 90

4022 056 cs240 SPRING 2001 80

2661 234 cs310 SPRING 2001 100

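Since the modified row no longer belongs to UPPERGR, it cannot be reached through the view: an update of UPPERGR affects only rows that satisfy grade > 75. To reestablish the previous content of GRADES, we must therefore update the base tabular variable directly:

update GRADES

set grade = 80

where stno = '2661' and empno = '019' and cno = 'cs110'

and sem = 'FALL' and year = 1999;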
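SQLite does not allow updates through views, so the sketch below applies by hand the rewriting that an updatable view implies: the condition of UPPERGR, grade > 75, is added to the where clause of an update on GRADES. The two GRADES rows are taken from the listing above; the helpers update_through_view and view_size are hypothetical. The sketch shows the row leaving the view after the first update, and a second update through the view finding nothing to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table grades (stno text, empno text, cno text,
                                    sem text, year integer, grade integer)""")
conn.executemany("insert into grades values (?, ?, ?, ?, ?, ?)", [
    ("2661", "019", "cs110", "FALL", 1999, 80),
    ("3566", "019", "cs110", "FALL", 1999, 95),
])
conn.execute("create view uppergr as select * from grades where grade > 75")

VIEW_CONDITION = "grade > 75"   # the where clause of UPPERGR
ROW = ("stno = '2661' and empno = '019' and cno = 'cs110' "
       "and sem = 'FALL' and year = 1999")


def update_through_view(new_grade):
    """Hypothetical helper: update UPPERGR by rewriting it as an update on GRADES."""
    cur = conn.execute(
        f"update grades set grade = ? where {VIEW_CONDITION} and {ROW}",
        (new_grade,))
    return cur.rowcount


def view_size():
    return conn.execute("select count(*) from uppergr").fetchone()[0]


first = update_through_view(70)     # changes the row ...
size_after = view_size()            # ... which then leaves the view
second = update_through_view(80)    # the row is no longer reachable: nothing changes
conn.execute(f"update grades set grade = 80 where {ROW}")  # the base table still works
print(first, size_after, second, view_size())  # 1 1 0 2
```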

The standard syntax of create view allows us to use the clause with check option. When this clause is used, every insertion and update done through the view is verified to make sure that a tuple inserted through the view actually appears in the view and that an update of a row in the view does not cause the row to vanish from the view.

The syntax of create view is:

create view view as

subselect

[with check option]

A view V can be dropped from a database by using the construct

drop view V;

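What happens to the views built on a dropped object depends on the system. In standard SQL, drop table and drop view accept the options cascade and restrict: with cascade, all views that use the dropped tabular variable or view are dropped as well, while with restrict the drop is rejected if such views exist. ORACLE does not drop the dependent views; it marks them invalid, and they cannot be used until the objects they depend on exist again.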

Views are useful instruments in implementing generalizations. Suppose that we began the construction of the college database from the existing tabular variables UNDERGRADUATES and GRADUATES, which model the entity sets with the same names, where

heading(UNDERGRADUATES) = stno name addr city state zip major

heading(GRADUATES) = stno name addr city state zip qualdate

Then the tabular variable STUDENTS could have been obtained as a view built from the previous two base tabular variables by

create view STUDENTS as

select stno, name, addr, city, state, zip

from UNDERGRADUATES

union

select stno, name, addr, city, state, zip

from GRADUATES;

5.24 Accessing metadata in SQLPlus

The catalog (data dictionary) of ORACLE consists of a large number of tables that can be accessed through several views defined on them.

In ORACLE, the list of the tables and other objects owned by the current user is contained in the view USER_CATALOG, also accessible through its synonym CAT. A content of this view is shown in Figure 5.24:

TABLE_NAME TABLE_TYPE
STUDENTS TABLE
INSTRUCTORS TABLE
COURSES TABLE
GRADES TABLE
ADVISING TABLE

Information that describes space allocation and statistical properties can be found in the view USER_TABLES, also named TABS. A description of the attributes of tabular variables and of their domains can be found in the view USER_TAB_COLUMNS, also accessible as COLS. For example, the query:

select table_name,column_name,data_type from COLS;

results in the following table:

TABLE_NAME COLUMN_NAME DATA_TYPE

ADVISING STNO CHAR

ADVISING EMPNO CHAR

COURSES CNO CHAR

COURSES CNAME CHAR

COURSES CR NUMBER

GRADES STNO CHAR

GRADES EMPNO CHAR

GRADES CNO CHAR

GRADES SEM CHAR

GRADES YEAR NUMBER

GRADES GRADE NUMBER

INSTRUCTORS EMPNO CHAR

INSTRUCTORS NAME CHAR

INSTRUCTORS RANK CHAR

INSTRUCTORS ROOMNO NUMBER

INSTRUCTORS TELNO CHAR

STUDENTS STNO CHAR

STUDENTS NAME CHAR


STUDENTS ADDR CHAR

STUDENTS CITY CHAR

STUDENTS STATE CHAR

STUDENTS ZIP CHAR
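SQLite has no COLS view; a comparable listing can be sketched with the table_info pragma, applied to every table named in SQLite's catalog table sqlite_master. Only two of the college tables are created here, with the declared types of the listing above; the rest is an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table advising (stno char, empno char);
    create table courses (cno char, cname char, cr number);
""")

listing = []
tables = conn.execute(
    "select name from sqlite_master where type = 'table' order by name"
).fetchall()
for (table,) in tables:
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    for _cid, column, decl_type, _notnull, _default, _pk in conn.execute(
            f"pragma table_info({table})"):
        listing.append((table.upper(), column.upper(), decl_type))

for row in listing:
    print(*row)
```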

A more complete list of the objects that belong to the current user can be found in the view USER_OBJECTS, which lists all objects created by the user, including those listed in USER_CATALOG, together with other useful information (such as the date of creation, the last time the object was affected by a data definition statement, the status of the object, etc.).

The definitions of views can be accessed through the catalog view USER_VIEWS.

Example 5.24.1 The meta-view (view about views) USER_VIEWS has the structure described below:

Name Null? Type

------------------------------- -------- --------------

VIEW_NAME NOT NULL VARCHAR2(30)

TEXT_LENGTH NUMBER

TEXT LONG

TYPE_TEXT_LENGTH NUMBER

TYPE_TEXT VARCHAR2(4000)

OID_TEXT_LENGTH NUMBER

OID_TEXT VARCHAR2(4000)

VIEW_TYPE_OWNER VARCHAR2(30)

VIEW_TYPE VARCHAR2(30)

SUPERVIEW_NAME VARCHAR2(30)

The last seven attributes are important for object views, discussed in Chapter 7. To extract the definition of the view UPPERGR defined above, we write:

select text from user_views where view_name = 'UPPERGR';

This query returns the result:

TEXT

------------------------------------------------------------

select "STNO","EMPNO","CNO","SEM","YEAR","GRADE" from GRADES

where grade > 75
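In SQLite, the text of a view is kept in the sql column of the catalog table sqlite_master. The sketch below retrieves it for a view named UPPERGR; unlike the ORACLE text shown above, SQLite keeps the statement essentially as written and does not expand the *.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table grades (stno, empno, cno, sem, year, grade)")
conn.execute("create view UPPERGR as select * from GRADES where grade > 75")

# sqlite_master plays, very roughly, the role of USER_VIEWS here.
(text,) = conn.execute(
    "select sql from sqlite_master where type = 'view' and name = 'UPPERGR'"
).fetchone()
print(text)
```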

5.25 Exercises

1. Solve the following queries in SQL:

(a) Find all students who live in Malden or Newton.

(b) Find all students whose name starts with 'F'.

(c) Find all students whose name contains the letter 'f'.

2. A select phrase equivalent to the union-computing select


union



is

select stno from grades

where cno = 'cs210' or cno = 'cs210';

(a) Write an equivalent query using the in operator.

(b) Can you transform the intersection-computing select:


intersect


into a single select? Explain your answer.

Solve in SQL the following queries that refer to the college database:

3. Find the cities where students live, for all students who do not live in Boston, Massachusetts.

4. Find all pairs of student names and course names for grades obtained during Fall 2001.

5. Find the names of students who took some four-credit courses.

6. Find the names of students who took a course with an instructor who is also their advisor.

7. Find the names of students who took cs210 or had Prof. Smith as their advisor.

8. Find all pairs of names of students who live in the same city.

9. Find all triples of instructors’ names for instructors who taught the same course.

10. Find instructors who taught students who are advised by another instructor who shares the same room.

11. Find course numbers of courses taken by students who live in Boston and which are taught by an associate professor.

12. Find the names of instructors who teach courses attended by students who took a course with an instructor who is an assistant professor.

13. Find the telephone numbers of instructors who teach a course taken by any student who lives in Boston.

14. Find all pairs of names of students and instructors such that the student never took a course with the instructor.

15. Find the names of students who took no four-credit courses.

16. Find the names of students who took only four-credit courses.

17. Find the names of students who took every four-credit course.

18. Find the names of all students for whom no other student lives in the same city.

19. Find names of students who took every course taken by Richard Pierce.

20. Find the names of instructors who teach no courses.

21. Find course numbers of courses that have never been taught.

22. Find courses that are taught by every assistant professor.

23. Find the names of students whose advisor did not teach them any course.


24. Find the names of students who have failed all their courses (failing is defined as a grade less than 60).

25. Find the names of students who do not have an advisor.

26. Find the names of instructors who taught every semester when a student from Rhode Island was enrolled.

27. Find course names of courses taken by every student advised by Prof. Evans.

28. Find names of students who took every course taught by an instructor who is advising at least two students.

29. Find names of instructors who teach every student they advise.

30. Find names of students who are taking every course taught by their advisor.

31. Find course numbers of courses taken by every student who lives in Rhode Island.

32. Find the student numbers of students who took at least two courses.

33. Find the course names of courses in which at least three students were enrolled.

34. Find the names of instructors who advise at least two students.

35. List all students by name, along with their grade averages.

36. Find student numbers of students for whom the difference between the highest and the lowest grade is less than 20.

37. Print a report that contains, for each course (cno), the number of students who took the course and the highest, the lowest, and the average grade in the course.

38. Find the average grade of students who took cs110 at any time. Then, find students whose grades in cs110 were above the average.

39. Identify the queries among queries 3 to 34 that require division, and solve those queries using the group by option of SQL.

40. Create views on the college database as specified:

(a) A view that contains the names of the instructors, the courses (cnos) that they teach, and the average grade in these courses.

(b) A view that shows the names and offices of the instructors.

(c) A view that contains the courses (cnos), the number of students who took the courses, the average grade in these courses, and the highest grade.

(d) A view that contains the names of instructors and the names of the students that they advise.

(e) A view that shows the data about the students in Massachusetts.

41. Print the contents of the views created in Exercise 40.

42. Determine which of the views created in Exercise 40 can be updated.

43. Using the views created in Exercises 40(a) and 40(c), create a view that lists the instructors and the total number of students they teach.

44. Solve the following queries:

(a) list the names of instructors and the number of courses they taught;

(b) list instructors in the order of the number of courses they taught;

(c) list the top three instructors in the order of the number of courses they taught.

45. Let GRAPH be the table introduced in Example 5.20.3. The degree of a vertex is the number of edges incident to that vertex.

(a) Write an SQL query that yields a list of the vertices of a graph arranged in decreasing order of their degrees;

(b) list the top 5 vertices of a graph in increasing order of their degrees.

46. For each instructor, list the sequence of the numbers of courses that the instructor taught during each of the semesters that he or she was active.

47. List the top three instructors in the order of the number of students that they advise.

5.26 Bibliographical Comments

The initial standard known as SQL1 is recorded in [X3, ISO7]. SQL2 was defined in [International Organization for Standardization, 1992]. Extensive presentations of SQL3 can be found in [Melton and Simon, 1993; Melton and Simon, 2002] and [Fortier, 1999]. Other useful references are [Line and Kline, 2000] and [J. Kauffman, 2001].
