
Chapter 5

SQL The Relational Language

5.1 Introduction
5.2 Tabular Variables in SQL
  5.2.1 Creation of Tables
5.3 Referential Integrity in SQL
5.4 Basic Data Types
  5.4.1 String Domains
  5.4.2 Numeric Domains
  5.4.3 Special Domains
  5.4.4 Basic Domains Supported by ORACLE
5.5 SELECT Phrases
5.6 The WHERE Option
5.7 Union, Intersection, and Difference in SQL
5.8 Table Product in SQL
5.9 Join in SQL
5.10 Sets and subqueries
5.11 Parametrized subqueries
5.12 Subqueries and division
5.13 Relational Completeness of SQL
5.14 Scalar Functions of SQL
  5.14.1 Numerical Functions
  5.14.2 String Functions
  5.14.3 Date functions
5.15 Aggregate Functions in SQL
5.16 Sorting Results
5.17 The Group-by Option
  5.17.1 The decode and case Functions
  5.17.2 The rollup and cube Extensions of group by
5.18 Analytical Capabilities of SQL*Plus
  5.18.1 Ranking Functions
  5.18.2 Top-n Queries
  5.18.3 Windowing functions in SQL*Plus
5.19 Statistics in SQL
  5.19.1 Variance and Correlation
  5.19.2 Linear Regression
5.20 Graphs and SQL in SQL*Plus
5.21 Updates
5.22 Access Rights
5.23 Views in SQL
5.24 Accessing metadata in SQL*Plus
5.25 Exercises
5.26 Bibliographical Comments

5.1 Introduction

SQL is an acronym for Structured Query Language and is the name of the most important tool for defining and manipulating relational databases. The development of SQL began in the mid-1970s at the IBM San Jose Research Laboratory. The success of an experimental IBM database system (known as System R) that incorporated SQL compelled a number of software manufacturers to join IBM in developing relational database systems that incorporated SQL. In 1982, the American National Standards Institute (ANSI) initiated the development of a standard for a query language for relational database systems, and it opted for SQL as its prototype. The resulting ANSI standard, issued in 1986, was adopted as an International Standard by the International Organization for Standardization (ISO) in 1987.

In the late 1980s, embedded SQL was standardized by ANSI, and work on expanding SQL continues. A much extended version of the original standard, known as SQL92, was adopted by ISO/IEC at the end of 1992. To reflect current trends in the database field towards object-relational technology, a new standard, ISO/IEC 9075-1, known as SQL99, was published in July 1999. As we shall see, SQL99 is a superset of SQL92. New features incorporated by this standard include object-relational extensions (user-defined data types, reference types, collections, large object support, table hierarchies), active database features (triggers), stored procedures and functions, on-line analytic processing extensions, etc. More recently, in 2003, a new standard was issued. This edition of the standard includes a new part (SQL/XML) that deals with the interaction between SQL and XML (which we discuss in Chapter 10), corrections to SQL99, and several new features.

Our presentation concentrates initially on common SQL features, applicableto a wide range of SQL implementations.


SQL is a nonprocedural language. This means that a query formulated in SQL need not specify how a problem is to be solved or how data should be accessed by the computing system; instead, an SQL query states what the query is, i.e., what data are sought.

This leaves the user free to focus on the logic of the query. Because the DBMS can draw on internal knowledge that the user typically lacks (for example, the available indexes and statistics about the stored data), in most cases it generates retrieval procedures that are faster than equivalent retrieval procedures built directly by the user.

The SQL language consists of three components: the data definition language (DDL), the data manipulation language (DML), and the data control language (DCL). The first component allows the user to define the structure of the tables of the database. The second contains retrieval and update directives. The last component allows the database administrator to define the access rights to the database for various categories of users.

SQL syntax is format-free: tabs, carriage returns, and spaces can be included anywhere a space occurs in the definition of an SQL construct. Also, case is insignificant in table names, reserved words, and keywords. However, case is significant in character string literals.
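A small experiment makes these rules concrete. The sketch below uses Python's standard sqlite3 module, with an in-memory SQLite database standing in for ORACLE (an assumption made only so that the example is self-contained); the table CITIES is the one used later in this chapter, reduced to a single row.

```python
import sqlite3

# In-memory database; SQLite stands in for ORACLE in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table CITIES (city varchar(40), state char(2))")
conn.execute("insert into CITIES values ('Boston', 'MA')")

# Keywords, the table name, and the column name may be written in any case.
mixed = conn.execute("SeLeCt CiTy FROM cities").fetchall()

# Case matters inside string literals: 'BOSTON' does not match 'Boston'.
exact = conn.execute("select city from CITIES where city = 'Boston'").fetchall()
upper = conn.execute("select city from CITIES where city = 'BOSTON'").fetchall()

print(mixed)  # [('Boston',)]
print(exact)  # [('Boston',)]
print(upper)  # []
```

ORACLE treats unquoted identifiers the same way, converting them to uppercase before looking them up.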

5.2 Tabular Variables in SQL

When we introduced tables in Chapter 3, we assumed that the content of a table is a relation, that is, a set of tuples. To conform to the reality of databases, we need to define the content of a table as a sequence of tuples. Thus, a table may contain several copies of the same tuple. If a table is allowed to contain duplicates, then even if we know all components of a tuple, we may be unable to identify the corresponding row in the table uniquely. As a consequence, not every table has a key.
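The following sketch shows a table value that contains two identical rows. It again uses Python's sqlite3 module as a stand-in for ORACLE, and the table VISITS is hypothetical; because VISITS has no key, knowing every component of the tuple still matches both copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A hypothetical table declared without any key constraint.
conn.execute("create table VISITS (name varchar(35), city varchar(25))")
row = ("Ann Richards", "Natick")
conn.execute("insert into VISITS values (?, ?)", row)
conn.execute("insert into VISITS values (?, ?)", row)  # an identical copy is accepted

rows = conn.execute("select name, city from VISITS").fetchall()
print(rows)  # [('Ann Richards', 'Natick'), ('Ann Richards', 'Natick')]

# Specifying every component of the tuple still matches both rows.
matches = conn.execute(
    "select count(*) from VISITS where name = ? and city = ?", row
).fetchone()[0]
print(matches)  # 2
```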

In this section we present a topic that we refer to informally as table creation. In reality, we create an object similar to a variable in a programming language that we call a tabular variable. The values of a tabular variable are tables, and these values change over time. Tabular variables are created using the construction create table.

Example 5.2.1 To create a tabular variable called PATRONS having the heading

name addr city zip telno date_of_birth

we write:

create table PATRONS (name varchar(35) not null,
                      addr varchar(50),
                      city varchar(25),
                      zip char(9),
                      telno char(12),
                      date_of_birth date);

As we shall see, each attribute is followed by a description of its domain. The effect of this command is to create a tabular variable whose initial value is a table whose content is the empty set of tuples:

PATRONS
name          addr           city        zip    telno         date_of_birth

After inserting a first row, the next value of the tabular variable PATRONS is the table:

PATRONS
name          addr           city        zip    telno         date_of_birth
Ann Richards  56 Green Ln    Natick      02170  508-561-0987  02/15/78

A second insertion yields a new table as the value of the tabular variable:

PATRONS
name          addr           city        zip    telno         date_of_birth
Ann Richards  56 Green Ln    Natick      02170  508-561-0987  02/15/78
Ron Scott     50 Cider Hill  Framingham  02160  608-663-0211  11/4/80

If the first patron moves to a new address, the first row is modified and the tabular variable assumes a new value:

PATRONS
name          addr           city        zip    telno         date_of_birth
Ann Richards  77 Lake St.    Milton      02186  617-364-0606  02/15/78
Ron Scott     50 Cider Hill  Framingham  02160  608-663-0211  11/4/80

The values that the tabular variable PATRONS may assume are the actual tables that have the name and the heading specified at the creation of the tabular variable. In addition, we can specify several types of constraints that any value of the tabular variable must satisfy.
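The successive values of PATRONS can be reproduced with a short script. This sketch runs the statements through Python's sqlite3 module (SQLite standing in for ORACLE). Two assumptions simplify it: date_of_birth is declared as a character string rather than a date, and the helper value_of, which returns the current value of the tabular variable, is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# date_of_birth is kept as a string to keep the sketch simple.
conn.execute("""create table PATRONS (name varchar(35) not null,
                                      addr varchar(50),
                                      city varchar(25),
                                      zip char(9),
                                      telno char(12),
                                      date_of_birth varchar(8))""")


def value_of(conn):
    """Hypothetical helper: the current value of PATRONS as a list of rows."""
    return conn.execute("select * from PATRONS order by rowid").fetchall()


initial = value_of(conn)  # the empty table
conn.execute("insert into PATRONS values ('Ann Richards', '56 Green Ln', 'Natick', "
             "'02170', '508-561-0987', '02/15/78')")
after_first_insert = value_of(conn)
conn.execute("insert into PATRONS values ('Ron Scott', '50 Cider Hill', 'Framingham', "
             "'02160', '608-663-0211', '11/4/80')")
after_second_insert = value_of(conn)
conn.execute("update PATRONS set addr = '77 Lake St.', city = 'Milton', zip = '02186', "
             "telno = '617-364-0606' where name = 'Ann Richards'")
after_update = value_of(conn)

for value in (initial, after_first_insert, after_second_insert, after_update):
    print(value)
```

The name PATRONS never changes; only the table it holds does, which is exactly the distinction between a tabular variable and its values.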

Before it is possible to create tabular variables and form queries, it is necessary to create an empty database in which to work. In practice, this is generally done at the level of the operating system, usually with a command that is provided by the vendor of the DBMS.

To start, we assume that we have created an empty database. In this section we begin to discuss a part of the data definition component of SQL, namely, the creation of tabular variables, or informally, the creation of database tables.

5.2.1 Table Creation

We refer to the components of the Data Definition Language (DDL) as directives. The SQL directive for adding tables to a database is create table.

At a minimum, as we saw in Example 5.2.1, creating a tabular variable in SQL requires that we specify its name and its attributes along with their domains. The syntax for this is:

create table table_name [(attr_def {, attr_def})]

where the attribute definition attr_def has the syntax:

attribute_name domain


In a slightly more general form (which ignores certain details related to the physical design of databases), the create table directive has the following syntax:

create table [schema.]table_name
    [(attr_def | table_constraint | table_ref_clause
      {, attr_def | table_constraint | table_ref_clause})]

where the attribute definition attr_def has the syntax:

attribute_name domain [default expr] [column_ref_clause] {column_constraint}

As a result of the execution of this directive, an initial amount of space is reserved in secondary memory to accommodate future values of the tabular variable, and the metadata are modified to reflect the addition of the new tabular variable. Specialized SQL constructions, discussed later (insert, delete, and update), can be used to modify the value of this variable.

Creating tabular variables permits placing restrictions, called constraints, on the contents of any value that the tabular variable may assume. The constraints that follow have a global character (which means that they apply to the contents of a table in its entirety) and apply to any value that the tabular variable may assume.

Definition 5.2.2 A primary key constraint has the form

[constraint constraint_name] primary key(list_of_attributes)

where the primary key consists of the attributes of the list.

Alternate keys of tables can be specified using unique constraints. The syntax of this type of constraint is:

[constraint constraint_name] unique(list_of_attributes)

This indicates that no two rows of a table that is a value of the tabular variable may have the same values for the attributes specified in the list.
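Both kinds of key constraint are checked whenever a row is inserted. The sketch below declares a hypothetical table EMPLOYEES with a primary key on empno and an alternate key, declared with unique, on ssn, and runs it through Python's sqlite3 module as a stand-in for ORACLE; the helper try_insert and all data values are hypothetical. The columns are also declared not null because SQLite, unlike ORACLE, does not exclude nulls from a non-integer primary key on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table EMPLOYEES (
                    empno varchar(11) not null,
                    ssn   char(11)    not null,
                    name  varchar(35),
                    constraint pk_emp primary key (empno),
                    constraint uq_ssn unique (ssn))""")


def try_insert(row):
    """Hypothetical helper: True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("insert into EMPLOYEES values (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(("019", "111-22-3333", "Adams Ann")),  # accepted
    try_insert(("019", "444-55-6666", "Baker Bob")),  # rejected: repeats the primary key
    try_insert(("023", "111-22-3333", "Clark Cy")),   # rejected: repeats the alternate key
    try_insert(("023", "444-55-6666", "Clark Cy")),   # accepted
]
print(results)  # [True, False, False, True]
```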

A constraint requiring that every row satisfy a condition C, where C is a Boolean combination of conditions involving only components of tuples and constants, is denoted by:

[constraint constraint_name] check(C)

A constraint that involves more than one attribute must be written as a table constraint, that is, as a separate item of the create table directive; a constraint that involves a single attribute can be written either as a table constraint or as a column constraint attached to the definition of that attribute. Referential integrity can be imposed by using the column constraint references in the definition of an attribute. To prevent certain components of tuples from assuming a null value, we can impose the column constraint not null.

Example 5.2.3 To create the tabular variable INSTRUCTORS of the college database, we use the following create table directive:

create table INSTRUCTORS(empno varchar(11) not null,
                         name varchar(35),
                         rank varchar(25),
                         roomno integer,
                         telno varchar(4),
                         primary key(empno));

The domain of empno is defined to be the set of strings of length at most 11. In addition, we have the column constraint not null, which means that null cannot be used as a value of the attribute empno. The domains of the other attributes have similar, obvious definitions that are discussed below. Note that in the definition of INSTRUCTORS we impose a table constraint, namely primary key(empno).

Similarly, the tabular variables STUDENTS and COURSES are created by:

create table STUDENTS(stno varchar2(10) not null,
                      name varchar2(35) not null,
                      addr varchar2(35),
                      city varchar2(20),
                      state varchar2(2),
                      zip varchar2(10),
                      primary key(stno));

create table COURSES(cno varchar2(5) not null,
                     cname varchar2(30),
                     cr smallint,
                     primary key(cno));

A script that creates all tabular variables of the college database is containedin Appendix A.

Example 5.2.4 To express that the primary key of the table GRADES consists of the attributes stno, cno, sem, and year, we can say that this table satisfies the primary key constraint:

constraint pkg primary key (stno, cno, sem, year)

Example 5.2.5 For the table EMPHIST, introduced in Example 3.3.5, we could introduce the tuple conditions

constraint pos_sal check(salary > 0)

and

constraint suf_sal check(position != 'Programmer' or salary > 65000).

They express, respectively, that the salary must be a positive number and that somebody who is a programmer must be paid more than 65000 dollars.

Thus, the creation of the table EMPHIST can be achieved by:

create table EMPHIST(empno integer not null references PERSINFO(empno),
                     position varchar2(30),
                     dept varchar2(20),
                     appt_date date,
                     term_date date,
                     salary float,
                     constraint suf_sal check(position != 'Programmer' or salary > 65000),
                     constraint pos_sal check(salary > 0));

A script that creates the tables PERSINFO, EMPHIST, and REPORTING is contained in Appendix C.
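The two check constraints of EMPHIST can be exercised directly. In this sketch, Python's sqlite3 module stands in for ORACLE, the references PERSINFO(empno) clause is omitted because PERSINFO is not created here, and the helper accepts is hypothetical. The last call illustrates a point worth remembering: a row is rejected only when a check condition is false, not when it evaluates to unknown because of a null, which is the standard SQL rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table EMPHIST (
                    empno integer not null,
                    position varchar(30),
                    dept varchar(20),
                    appt_date date,
                    term_date date,
                    salary float,
                    constraint suf_sal check (position != 'Programmer' or salary > 65000),
                    constraint pos_sal check (salary > 0))""")


def accepts(position, salary):
    """Hypothetical helper: does EMPHIST accept a row with this position and salary?"""
    try:
        conn.execute("insert into EMPHIST (empno, position, salary) values (1, ?, ?)",
                     (position, salary))
        return True
    except sqlite3.IntegrityError:
        return False


checks = [
    accepts("Programmer", 70000),  # True: both conditions hold
    accepts("Programmer", 60000),  # False: violates suf_sal
    accepts("Analyst", 60000),     # True: suf_sal holds because position != 'Programmer'
    accepts("Analyst", -5),        # False: violates pos_sal
    accepts("Programmer", None),   # True: both conditions are unknown, not false
]
print(checks)  # [True, False, True, False, True]
```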


Example 5.2.6 In the directives below, we state that stno is both a foreign key for ADVISING and its primary key. In addition, empno is a foreign key for this table (being the primary key of the table INSTRUCTORS).

create table ADVISING(stno varchar2(10) not null
                          references STUDENTS(stno),
                      empno varchar2(11)
                          references INSTRUCTORS(empno),
                      primary key(stno));

create table GRADES(stno varchar2(10)
                        not null references STUDENTS(stno),
                    empno varchar2(11)
                        not null references INSTRUCTORS(empno),
                    cno varchar2(5)
                        not null references COURSES(cno),
                    sem varchar2(6) not null,
                    year smallint not null,
                    grade integer,
                    primary key(stno,cno,sem,year),
                    check (grade


If you wish to examine the headings of the tables you created, you can issue, for example, the SQL*Plus directive

describe INSTRUCTORS;

Then, SQL*Plus will print:

Name                       Null?    Type
-------------------------- -------- ------------
EMPNO                      NOT NULL VARCHAR2(11)
NAME                                VARCHAR2(35)
RANK                                VARCHAR2(25)
ROOMNO                              NUMBER(38)
TELNO                               VARCHAR2(4)

The directive alter table is used for modifying the structure of an existing table. Columns may be added or dropped, the names of the columns or their data types can be modified, etc. A simplified syntax of this directive is:

alter table table_name modification_specification

In turn, the modification specification depends on the particular change we need to impose on the table. Examples of such modification specifications include

add column_name column_type
drop column column_name
modify column_name column_type
rename column column_name to new_column_name

as well as many other choices.

Example 5.2.7 To add a new year column to the table ADVISING, we use the directive:

alter table advising add year varchar2(4);

Initially, the new column year contains a null value in every row.

Column types can be modified using the modify option. For instance, to increase the maximum length of the values of stno to 12 characters, we write:

alter table advising modify stno varchar(12);

Column renaming is executed using the option rename column. Below we rename the column stno to studentno:

alter table advising rename column stno to studentno;

Finally, to drop the column year that we just added we write:

alter table advising drop column year;
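The claim that a newly added column starts out null in every existing row is easy to check. This sketch runs a cut-down ADVISING table (no references clauses, and two made-up rows) through Python's sqlite3 module as a stand-in for ORACLE. Only the add step is shown, because SQLite's alter table has no modify option; SQLite happens to accept the ORACLE type name varchar2 unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table ADVISING (stno varchar2(10) not null,
                                       empno varchar2(11),
                                       primary key (stno))""")
# Two made-up rows that exist before the column is added.
conn.executemany("insert into ADVISING values (?, ?)",
                 [("1011", "019"), ("2661", "023")])

# The same directive as in Example 5.2.7.
conn.execute("alter table advising add year varchar2(4)")

rows = conn.execute("select stno, empno, year from ADVISING order by stno").fetchall()
print(rows)  # [('1011', '019', None), ('2661', '023', None)]
```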

5.3 Referential Integrity in SQL

We saw that referential integrity can be imposed in SQL using the column constraint references. An alternative method is to impose the table constraint foreign key. Its syntax is:


foreign key(attribute_name {, attribute_name})
    references table_name(attribute_name {, attribute_name})
    [on delete cascade]

The foreign key construction contains the option on delete cascade. The role of this option is to define the behavior of the tables when deletions occur in the table where the primary key occurs. Namely, when a row is removed from the table containing the primary key and the clause on delete cascade is specified, then all rows of the table that contains the corresponding foreign key that match the removed row are also removed.

Example 5.3.1 Suppose that the tabular variable CITIES is created by:

create table CITIES (city varchar(40),
                     state char(2),
                     primary key (city,state));

A second tabular variable, STORES, records the stores that a retailer has in the covered territory, and is created by

create table STORES (storeno integer not null,
                     addr varchar(40) not null,
                     city varchar(40),
                     state char(2),
                     tel char(12),
                     primary key (storeno),
                     foreign key (city,state) references CITIES(city,state)
                         on delete cascade);

To populate the tables we execute the following directives:

insert into CITIES(city, state) values('Boston', 'MA');
insert into CITIES(city, state) values('Springfield', 'MA');
insert into CITIES(city, state) values('Providence', 'RI');
insert into CITIES(city, state) values('Hartford', 'CT');
insert into CITIES(city, state) values('Bayonne', 'NJ');

insert into STORES(storeno, addr, city, state, tel)
    values(1, '125 Harvard St.', 'Boston', 'MA', '617-287-0991');
insert into STORES(storeno, addr, city, state, tel)
    values(2, '50 Storrow Drive', 'Boston', 'MA', '617-566-7629');
insert into STORES(storeno, addr, city, state, tel)
    values(3, '85 Manton Av.', 'Providence', 'RI', '401-453-1234');
insert into STORES(storeno, addr, city, state, tel)
    values(4, '40 West Street', 'Hartford', 'CT', '860-232-4484');
insert into STORES(storeno, addr, city, state, tel)
    values(5, '5 Finley Av.', 'Bayonne', 'NJ', '908-221-0094');
insert into STORES(storeno, addr, city, state, tel)
    values(6, '10 Linton Plaza', 'Hartford', 'CT', '860-660-2220');
insert into STORES(storeno, addr, city, state, tel)
    values(7, '30 Stilson Rd.', 'Providence', 'RI', '401-861-5249');

The values of the tabular variables CITIES and STORES are

CITY            ST
--------------- --
Boston          MA
Springfield     MA
Providence      RI
Hartford        CT
Bayonne         NJ

and

STORENO ADDR              CITY        ST TEL
------- ----------------- ----------- -- ------------
      1 125 Harvard St.   Boston      MA 617-287-0991
      2 50 Storrow Drive  Boston      MA 617-566-7629
      3 85 Manton Av.     Providence  RI 401-453-1234
      4 40 West Street    Hartford    CT 860-232-4484
      5 5 Finley Av.      Bayonne     NJ 908-221-0094
      6 10 Linton Plaza   Hartford    CT 860-660-2220
      7 30 Stilson Rd.    Providence  RI 401-861-5249

Since referential integrity was imposed between the tabular variables CITIES and STORES, we need to insert the tuples of CITIES before we can insert the tuples of STORES. Otherwise, the rows inserted into STORES would reference cities that do not yet occur in CITIES, and the insertions into STORES would be rejected.

The presence of on delete cascade means that if a row is removed from CITIES, then the rows of STORES corresponding to that city are also removed. For example, if the company closes its business in Hartford and we execute

delete from CITIES where
    city = 'Hartford' and state = 'CT';

then the rows of STORES corresponding to the stores in Hartford, CT, will be deleted automatically.

Removal of the tabular variables is also constrained by referential integrity. It is impossible to remove the tabular variable CITIES before we remove the tabular variable STORES, because STORES references CITIES. Thus, the correct order of removal is

drop table STORES;

drop table CITIES;

If the clause on delete cascade is absent, then the deletion of a row from CITIES is impossible unless we first delete the rows of STORES that correspond to the city being removed from CITIES.
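Both behaviors, the rejected insertion and the cascading deletion, can be observed in the sketch below. It recreates a reduced version of CITIES and STORES (three cities, three of the stores, and no tel column) with Python's sqlite3 module standing in for ORACLE; the Albany store is a made-up row. One SQLite-specific assumption: foreign keys are enforced only after pragma foreign_keys = on has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite-specific: enable foreign key checks
conn.execute("""create table CITIES (city varchar(40), state char(2),
                                     primary key (city, state))""")
conn.execute("""create table STORES (storeno integer not null,
                                     addr varchar(40) not null,
                                     city varchar(40),
                                     state char(2),
                                     primary key (storeno),
                                     foreign key (city, state) references CITIES (city, state)
                                         on delete cascade)""")

# Parent rows first, then the rows that reference them.
conn.executemany("insert into CITIES values (?, ?)",
                 [("Boston", "MA"), ("Providence", "RI"), ("Hartford", "CT")])
conn.executemany("insert into STORES values (?, ?, ?, ?)",
                 [(1, "125 Harvard St.", "Boston", "MA"),
                  (4, "40 West Street", "Hartford", "CT"),
                  (6, "10 Linton Plaza", "Hartford", "CT")])

# A store in a city that is absent from CITIES is rejected.
try:
    conn.execute("insert into STORES values (8, '1 Main St.', 'Albany', 'NY')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting Hartford also removes its stores (on delete cascade).
conn.execute("delete from CITIES where city = 'Hartford' and state = 'CT'")
remaining = conn.execute("select storeno from STORES order by storeno").fetchall()

print(orphan_rejected)  # True
print(remaining)        # [(1,)]
```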


5.4 Basic Data Types

SQL makes use of a collection of domains that, in general, varies from one implementation to another. Not all domains of the standard exist in every implementation, and not all domains of implementations exist in the standard.

Basic domains supported by virtually all implementations of SQL can be classified as string domains, numerical domains, and special domains.

5.4.1 String Domains

String domains are sets of fixed-length or variable-length sequences of characters. In this category, we have char(n), which represents the set of strings of characters (from a given basic set of characters) that have fixed length n. Similarly, varchar(n) represents the set of variable-length strings whose maximal length is n, for n > 0.

5.4.2 Numeric Domains

The SQL standard prescribes two kinds of numeric domains: exact numeric data types (numeric, decimal, integer, and smallint) and approximate numeric data types (float, double precision, and real). Their respective syntax is:

numeric [(p[, s])]
decimal [(p[, s])]
integer
smallint
float [(p)]
double precision
real

Here, p stands for precision and s stands for scale (both of which are non-negative integers). The precision parameter refers to the total number of digits, while the scale indicates the number of digits to the right of the decimal point. The difference between numeric and decimal is that for numeric, p is the exact total number of digits, while for decimal, p is only a lower bound: an implementation may provide a precision larger than p.

The domains smallint and integer have a number of digits dependent on the implementation; however, the precision of integer is required to be equal to or larger than the precision of smallint.

The float domain includes approximate representations of real numbers having precision at least p. Also, real and double precision have implementation-dependent precision, where the precision of double precision is greater than that of real.

5.4.3 Special Domains

Specific DBMSs have their own domains. For instance, ORACLE has the long domain, which contains variable-length strings of characters that may be as large as 2 gigabytes.

To allow us to begin working with actual examples as quickly as possible, we introduce some basic domains for ORACLE. Other databases are quite similar, and the reader can obtain the relevant details by consulting product-specific manuals.

5.4.4 Basic Domains Supported by ORACLE

We review briefly a few of the more important domains supported by ORACLE:

In ORACLE, char[(n)] represents fixed-length strings of characters of length n; shorter values are padded with blanks up to length n. The maximum value of n is 2000, and the default value of n is 1. The domain character is the same as char. The characters and their order are determined by the system during the installation of the DBMS.

The domain varchar(n) requires n to be specified and represents variable-length strings of characters whose length is at most n.

The varchar2 data type stores variable-length character strings and is currently synonymous with the varchar data type. However, in a future version of ORACLE, varchar might store variable-length character strings compared with different comparison semantics. Currently there are two types of comparison semantics for strings in ORACLE: blank-padded comparison semantics and non-padded comparison semantics.

When blank-padded comparison semantics is used and the two values have different lengths, ORACLE first adds blanks to the end of the shorter one so that their lengths are equal. ORACLE then compares the values character by character up to the first character that differs. The value with the greater character in the first differing position is considered greater. If two values have no differing characters, then they are considered equal. This rule means that two values are equal if they differ only in the number of trailing blanks. ORACLE uses blank-padded comparison semantics only when both values in the comparison are expressions of data type char, text literals, or values returned by the USER function.

In the case of non-padded comparison semantics, two values are compared character by character up to the first character that differs. The value with the greater character in that position is considered greater. If two values of different lengths are identical up to the end of the shorter one, the longer value is considered greater. If two values of equal length have no differing characters, then the values are considered equal. ORACLE uses non-padded comparison semantics when one or both values in the comparison have the data type varchar or varchar2.

In either of the two comparison semantics we have 'ab' > 'aa' and 'ab' > 'a ' (where 'a ' ends with a blank). However, in the blank-padded comparison semantics we have 'a ' = 'a', while in the non-padded semantics we have 'a ' > 'a'.
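The two comparison rules can be written as short functions. The Python sketch below compares characters by their code points, which is an assumption: ORACLE actually orders characters according to the character set and sort settings chosen for the database. The function names are hypothetical.

```python
def compare_nonpadded(a: str, b: str) -> int:
    """Non-padded semantics: return -1, 0, or 1 as a < b, a = b, or a > b."""
    for x, y in zip(a, b):
        if x != y:  # the first differing character decides
            return -1 if x < y else 1
    # Identical up to the end of the shorter value: the longer value is greater.
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_blank_padded(a: str, b: str) -> int:
    """Blank-padded semantics: pad the shorter value with blanks, then compare."""
    width = max(len(a), len(b))
    return compare_nonpadded(a.ljust(width), b.ljust(width))


# The examples from the text; "a " has one trailing blank.
for left, right in [("ab", "aa"), ("ab", "a "), ("a ", "a")]:
    print(repr(left), repr(right),
          compare_nonpadded(left, right), compare_blank_padded(left, right))
```

Only the last pair distinguishes the two semantics: padding makes 'a' as long as 'a ', after which no character differs.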

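The domain date represents dates; an ORACLE date value also records a time of day (hours, minutes, and seconds). By default, dates are displayed in the format dd-mon-yy (for example, 15-FEB-78).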


The domain long (also denoted by long varchar) represents variable-length strings of characters of up to 2 gigabytes. At most one attribute may have this domain in any table.

The number domain in ORACLE can be used in several forms, as specified by the following syntax:

number [(p[, s])]

where p is the precision and s is the scale. The maximum precision of number is 38. The scale can vary between -84 and 127. If the scale is negative, the number is rounded to the specified number of places to the left of the decimal point.

The following cases may occur when we insert a value in a column whose domain is number:

Data           Domain         Stored as
1,234,567.89   number         1234567.89
1,234,567.89   number(9)      1234568
1,234,567.89   number(9,2)    1234567.89
1,234,567.89   number(9,1)    1234567.9
1,234,567.8    number(6)      error: exceeds precision
1,234,567.89   number(10,1)   1234567.9
1,234,567.89   number(7,-2)   1234600
1,234,567.89   number(7,2)    error: exceeds precision

If s > p, then the first s - p digits after the decimal point must be zero, and p is the maximum number of significant digits to the right of the decimal point. For instance, number(4,5) requires the first digit after the decimal point to be zero and rounds the value after the fifth decimal digit; the number 0.012358 is stored as 0.01236.

Numbers may also be entered in exponential form, that is, including an exponent preceded by E. For example, 1234567 can be represented as 1.234567E+6, that is, as 1.234567 × 10^6.

Floating point domains are supported as float, float(*), and float(b), where b is the binary precision, that is, the number of significant binary digits. The domains float and float(*) are equivalent, and they consist of floating point numbers that can be represented by 126 binary digits (or, equivalently, by about 38 decimal digits).

To provide compatibility with other systems, ORACLE supports such domains as decimal, integer, smallint, real, and double precision. However, their internal representation is defined by the format of the number domain.
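The number(p, s) rules above can be summarized as: round the value to s digits after the decimal point (to the left of it when s is negative), then reject it if the rounded value, counted in units of 10^(-s), has more than p digits. The Python sketch below encodes that summary with the standard decimal module, rounding half away from zero. It is a model of the behavior described in this section, not ORACLE's implementation, and store_number is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def store_number(value: str, p: Optional[int] = None, s: int = 0) -> Decimal:
    """Model of the value kept when `value` is inserted into a number(p, s) column.

    p=None models the plain domain number, which keeps the value as given.
    Raises ValueError("exceeds precision") when the rounded value needs more than p digits.
    """
    d = Decimal(value)
    if p is None:
        return d
    quantum = Decimal(1).scaleb(-s)  # 10**(-s); a negative s gives 10, 100, ...
    rounded = d.quantize(quantum, rounding=ROUND_HALF_UP)  # half away from zero
    units = rounded.scaleb(s)  # the value counted in units of 10**(-s)
    if abs(units) >= Decimal(10) ** p:  # more than p digits would be needed
        raise ValueError("exceeds precision")
    return rounded


# The inserted values used in this section, including the number(4,5) example.
cases = [("1234567.89", None, 0), ("1234567.89", 9, 0), ("1234567.89", 9, 2),
         ("1234567.89", 9, 1), ("1234567.8", 6, 0), ("1234567.89", 10, 1),
         ("1234567.89", 7, -2), ("1234567.89", 7, 2), ("0.012358", 4, 5)]
for value, p, s in cases:
    try:
        print(value, p, s, "->", f"{store_number(value, p, s):f}")
    except ValueError as err:
        print(value, p, s, "-> error:", err)
```

Rounding (rather than truncating) is what makes number(9) keep 1234568 and number(7,-2) keep 1234600 for the input 1,234,567.89.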

5.5 SELECT Phrases

Queries must be written based on the names and headings of the tabular variables and not on the tables that represent their values at any given moment. This is similar to writing programs: a program should work for all legal inputs and not just the ones on which it was tested. In both cases, it is important to focus on the abstract structure and not on specific examples. The way we write SQL constructs must be directed only by the logic of the query and not by the content of a particular database instance. Just because a query generated the right answer for a particular instance of the database does not mean that it is correct.

The main retrieval construction is the select phrase. Consider a query that we solved previously using relational algebra. Recall that in Example 4.1.25 we found the names of all instructors who have taught any student who lives in Brookline. The solution involved using product, selection, and projection:

T1 := (STUDENTS × GRADES × INSTRUCTORS)
T2 := T1 where STUDENTS.stno = GRADES.stno and
               GRADES.empno = INSTRUCTORS.empno and
               STUDENTS.city = 'Brookline'
ANS := T2[INSTRUCTORS.name]

In SQL the same problem can be solved using a single select phrase:

select INSTRUCTORS.name from STUDENTS, GRADES, INSTRUCTORS
where STUDENTS.stno = GRADES.stno and
      GRADES.empno = INSTRUCTORS.empno and
      STUDENTS.city = 'Brookline';

We can conceptualize the execution of this typical select using the operations of relational algebra as follows:

1. The execution begins by performing the product of the tables listed after the reserved word from. In our case, this involves computing the product

   STUDENTS × GRADES × INSTRUCTORS

2. The selection specified after the reserved word where is executed next, if the where part is present (as we shall see, a select need not contain a where part). In our case, this amounts to retaining the part of the table product that satisfies the condition:

   STUDENTS.stno = GRADES.stno and GRADES.empno = INSTRUCTORS.empno
   and STUDENTS.city = 'Brookline'

3. Finally, the result of the second phase is projected on the attributes listed between select and from, that is, in our case, on the attribute INSTRUCTORS.name.
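The three phases can be mimicked in a few lines of Python and checked against an actual SQL engine. The sketch below uses tiny made-up instances of STUDENTS, GRADES, and INSTRUCTORS that keep only the columns this query needs, with Python's sqlite3 module standing in for ORACLE; the function conceptual_answer is hypothetical. A real DBMS does not build the full product; it only has to return the same answer.

```python
import sqlite3
from itertools import product

# Made-up instances; only the columns used by the query are kept.
STUDENTS = [("s1", "Brookline"), ("s2", "Boston"), ("s3", "Brookline")]  # (stno, city)
GRADES = [("s1", "e1"), ("s2", "e2"), ("s3", "e1"), ("s3", "e2")]        # (stno, empno)
INSTRUCTORS = [("e1", "Adams"), ("e2", "Baker")]                         # (empno, name)


def conceptual_answer():
    # 1. product of the tables listed after from
    t1 = product(STUDENTS, GRADES, INSTRUCTORS)
    # 2. selection: keep the combinations that satisfy the where condition
    t2 = [(s, g, i) for s, g, i in t1
          if s[0] == g[0] and g[1] == i[0] and s[1] == "Brookline"]
    # 3. projection on INSTRUCTORS.name (duplicates are kept, as in SQL)
    return sorted(i[1] for s, g, i in t2)


conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS (stno varchar(10), city varchar(20))")
conn.execute("create table GRADES (stno varchar(10), empno varchar(11))")
conn.execute("create table INSTRUCTORS (empno varchar(11), name varchar(35))")
conn.executemany("insert into STUDENTS values (?, ?)", STUDENTS)
conn.executemany("insert into GRADES values (?, ?)", GRADES)
conn.executemany("insert into INSTRUCTORS values (?, ?)", INSTRUCTORS)

sql_answer = sorted(row[0] for row in conn.execute(
    """select INSTRUCTORS.name from STUDENTS, GRADES, INSTRUCTORS
       where STUDENTS.stno = GRADES.stno and
             GRADES.empno = INSTRUCTORS.empno and
             STUDENTS.city = 'Brookline'"""))

print(conceptual_answer())  # ['Adams', 'Adams', 'Baker']
print(sql_answer)           # ['Adams', 'Adams', 'Baker']
```

Adams appears twice because two different Brookline students took courses with that instructor; the select keeps such duplicates.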

We use a string constant (also known as a literal) in the above select, namely 'Brookline'. String constants must begin and end with a single quote.

SQL is not case-sensitive. This means that you may or may not use capital letters in any place in an SQL construction (except inside string constants) without any effect on the value returned by the query.

As we mentioned above, the where part of a select (also known as the where clause) is optional. This allows us to compute table projections in SQL, as we show next.


Example 5.5.1 In Example 4.1.16, we obtained a list of instructors' names and the room numbers of their offices by projecting the table INSTRUCTORS on name and roomno.

In SQL this can be done by writing

select name, roomno from INSTRUCTORS;

The select construct used above requires the name of the table involved in the retrieval and the list of attributes that we need to extract.

In general, if we need to compute the projection of a table T on a set of attributes A1, ..., An of the heading of T, we use the construct:

select A1, ..., An from T;

Unlike the projection of relational algebra, this construct does not eliminate duplicate rows, as the next example shows.

Example 5.5.2 To find out the states where the students originate, we project the table STUDENTS on the attribute state. This is done by

select state from STUDENTS;

The system returns the result:

ST

--

MA

MA

MA

MA

NH

MA

MA

MA

RI

The value MA is repeated 7 times because there are seven students who live in Massachusetts.

Duplicate values can be eliminated from a query by using the option distinct, as in

select distinct state from STUDENTS;

This will yield the answer:

ST

--

MA

NH

RI

where duplicate values have been dropped.
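The difference between the two selects is easy to reproduce. The sketch below loads a made-up STUDENTS table containing only a state column, with the same nine values as the first output above, into SQLite (through Python's sqlite3 module, standing in for ORACLE) and compares what each query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS (state varchar(2))")
states = ["MA"] * 4 + ["NH"] + ["MA"] * 3 + ["RI"]  # same values as the output above
conn.executemany("insert into STUDENTS values (?)", [(s,) for s in states])

all_rows = conn.execute("select state from STUDENTS").fetchall()
distinct_rows = conn.execute(
    "select distinct state from STUDENTS order by state").fetchall()

print(len(all_rows))   # 9
print(distinct_rows)   # [('MA',), ('NH',), ('RI',)]
```

The order by clause is added only so that the order of the distinct values is predictable; without it, SQL does not guarantee any particular order of the rows.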


5.6 The WHERE Option

The where clause allows us to extract tuples that satisfy certain conditions; in other words, using the where clause we can perform selections.

Example 5.6.1 To find students who live in Boston we write:

select stno, name, addr, city, state, zip

from STUDENTS

where city = 'Boston';

This select will return the result:

STNO NAME ADDR CITY ST ZIP

---------------------------------------------------------------

2890 McLane Sandy 30 Cass Rd. Boston MA 02122

4022 Prior Lorraine 8 Beacon St. Boston MA 02125

5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115

If we want to extract all columns of a table instance, we can use the wildcard character, *, instead of listing all columns. Thus, we can write the equivalent select:

select * from STUDENTS
where city = 'Boston';

Here the symbol * replaces the full attribute list.

Starting from simple conditions (which we called atomic conditions in Chapter 4), we can write queries involving more complicated conditions built by using and, or, and not.

Example 5.6.2 In Example 4.1.14 we retrieved the students who live in Boston or Brookline. In SQL this can be done by:

select * from STUDENTS
where city = 'Boston' or city = 'Brookline';

This yields the result:

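STNO NAME ADDR CITY ST ZIP

---------------------------------------------------------------

2661 Mixon Leatha 100 School St. Brookline MA 02146

2890 McLane Sandy 30 Cass Rd. Boston MA 02122

3566 Pierce Richard 70 Park St. Brookline MA 02146

4022 Prior Lorraine 8 Beacon St. Boston MA 02125

5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115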
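Conditions that mix and with or deserve care, because and binds more tightly than or (and not binds more tightly than both). The sketch below assumes the same nine students as before, reduced to stno, name, and city. It shows that adding a third condition without parentheses changes the meaning of the query of Example 5.6.2. The cutoff stno > 3000 is an arbitrary illustration.

```sql
create table STUDENTS (
  stno integer primary key,
  name varchar(20),
  city varchar(20)
);

insert into STUDENTS (stno, name, city) values
  (1011, 'Edwards P. David', 'Newton'),
  (2415, 'Grogan A. Mary',   'Malden'),
  (2661, 'Mixon Leatha',     'Brookline'),
  (2890, 'McLane Sandy',     'Boston'),
  (3442, 'Novak Roland',     'Nashua'),
  (3566, 'Pierce Richard',   'Brookline'),
  (4022, 'Prior Lorraine',   'Boston'),
  (5544, 'Rawlings Jerry',   'Boston'),
  (5571, 'Lewis Jerry',      'Providence');

-- Parsed as: city = 'Boston' or (city = 'Brookline' and stno > 3000),
-- so every Boston student is returned regardless of stno (4 rows).
select stno, name from STUDENTS
where city = 'Boston' or city = 'Brookline' and stno > 3000;

-- Parentheses apply the cutoff to both cities (3 rows).
select stno, name from STUDENTS
where (city = 'Boston' or city = 'Brookline') and stno > 3000;
```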



Example 5.6.3 To retrieve the grade records obtained in cs110 during the Spring of 2000 we can write in SQL:

select * from GRADES

where cno = 'cs110' and sem = 'SPRING'

and year = 2000;

This returns the result:


STNO EMPNO CNO SEM YEAR GRADE

---------- ----------- ----- ------ ---------- ----------

1011 023 cs110 SPRING 2000 75

4022 023 cs110 SPRING 2000 60

Selections can be combined with projections in a single SQL phrase.

Example 5.6.4 In the select phrase:

select stno, empno from GRADES

where cno = 'cs110';

the selection specified by where cno = 'cs110' is followed by the projection on the attributes stno, empno that are listed after the word select. The result is:

STNO EMPNO

---------- -----

1011 019

2661 019

3566 019

5544 019

1011 023

4022 023

In SQL we can use conditions that implement limited pattern matching. Patterns are built using the symbol %, which stands for any sequence of zero or more characters, and the underscore _, which stands for exactly one character. As mentioned earlier, SQL keywords and names are generally not case-sensitive; however, in ORACLE comparisons involving strings are case-sensitive. Thus, 'Jerry' and 'JERRY' are distinct strings, and 'Jerry' ≠ 'JERRY'. Pattern matching is performed using the operator like.

Example 5.6.5 If we need to find the names and the addresses of students whose name includes Jerry, we can use the following select construct:

select name, addr from STUDENTS

where name like '%Jerry%';

This returns the table:

NAME ADDR

--------------- ---------------

Rawlings Jerry 15 Pleasant Dr.

Lewis Jerry 1 Main Rd.

Example 5.6.6 Suppose the computer science course numbers were carefully assigned so that all fundamental programming courses have a 1 as their second digit. Then the following select construct lists all fundamental programming courses.


select * from COURSES

where cno like 'cs_1%';

The corresponding result is:

CNO CNAME CR CAP

----- ------------------------- -- ---

cs110 Introduction to Computing 4 120

cs210 Computer Programming 4 100

cs310 Data Structures 3 60

cs410 Software Engineering 3 40
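Because _ and % are wildcards, a pattern that must match a literal underscore needs an escape character, declared with the escape clause of like. In the sketch below the COURSES table is reduced to cno and cname, and the row cs_99 is a hypothetical course added only to have a value that contains an underscore.

```sql
create table COURSES (
  cno   varchar(5) primary key,
  cname varchar(30)
);

insert into COURSES (cno, cname) values
  ('cs110', 'Introduction to Computing'),
  ('cs210', 'Computer Programming'),
  ('cs240', 'Computer Architecture'),
  ('cs310', 'Data Structures'),
  ('cs410', 'Software Engineering'),
  ('cs_99', 'Special Topics');          -- hypothetical row

-- _ matches any single character: cs110, cs210, cs310, cs410.
select cno from COURSES where cno like 'cs_1%';

-- Unescaped, 'cs_%' matches every row, since _ is a wildcard.
select cno from COURSES where cno like 'cs_%';

-- With \ declared as the escape character, \_ matches only a literal underscore.
select cno from COURSES where cno like 'cs\_%' escape '\';
```

Whether like itself distinguishes upper and lower case depends on the system: ORACLE compares case-sensitively, as stated above, while SQLite, for instance, ignores case for ASCII letters in like by default.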

Using the reserved word between, we can ensure that certain values are limited to prescribed intervals (including the endpoints of these intervals).

Example 5.6.7 To find the students who obtained some grade between 65 and 85 in 2003, we apply the following query:

select distinct stno from GRADES

where year = 2003 and

grade between 65 and 85;

This select construct returns the table:

STNO

----

1011

2661

5571

The previous select is simply a shorthand for

select distinct stno from GRADES

where year = 2003 and

grade >= 65 and

grade <= 85;


3566

4022

5571
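The equivalence between between and the pair of comparisons can be checked directly. The GRADES table below is hypothetical, reduced to stno, year, and grade, and chosen so that two grades fall exactly on the endpoints 65 and 85.

```sql
-- Hypothetical, reduced GRADES table.
create table GRADES (
  stno  integer,
  year  integer,
  grade integer
);

insert into GRADES (stno, year, grade) values
  (1011, 2003, 65),   -- lower endpoint: included
  (2661, 2003, 85),   -- upper endpoint: included
  (3566, 2003, 64),   -- just below the interval
  (4022, 2003, 86),   -- just above the interval
  (5571, 2003, 70),
  (5544, 2002, 75);   -- grade in range, but wrong year

-- Both queries return 1011, 2661 and 5571.
select distinct stno from GRADES
where year = 2003 and grade between 65 and 85;

select distinct stno from GRADES
where year = 2003 and grade >= 65 and grade <= 85;
```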

We can test if certain components of tuples belong to a certain list of values by using a condition of the form:

A in (v1, ..., vn)

This condition is satisfied by those tuples t such that t[A] has one of the values v1, ..., vn.

Example 5.6.9 Let us find the names of students who live in Boston or Brookline, a query that we already discussed in Example 5.6.2. Using the previous condition we write:

select name from STUDENTS

where city in ('Boston', 'Brookline');

Then, the desired list is:

NAME

--------------

Mixon Leatha

McLane Sandy

Pierce Richard

Prior Lorraine

Rawlings Jerry

On the other hand, we can test the negation of a condition using not. To list the names of students who live outside those two cities, we write:

select name from STUDENTS

where not(city in ('Boston', 'Brookline'));

which has the same effect as:

select name from STUDENTS

where city not in ('Boston', 'Brookline');
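The sketch below checks that in is shorthand for a chain of or comparisons. It also shows one caveat of not in that the data above do not exhibit: a row whose city is null satisfies neither city in (...) nor city not in (...), because comparing null with any value yields unknown. The student 9999 with a null city is hypothetical.

```sql
create table STUDENTS (
  stno integer primary key,
  name varchar(20),
  city varchar(20)
);

insert into STUDENTS (stno, name, city) values
  (1011, 'Edwards P. David', 'Newton'),
  (2415, 'Grogan A. Mary',   'Malden'),
  (2661, 'Mixon Leatha',     'Brookline'),
  (2890, 'McLane Sandy',     'Boston'),
  (3442, 'Novak Roland',     'Nashua'),
  (3566, 'Pierce Richard',   'Brookline'),
  (4022, 'Prior Lorraine',   'Boston'),
  (5544, 'Rawlings Jerry',   'Boston'),
  (5571, 'Lewis Jerry',      'Providence'),
  (9999, 'Doe Jane',         null);       -- hypothetical student, city unknown

-- in is shorthand for a chain of or comparisons: both queries return 5 names.
select name from STUDENTS where city in ('Boston', 'Brookline');
select name from STUDENTS where city = 'Boston' or city = 'Brookline';

-- not in returns the 4 students from Newton, Malden, Nashua and Providence;
-- the row with a null city is returned by neither in nor not in.
select name from STUDENTS where city not in ('Boston', 'Brookline');
```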

We can insert strings of characters in the list of fields of a select phrase to improve the presentation of the results.

Example 5.6.10 To insert the string 'Student name:' in front of a student name we write:

select 'Student name:', name from STUDENTS;

This yields the result:

STUDENTNAME: NAME

--------------- -----------------

Student name: Edwards P. David

Student name: Grogan A. Mary

Student name: Mixon Leatha


Student name: McLane Sandy

Student name: Novak Roland

Student name: Pierce Richard

Student name: Prior Lorraine

Student name: Rawlings Jerry

Student name: Lewis Jerry

In SQL Plus, concatenation of strings can be achieved with the concatenation operator ||.

Example 5.6.11 In the next select phrase we concatenate the string 'Student ' with a student's name, then with the string ' lives in ' and the student's state:

select 'Student ' || name || ' lives in ' || state

from STUDENTS;

returns the result:

STUDENT||NAME||LIVESIN||STATE

-------------------------------------

Student Edwards P. David lives in MA

Student Grogan A. Mary lives in MA

Student Mixon Leatha lives in MA

Student McLane Sandy lives in MA

Student Novak Roland lives in NH

Student Pierce Richard lives in MA

Student Prior Lorraine lives in MA

Student Rawlings Jerry lives in MA

Student Lewis Jerry lives in RI

In Microsoft SQL Server, concatenation is obtained using the + operator.

Example 5.6.12 The query shown in Example 5.6.11 can be executed in Microsoft SQL Server by

select 'Student ' + name + ' lives in ' + state

from STUDENTS;
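The || operator of Example 5.6.11 belongs to the SQL standard, so the same query also runs outside ORACLE. The sketch below assumes a STUDENTS table reduced to stno, name, and state and holding only a few of the students. One portability caveat: ORACLE treats a null operand of || as an empty string, whereas the standard (followed, for example, by SQLite and PostgreSQL) makes the whole concatenation null.

```sql
create table STUDENTS (
  stno  integer primary key,
  name  varchar(20),
  state char(2)
);

insert into STUDENTS (stno, name, state) values
  (1011, 'Edwards P. David', 'MA'),
  (3442, 'Novak Roland',     'NH'),
  (5571, 'Lewis Jerry',      'RI');

-- The spaces are part of the literals, so the words do not run together.
select 'Student ' || name || ' lives in ' || state as description
from STUDENTS;
```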

5.7 Union, Intersection, and Difference in SQL

Recall that union, intersection, and difference as defined in relational algebra may occur only between tables that have identical headings. To execute these operations in SQL, we need to use compound select phrases. Compound selects are constructed from simple select phrases using the reserved words union, intersect, and minus (minus is ORACLE's keyword; the SQL standard calls this operator except). As we shall see, SQL treats union, intersection and difference as operations between sets of tuples, and therefore, it removes duplicate values from the results of the queries.


Example 5.7.1 To determine the student numbers of students who took cs210 we write:

select stno from GRADES

where cno = 'cs210';

This returns the result:

STNO

----

1011

2661

3566

5571

4022

Similarly, we find the student numbers of students who took cs240:

select stno from GRADES

where cno = 'cs240';

In turn, this yields:

STNO

----

3566

5571

2415

5544

1011

4022

To find the students who took both cs210 and cs240 we use intersect to link the two previous select phrases into a compound select:

select stno from grades where cno = 'cs210'

intersect

select stno from grades where cno = 'cs240';

This gives:

STNO

----

1011

3566

4022

5571

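Older releases of SQL Server (before SQL Server 2005) and of MySQL (before version 8.0.31) do not support the intersect operation.

The union of the two sets is computed by the following compound select:

select stno from grades where cno = 'cs210'

union

select stno from grades where cno = 'cs240';

Note that in this output the tuples of the result happen to be sorted, a side effect of the way duplicates were eliminated; SQL does not guarantee any particular order unless the query requests one.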


STNO

----

1011

2415

2661

3566

4022

5544

5571

If we wish to retain all values in the result, then we need to use union all to link the select phrases, as in:

select stno from grades where cno = 'cs210'

union all

select stno from grades where cno = 'cs240';

The result now contains all values retrieved by the individual selects:

STNO

----

1011

2661

3566

5571

4022

3566

5571

2415

5544

1011

4022

The set difference is computed in ORACLE's SQL Plus using minus. To find the students who took cs210 but did not take cs240 we write:

select stno from grades where cno = 'cs210'

minus

select stno from grades where cno = 'cs240';

which returns the result:

STNO

----

2661

The reverse difference allows us to find students who took cs240 but did not take cs210:

select stno from grades where cno = 'cs240'

minus

select stno from grades where cno = 'cs210';

Now we obtain:


STNO

----

2415

5544

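Neither SQL Server nor MySQL supports the minus operation. The SQL standard calls this operation except, which SQL Server (since the 2005 release) and MySQL (since version 8.0.31) do support.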
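The compound selects of this section can be reproduced with the sketch below. It assumes a GRADES table reduced to stno and cno that contains exactly the cs210 and cs240 enrollments listed above. Since minus is specific to ORACLE, the difference is written with the standard keyword except; in the SQL Plus versions discussed here, replace except with minus.

```sql
create table GRADES (
  stno integer,
  cno  varchar(5)
);

insert into GRADES (stno, cno) values
  (1011, 'cs210'), (2661, 'cs210'), (3566, 'cs210'),
  (5571, 'cs210'), (4022, 'cs210'),
  (3566, 'cs240'), (5571, 'cs240'), (2415, 'cs240'),
  (5544, 'cs240'), (1011, 'cs240'), (4022, 'cs240');

-- Took both courses: 1011, 3566, 4022, 5571.
select stno from GRADES where cno = 'cs210'
intersect
select stno from GRADES where cno = 'cs240';

-- Took at least one of them: 7 distinct student numbers.
select stno from GRADES where cno = 'cs210'
union
select stno from GRADES where cno = 'cs240';

-- union all keeps duplicates: 5 + 6 = 11 rows.
select stno from GRADES where cno = 'cs210'
union all
select stno from GRADES where cno = 'cs240';

-- Took cs210 but not cs240: 2661 (ORACLE: minus).
select stno from GRADES where cno = 'cs210'
except
select stno from GRADES where cno = 'cs240';
```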

5.8 Table Product in SQL

A select phrase that lists several distinct table names after the reserved word from computes the product of these tables.

Example 5.8.1 To examine all possible pairs of students/instructors we could write the following select:

select STUDENTS.name, INSTRUCTORS.name

from STUDENTS, INSTRUCTORS;

Since our database is in a state that contains nine students and five instructors, this will result in 45 rows retrieved:

NAME NAME

---------------------------------

Edwards P. David Evans Robert

Grogan A. Mary Evans Robert

Mixon Leatha Evans Robert

.

.

.

Pierce Richard Will Samuel

Prior Lorraine Will Samuel

Rawlings Jerry Will Samuel

Lewis Jerry Will Samuel

Observe that the tables are not linked by any where condition; as expected in the definition of the product, all combinations of rows are considered. After computing the product, a projection eliminates all attributes except STUDENTS.name and INSTRUCTORS.name.

Also, note that we use qualified attributes as required by the definition of table product (see Definition 4.1.7).

The result produced by the query shown in Example 5.8.1 does not differentiate between the attributes STUDENTS.name and INSTRUCTORS.name and this may confuse the user. Therefore, it is preferable to rename the columns of the result using the option as:

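select STUDENTS.name as stname, INSTRUCTORS.name as instname

from STUDENTS, INSTRUCTORS;

This will generate:

STNAME INSTNAME

---------------------------------

Edwards P. David Evans Robert

Grogan A. Mary Evans Robert

Mixon Leatha Evans Robert

.

.

.

Pierce Richard Will Samuel

Prior Lorraine Will Samuel

Rawlings Jerry Will Samuel

Lewis Jerry Will Samuel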
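The size of a product is the product of the sizes of its factors. The sketch below assumes STUDENTS and INSTRUCTORS tables reduced to a key and a name and filled with the nine students and five instructors of our database. It confirms that the product of Example 5.8.1, with its renamed columns, has 9 × 5 = 45 rows.

```sql
create table STUDENTS    (stno integer primary key, name varchar(20));
create table INSTRUCTORS (empno char(3) primary key, name varchar(20));

insert into STUDENTS (stno, name) values
  (1011, 'Edwards P. David'), (2415, 'Grogan A. Mary'), (2661, 'Mixon Leatha'),
  (2890, 'McLane Sandy'),     (3442, 'Novak Roland'),   (3566, 'Pierce Richard'),
  (4022, 'Prior Lorraine'),   (5544, 'Rawlings Jerry'), (5571, 'Lewis Jerry');
insert into INSTRUCTORS (empno, name) values
  ('019', 'Evans Robert'), ('023', 'Exxon George'), ('056', 'Sawyer Kathy'),
  ('126', 'Davis William'), ('234', 'Will Samuel');

-- No where clause: every student is paired with every instructor.
select STUDENTS.name as stname, INSTRUCTORS.name as instname
from STUDENTS, INSTRUCTORS;
```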


SQL allows for computations of products of several copies of the same table through the creation of aliases; the solution proceeds using the logic discussed in Example 4.1.18. To create an alias S of a table named T we write the name of the alias after the name of the table in the list of tables, making sure that at least one space (and no comma) exists between the name of the table and its alias. For example, in the select phrase of Example 5.8.2 we create the alias I by writing

INSTRUCTORS I

Table aliases are also known as correlation names of tables.

Example 5.8.2 Let us solve the query shown in Example 4.1.18: finding all pairs of instructors' names for instructors who share the same office. This can be done by writing:

select I.name as firstname, INSTRUCTORS.name as secname

from INSTRUCTORS I, INSTRUCTORS

where I.roomno = INSTRUCTORS.roomno and

I.empno < INSTRUCTORS.empno;

The result of this query is:

FIRSTNAME SECNAME

------------------------------

Exxon George Will Samuel

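Conceptually, we create an alias I of the table INSTRUCTORS, compute the product between this alias and INSTRUCTORS, and retain those pairs that share the same room and consist of distinct individuals. Using < rather than != in the comparison of employee numbers also ensures that each pair is listed only once.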
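The role of the condition I.empno < INSTRUCTORS.empno becomes visible when it is replaced by a plain inequality. The sketch below assumes an INSTRUCTORS table reduced to empno, name, and roomno and holding the six instructors of Figure 5.1. Employee numbers are stored as character strings, so < compares them alphabetically, which gives the same order as numeric comparison here.

```sql
create table INSTRUCTORS (
  empno  char(3) primary key,
  name   varchar(20),
  roomno integer
);

insert into INSTRUCTORS (empno, name, roomno) values
  ('019', 'Evans Robert',     82),
  ('023', 'Exxon George',     90),
  ('056', 'Sawyer Kathy',     91),
  ('126', 'Davis William',    72),
  ('234', 'Will Samuel',      90),
  ('323', 'Campbell Kenneth', 102);

-- With <, each pair of office mates appears once: Exxon George, Will Samuel.
select I.name as firstname, INSTRUCTORS.name as secname
from INSTRUCTORS I, INSTRUCTORS
where I.roomno = INSTRUCTORS.roomno and I.empno < INSTRUCTORS.empno;

-- With <> (not equal), the same pair is listed twice, once in each order.
select I.name as firstname, INSTRUCTORS.name as secname
from INSTRUCTORS I, INSTRUCTORS
where I.roomno = INSTRUCTORS.roomno and I.empno <> INSTRUCTORS.empno;
```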

Example 5.8.3 Suppose that we need to find all triples of student names for students who live in the same city and state. Now we need to operate with three distinct copies of the table STUDENTS. This is accomplished by:

select S1.name as name1, S2.name as name2,

S3.name as name3

from STUDENTS S1, STUDENTS S2,

STUDENTS S3

where S1.state = S2.state and

S2.state = S3.state and


S1.city = S2.city and

S2.city = S3.city and

S1.stno < S2.stno and

S2.stno < S3.stno;

which gives the result:

NAME1 NAME2 NAME3

----------------------------------------------------

McLane Sandy Prior Lorraine Rawlings Jerry

5.9 Join in SQL

Earlier versions of SQL (at the level of SQL 1) dealt with the join operation indirectly, using operations like product, selection and projection, which are already available in SQL. The blueprint of this treatment of the join operation was outlined in Section 4.2.

Example 5.9.1 The query considered in Example 4.2.2, in which we seek the names of instructors who have taught any four-credit course, is solved in SQL by writing:

select distinct INSTRUCTORS.name

from COURSES, GRADES, INSTRUCTORS

where COURSES.cr = 4

and COURSES.cno = GRADES.cno

and GRADES.empno = INSTRUCTORS.empno;

The steps that we applied in relational algebra can be easily reconstituted in SQL. The first step, which consists of computing the product

T1 = COURSES × GRADES × INSTRUCTORS

corresponds to the list of tables that follows the word from. Then, the selection specified by

T2 = (T1 where COURSES.cr = 4 and

COURSES.cno = GRADES.cno and

GRADES.empno = INSTRUCTORS.empno)

is executed using the condition of the where clause. Finally, the projection

T3(name) = T2[INSTRUCTORS.name]

corresponds to the list that follows select. In this case, this list consists of one attribute, INSTRUCTORS.name.

We give one more example that shows a typical query that uses a join.


Example 5.9.2 To list all pairs of student names and course names such that the student takes the course, the relational algebra solution would require that we join the tables STUDENTS, GRADES, and COURSES. In SQL we write:

select distinct STUDENTS.name, COURSES.cname

from STUDENTS, GRADES, COURSES

where STUDENTS.stno = GRADES.stno and

GRADES.cno = COURSES.cno;

This query will return:

NAME CNAME

--------------------------------------------------

Edwards P. David Computer Architecture

Edwards P. David Computer Programming

Edwards P. David Introduction to Computing

Grogan A. Mary Computer Architecture

.

.

.

Prior Lorraine Data Structures

Prior Lorraine Introduction to Computing

Rawlings Jerry Computer Architecture

Rawlings Jerry Introduction to Computing

SQL dialects that conform to the SQL-2 standard (e.g., SQL Plus of Oracle 9i and 10g, and Microsoft SQL Server) allow the use of the constructions inner join and on. For example, the query discussed in Example 5.9.1 has the alternate solution:

select distinct INSTRUCTORS.name

from INSTRUCTORS, COURSES INNER JOIN GRADES

on COURSES.cno = GRADES.cno

where INSTRUCTORS.empno = GRADES.empno

and COURSES.cr = 4;

This query should be viewed as first computing the join of COURSES and GRADES based on the equality of the attribute cno that they share (as specified by the on clause). Then, the join of INSTRUCTORS with the result of the previous join is computed using the simulation by product and selection method.

In SQL Plus, queries involving natural joins among tables whose attributes are identically named can be further simplified by applying the using clause, which lists the attributes involved in the joining.

Example 5.9.3 To retrieve the names of instructors who taught cs110 we can execute in SQL Plus the query:

select distinct name

from INSTRUCTORS inner join GRADES

using(empno)

where cno = 'cs110';

The inner join can be used for joins that involve more than two tables.


Example 5.9.4 An alternative solution to the query of Example 5.9.1 that makes use of the inner join operation is:

select distinct INSTRUCTORS.name

from

INSTRUCTORS inner join GRADES

using(empno)

inner join COURSES

using(cno)

where COURSES.cr = 4;

It is possible to involve several attributes in an inner join either explicitly, using the clause on, or implicitly, employing the clause using.

Example 5.9.5 To find the pairs of names of students and instructors such that the student takes a course with the instructor who is also his or her advisor, we can write either:

select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname

from GRADES inner join ADVISING

on GRADES.stno = ADVISING.stno and

GRADES.empno = ADVISING.empno

inner join STUDENTS

on ADVISING.stno = STUDENTS.stno

inner join INSTRUCTORS

on ADVISING.empno = INSTRUCTORS.empno

or, equivalently,

select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname

from GRADES inner join ADVISING

using(stno,empno)

inner join STUDENTS

using(stno)

inner join INSTRUCTORS

using(empno)
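The product-and-selection form of Example 5.9.1 and the inner join forms with on and using describe the same join. The sketch below checks this on a query in the spirit of Example 5.9.5, listing each student with his or her advisor. It assumes tables reduced to the columns needed and the nine advising pairs that appear in the output of Example 5.9.7.

```sql
create table STUDENTS    (stno integer primary key, name varchar(20));
create table INSTRUCTORS (empno char(3) primary key, name varchar(20));
create table ADVISING    (stno integer, empno char(3));

insert into STUDENTS (stno, name) values
  (1011, 'Edwards P. David'), (2415, 'Grogan A. Mary'), (2661, 'Mixon Leatha'),
  (2890, 'McLane Sandy'),     (3442, 'Novak Roland'),   (3566, 'Pierce Richard'),
  (4022, 'Prior Lorraine'),   (5544, 'Rawlings Jerry'), (5571, 'Lewis Jerry');
insert into INSTRUCTORS (empno, name) values
  ('019', 'Evans Robert'), ('023', 'Exxon George'), ('056', 'Sawyer Kathy'),
  ('126', 'Davis William'), ('234', 'Will Samuel');
insert into ADVISING (stno, empno) values
  (1011, '019'), (2415, '019'), (2661, '023'), (2890, '023'), (3442, '056'),
  (3566, '126'), (4022, '234'), (5544, '023'), (5571, '234');

-- 1. Product followed by selection (SQL 1 style).
select STUDENTS.name as sname, INSTRUCTORS.name as iname
from STUDENTS, ADVISING, INSTRUCTORS
where STUDENTS.stno = ADVISING.stno
  and ADVISING.empno = INSTRUCTORS.empno;

-- 2. Explicit join conditions with on.
select STUDENTS.name as sname, INSTRUCTORS.name as iname
from STUDENTS inner join ADVISING on STUDENTS.stno = ADVISING.stno
     inner join INSTRUCTORS on ADVISING.empno = INSTRUCTORS.empno;

-- 3. Equality of identically named columns implied by using.
select STUDENTS.name as sname, INSTRUCTORS.name as iname
from STUDENTS inner join ADVISING using (stno)
     inner join INSTRUCTORS using (empno);
```

Each of the three queries returns the same nine pairs. The using form is the most compact, but it only applies when the joined columns carry the same name in both tables.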

Alternatively, the Cartesian product of two tables can be computed using the cross join operation.

Example 5.9.6 The query that we wrote in Example 5.8.1 that generates all possible pairs of students/instructors can also be written as:

select STUDENTS.name, INSTRUCTORS.name

from STUDENTS cross join INSTRUCTORS;

which is equivalent to

select STUDENTS.name, INSTRUCTORS.name

from STUDENTS, INSTRUCTORS;

We saw that when joining two tables not all tuples are joinable; tuples that belong to one table and are not joinable with any tuple of the other table leave no trace in the join, a situation that is often inconvenient. As we saw in Section 4.3, the outer join operation and its variants, the left outer join and the right outer join, can rectify this situation.

Let us assume that the tabular variables STUDENTS and INSTRUCTORS contain the tuples shown in Figure 5.1.

The tabular variable ADVISING has the same content as the one shown in Figure 3.1.

Example 5.9.7 Oracle's own syntax for left outer join is to designate the component that may be null by (+), as in

select students.name, ADVISING.empno from STUDENTS, ADVISING

where STUDENTS.stno = ADVISING.stno(+)

This is equivalent to using the operator left outer join as specified by SQL-2:

select STUDENTS.name, ADVISING.empno

from STUDENTS left outer join ADVISING

on STUDENTS.stno = ADVISING.stno


Either phrase will return:


name empno

-----------------------------------------

Edwards P. David 019

Grogan A. Mary 019

Mixon Leatha 023

McLane Sandy 023

Novak Roland 056

Pierce Richard 126

Prior Lorraine 234

Rawlings Jerry 023

Lewis Jerry 234

Davis Richard

Chu Martin
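Oracle's (+) notation is not accepted by most other systems, so the sketch below uses only the SQL-2 form. It assumes STUDENTS reduced to stno and name and filled with the eleven students of Figure 5.1, together with the ADVISING pairs shown in the output above. Students 6410 and 7209 have no advisor and therefore appear with a null empno.

```sql
create table STUDENTS (stno integer primary key, name varchar(20));
create table ADVISING (stno integer, empno char(3));

insert into STUDENTS (stno, name) values
  (1011, 'Edwards P. David'), (2415, 'Grogan A. Mary'), (2661, 'Mixon Leatha'),
  (2890, 'McLane Sandy'),     (3442, 'Novak Roland'),   (3566, 'Pierce Richard'),
  (4022, 'Prior Lorraine'),   (5544, 'Rawlings Jerry'), (5571, 'Lewis Jerry'),
  (6410, 'Davis Richard'),    (7209, 'Chu Martin');
insert into ADVISING (stno, empno) values
  (1011, '019'), (2415, '019'), (2661, '023'), (2890, '023'), (3442, '056'),
  (3566, '126'), (4022, '234'), (5544, '023'), (5571, '234');

-- Every student appears; unmatched students get null in the ADVISING columns.
select STUDENTS.name, ADVISING.empno
from STUDENTS left outer join ADVISING
  on STUDENTS.stno = ADVISING.stno;

-- An inner join silently drops the two students without an advisor.
select STUDENTS.name, ADVISING.empno
from STUDENTS inner join ADVISING
  on STUDENTS.stno = ADVISING.stno;
```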

The computation of the right outer join is similar. We can use either Oracle's syntax as in

select ADVISING.stno, INSTRUCTORS.name from ADVISING, INSTRUCTORS

where ADVISING.empno(+) = INSTRUCTORS.empno;

or the standard syntax:

select ADVISING.stno, INSTRUCTORS.name

from ADVISING right outer join INSTRUCTORS

on ADVISING.empno = INSTRUCTORS.empno;

In either case we shall obtain:


STUDENTS

stno name addr city state zip

1011 Edwards P. David 10 Red Rd. Newton MA 02159

2415 Grogan A. Mary 8 Walnut St. Malden MA 02148

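2661 Mixon Leatha 100 School St. Brookline MA 02146

2890 McLane Sandy 30 Cass Rd. Boston MA 02122

3442 Novak Roland 42 Beacon St. Nashua NH 03060

3566 Pierce Richard 70 Park St. Brookline MA 02146

4022 Prior Lorraine 8 Beacon St. Boston MA 02125

5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115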
5571 Lewis Jerry 1 Main Rd Providence RI 02904

6410 Davis Richard 45 Algonquin Rd. Natick MA 01760

7209 Chu Martin 90 Rye Dr. Ayer MA 01290

INSTRUCTORS

empno name rank roomno telno

019 Evans Robert Professor 82 7122

023 Exxon George Professor 90 9101

056 Sawyer Kathy Assoc. Prof. 91 5110

126 Davis William Assoc. Prof. 72 5411

234 Will Samuel Assist.Prof. 90 7024

323 Campbell Kenneth Professor 102 7077

Figure 5.1: Tables with tuples with null components


stno name

---------------------------

1011 Evans Robert

2415 Evans Robert

2661 Exxon George

2890 Exxon George

5544 Exxon George

3442 Sawyer Kathy

3566 Davis William

4022 Will Samuel

5571 Will Samuel

Campbell Kenneth

Finally, the full outer join itself can be computed using the operator full outer join:

select STUDENTS.name as sname, INSTRUCTORS.name as iname

from students full outer join advising

using(stno)

full outer join instructors

using(empno);

This will result in

sname iname

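-----------------------------------------------------

Edwards P. David Evans Robert

Grogan A. Mary Evans Robert

Rawlings Jerry Exxon George

McLane Sandy Exxon George

Mixon Leatha Exxon George

Novak Roland Sawyer Kathy

Pierce Richard Davis William

Prior Lorraine Will Samuel

Lewis Jerry Will Samuel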
Chu Martin

Davis Richard

Campbell Kenneth

5.10 Sets and subqueries

Subqueries are select phrases that return sets rather than tables. Their main use is in conditions that involve sets. As we shall see, they are useful in implementing difference and division in SQL. Syntactically, a subquery is written by placing a select phrase between a pair of parentheses. For example,

(select empno from INSTRUCTORS where rank = 'Professor')


is a subquery that computes the employee numbers of full professors. To find the student numbers of students who take a course with a full professor, we need to select those GRADES tuples whose empno belongs to this set. This can be accomplished by writing:

select distinct stno from GRADES where

empno in (select empno from INSTRUCTORS

where rank = 'Professor');

This will return the result:

STNO

----

1011

2415

2661

3566

4022

5544

5571

We refer to the first select as the calling select, or the main select, or the outer select; the select of the subquery is the inner select.

As we saw in the introductory example, membership can be tested using in. Here is another example.

Example 5.10.1 Let us find the names of students who took cs310. We determine the student numbers of those students using a subquery. Then, in the main select, we retrieve those students whose student number is in this set. This can be accomplished using the query:

select name from STUDENTS where

stno in (select stno from GRADES

where cno = 'cs310');

which returns the table:

NAME

--------------

Mixon Leatha

Prior Lorraine

It is possible to test membership of a tuple in a set of tuples computed by a subquery using a condition of the form

(x1, ..., xn) in (select A1, ..., An from ...)

This type of test is included in SQL99, but it is not implemented in many SQL dialects; it is available, however, in ORACLE and DB2.

Example 5.10.2 Suppose that we need to find the pairs of names of students and instructors such that the student took some course with the instructor but no four-credit course with that instructor. This is computed by the following query:


select STUDENTS.name as sname,

INSTRUCTORS.name as iname

from STUDENTS, INSTRUCTORS where

(STUDENTS.stno, INSTRUCTORS.empno) in

(select stno, empno from grades

minus

select stno, empno from grades

where cno in (select cno

from courses

where cr=4));

This will return the following table:

SNAME INAME

------------------ -------------

Edwards P. David Sawyer Kathy


Mixon Leatha Will Samuel

Novak Roland Will Samuel

Prior Lorraine Sawyer Kathy


Rawlings Jerry Sawyer Kathy


If oper is one of the operators =, !=, <, <=, >, >=, then we can use conditions of the form

v oper any (select ...)

or

v oper all (select ...)

in comparisons that involve some elements of the set computed by the subquery (select ...) or all elements of the same set, respectively. Here != stands for inequality.

Example 5.10.3 To find the names of the courses taken by the student whose student number is 1011, we can use the following query:

select cname from COURSES where

cno = any (select cno from GRADES where stno= 1011);

The construct = any is synonymous with in, and the same query could be written as:

select cname from COURSES

where cno in (select cno from GRADES where stno= 1011);

Also, instead of = any we could use = some, and so, we have a third way of writing the same query:

select cname from COURSES where

cno = some (select cno from GRADES where stno= 1011);


All three queries result in the table:

CNAME

-------------------------

Introduction to Computing

Computer Programming

Computer Architecture

Example 5.10.4 Let us find the students who obtained the highest grade in cs110. Although there are methods that we explain later that yield much simpler solutions for this type of query, for the moment we want to illustrate the oper all condition. We operate on two copies of GRADES. The copy used in the inner select is intended for computing the grades obtained in cs110:

select stno from GRADES where cno = 'cs110'

and grade >= all(select grade from GRADES

where cno = 'cs110');

We obtain the table:

STNO

----

5544

Example 5.10.5 Let us find the students who obtained a grade at least as high as every grade given by a certain instructor, say Prof. Will. Using an all subquery we can write:

select stno from GRADES

where grade >= all(select grade from GRADES

where empno in (select empno from INSTRUCTORS

where name like 'Will%'));

If we alter this query and replace the instructor with Prof. Davis, who teaches no courses, we obtain the query

select stno from GRADES

where grade >= all(select grade from GRADES

where empno in (select empno from INSTRUCTORS

where name like 'Davis%'));

In this query the set computed by the subquery is empty. Therefore, every grade satisfies the inequality, and we obtain all student numbers for students who took any course!
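Not every system implements the any and all comparisons (SQLite, for one, does not). When the compared column contains no nulls, v >= all (S) can be rewritten as "no element of S is greater than v", that is, as a not exists condition. The sketch below applies that rewriting to hypothetical grades. The empty-set behavior discussed above shows up in the same way, since not exists over an empty set is always true.

```sql
-- Hypothetical data; grade is assumed never to be null.
create table INSTRUCTORS (empno char(3) primary key, name varchar(20));
create table GRADES (stno integer, empno char(3), grade integer);

insert into INSTRUCTORS (empno, name) values
  ('019', 'Evans Robert'), ('126', 'Davis William'), ('234', 'Will Samuel');
insert into GRADES (stno, empno, grade) values
  (1011, '019', 90), (2661, '234', 70), (4022, '234', 85),
  (5544, '019', 80), (5571, '019', 60);

-- Rewriting of: grade >= all (grades given by Prof. Will).
-- Returns 1011 and 4022, whose grades are at least 85, Will's highest grade.
select distinct G.stno from GRADES G
where not exists (select * from GRADES G2
                  where G2.empno in (select empno from INSTRUCTORS
                                     where name like 'Will%')
                    and G2.grade > G.grade);

-- Prof. Davis gave no grades: the inner set is empty, so all five students qualify.
select distinct G.stno from GRADES G
where not exists (select * from GRADES G2
                  where G2.empno in (select empno from INSTRUCTORS
                                     where name like 'Davis%')
                    and G2.grade > G.grade);
```

If grade could be null, the two forms would differ: a null inside S makes >= all unknown, whereas the not exists form simply ignores that element.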

5.11 Parametrized subqueries

Often the retrieval performed in a subquery depends on a value provided by the calling select. A typical situation is described in the following example.


Example 5.11.1 Suppose that we need to retrieve the course numbers of courses taken by the student whose student number is STUDENTS.stno. Ignore (for the moment) the origin of this piece of data. Then, the retrieval is done by the select construct:

select cno from GRADES

where stno = STUDENTS.stno;

Next, we transform this select into a subquery. The student number STUDENTS.stno is provided by the outer select of the following construct:

select name from STUDENTS where 'cs310' in

(select cno from GRADES

where stno = STUDENTS.stno);

Observe that this provides an alternate solution to the query discussed in Example 5.10.1. Namely, we use a subquery to compute the courses taken by each student. Then, we test if cs310 is one of these courses. We use the qualified attribute STUDENTS.stno inside the subquery to differentiate between this input parameter and the attribute stno of the table GRADES.

Sets of tuples produced by subqueries can be tested for emptiness using the exists condition. Namely, the condition

exists (select ... from ...)

is true if the set returned by the subquery is not empty; similarly,

not exists (select ... from ...)

is true if the set returned by the subquery is empty.

Example 5.11.2 Let us give yet another solution to the query we solved in Example 5.10.1. This time, to find the names of students who took cs310 we retrieve those students for whom the set of their grades in cs310 is not empty. This can be done as follows:

select name from STUDENTS where

exists (select * from GRADES where

stno = STUDENTS.stno and

cno = 'cs310');

As a result, we have the table:

NAME

--------------

Mixon Leatha

Prior Lorraine

Example 5.11.3 To find instructors who never taught cs110, we search for instructors for whom there is no GRADES record involving cs110 and these instructors. This can be done by


select name from INSTRUCTORS where

not exists(select * from GRADES where

empno = INSTRUCTORS.empno and

cno = 'cs110');

which results in the table:

NAME

-------------

Sawyer Kathy

Davis William

Will Samuel

If both the main query and the subquery deal with the same table and the subquery requires input parameters from the outer query, then we use an alias of the table in the outer query.

Example 5.11.4 Let us find the student numbers of students whose advisor is advising at least one other student. The information is contained in the ADVISING table, and the following select construct uses both ADVISING (in the subquery) and its alias A in the main query:

select distinct stno from ADVISING A

where exists (select * from ADVISING where

empno = A.empno and stno != A.stno);

This query returns the table:

STNO

----

1011

2415

2661

2890

4022

5544

5571

Subqueries can be used in the list that follows the word from in exactly the same manner that tables are used. This is shown in the next example.

Example 5.11.5 To find the pairs of names of students and instructors such that the student took some course with the instructor we could write:

select STUDENTS.name as sname, INSTRUCTORS.name as iname

from STUDENTS, INSTRUCTORS,

(select stno, empno from GRADES) PN

where STUDENTS.stno = PN.stno and

INSTRUCTORS.empno = PN.empno;


The difference of the tables T and S can be computed by retaining each tuple of T for which there is no matching tuple in S. This can be done by:

select * from T where

not exists (select * from S where A1 = T.A1 and ... and An = T.An)

Example 5.11.6 Courses offered by the continuing education program but notby the regular program can be found by writing:

select * from CED_COURSES where

not exists (select * from COURSES where

cno = CED_COURSES.cno)

which takes advantage of the fact that cno is a key for both COURSES and CED_COURSES.
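The not exists formulation of difference can be compared with the set operator except (minus in SQL Plus) on a small instance. The sketch assumes that both course tables are reduced to cno and cname, with cno a key of each as stated above; the CED_COURSES rows are hypothetical.

```sql
create table COURSES     (cno varchar(5) primary key, cname varchar(30));
create table CED_COURSES (cno varchar(5) primary key, cname varchar(30));

insert into COURSES (cno, cname) values
  ('cs110', 'Introduction to Computing'),
  ('cs210', 'Computer Programming'),
  ('cs240', 'Computer Architecture');
-- Hypothetical continuing education offerings.
insert into CED_COURSES (cno, cname) values
  ('cs110', 'Introduction to Computing'),
  ('cs105', 'Computer Literacy'),
  ('cs120', 'Web Design');

-- Difference via not exists: cs105 and cs120.
select * from CED_COURSES where
not exists (select * from COURSES where cno = CED_COURSES.cno);

-- The same difference with the set operator (ORACLE: minus).
select * from CED_COURSES
except
select * from COURSES;
```

The two forms agree here because rows with matching cno also agree on cname. The not exists form compares only the attributes named in its condition, whereas except compares whole rows.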

5.12 Subqueries and division

SQL does not have a division operation. However, as we saw in Examples 4.1.27 and 4.2.3, we can perform division using product, projection, and difference. Of course, we could apply the prescription offered by relational algebra. This type of solution is discussed in the next example.

Example 5.12.1 To find the courses taught by every full professor, the solution envisioned here is

select cno from grades

minus

select GI.cno from (select grades.cno,

instructors.empno

from grades, instructors

where rank='Professor') GI

where (GI.cno,GI.empno) not in (select cno,empno from grades)

Note that the query

select grades.cno, instructors.empno

from grades, instructors

where rank='Professor'

computes all pairs consisting of a course number and the employee number of a full professor, using the product of the tables GRADES and INSTRUCTORS. Then, the query

select GI.cno from (select grades.cno,

instructors.empno

from grades, instructors

where rank='Professor') GI

where (GI.cno,GI.empno) not in (select cno, empno from grades)

extracts the courses that are part of the pairs of the previous table that do not appear in the GRADES table, that is, the courses for which there exists a full professor who did not teach these courses. These are the courses that we need to exclude from the answer. Thus, the query presented at the beginning of this example yields the solution of the problem:


CNO

-----

cs110

The solution presented in Example 5.12.1 is not applicable in SQL dialects that do not have all the facilities of SQL Plus. Therefore, we need to examine an alternate way of solving this problem that is almost universally usable. To understand the technique used we examine the solution of the query formulated in the next example.

Example 5.12.2 Again, suppose that we need to determine the courses taught by every full professor. Let us formulate the same query in a way that is easier to translate into SQL. Namely, we find the courses for which there are no full professors who have not taught these courses. The reader should realize immediately that this is simply a new formulation of the same problem. We show the solution in steps, moving gradually from plain English to SQL:

Phase I:

select cno from GRADES G where

not exists (instructors who are full professors and have not taught the course G.cno)

Phase II:

select cno from GRADES G where

not exists (select * from INSTRUCTORS

where rank = 'Professor' and

these instructors have not taught

the course G.cno)

Phase III:

select cno from GRADES G where

not exists (select * from INSTRUCTORS

where rank = 'Professor' and

not exists (select * from GRADES

where empno = INSTRUCTORS.empno

and cno = G.cno));

In Phase I we determine in SQL the course numbers for which no full professor exists who has not taught these courses.

In Phase II we concentrate on preventing the existence of full professors who are not teaching these courses. Note that Phase II still contains an untranslated part.

Finally, in Phase III, we translate the part "these instructors have not taught the course G.cno" using not exists for the second time.

Example 5.12.3 Another query that requires division in relational algebra is: Find names of instructors who have taught every 100-level course, that is, every course whose first digit of the course number is 1. The formulation that is better suited to SQL implementation is: Find names of instructors for whom there is no 100-level course that they have not taught. This is solved by the following select construct:

select name from INSTRUCTORS where

not exists (select * from COURSES

where cno like 'cs1__' and

not exists (select * from GRADES where

empno = INSTRUCTORS.empno

and cno = COURSES.cno));

The answer that results from our usual database instance is:

NAME

------------

Evans Robert

Exxon George
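The double not exists pattern of Examples 5.12.2 and 5.12.3 can be run on a small hypothetical instance. In the sketch below, INSTRUCTORS is reduced to empno, name, and rank, and GRADES to stno, empno, and cno. Both full professors taught cs110, only one of them taught cs210, and neither taught cs240, so cs110 should be the only course returned.

```sql
create table INSTRUCTORS (empno char(3) primary key, name varchar(20), rank varchar(15));
create table GRADES (stno integer, empno char(3), cno varchar(5));

insert into INSTRUCTORS (empno, name, rank) values
  ('019', 'Evans Robert', 'Professor'),
  ('023', 'Exxon George', 'Professor'),
  ('056', 'Sawyer Kathy', 'Assoc. Prof.');
-- Hypothetical grade records; only who taught which course matters here.
insert into GRADES (stno, empno, cno) values
  (1011, '019', 'cs110'), (4022, '023', 'cs110'),
  (2661, '019', 'cs210'), (3566, '056', 'cs210'),
  (5544, '056', 'cs240');

-- Courses taught by every full professor: there is no full professor
-- for whom no GRADES record links him or her to the course.
select distinct cno from GRADES G
where not exists (select * from INSTRUCTORS
                  where rank = 'Professor'
                    and not exists (select * from GRADES
                                    where empno = INSTRUCTORS.empno
                                      and cno = G.cno));
```

If the database contained no full professors, the outer not exists would be true for every course and all courses would be returned. This is the division counterpart of the empty-set behavior of all noted in Example 5.10.5.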

5.13 Relational Completeness of SQL

Between Chapter 4 and the current chapter, we have shown that SQL is capable of performing all operations of relational algebra. This fact is known as the relational completeness of SQL. As we shall see in subsequent chapters, the capabilities of SQL go well beyond the standard definition of relational algebra.

5.14 Scalar Functions of SQL

We present now capabilities of SQL that go beyond relational algebra. We begin by discussing built-in functions in SQL that may act on individual values (scalar functions), functions that act on sets of values (aggregate functions), and, also, analytic functions that can be used for various statistical computations. Then, we continue with the group by option of select, and we discuss several on-line analytic processing functions of SQL.

Scalar functions are built-in functions of SQL that work on individual values. They are highly dependent on the particular implementation of SQL, and we limit our discussions to functions implemented by ORACLE's SQL Plus. There are several types of scalar functions, depending on the types of their arguments.

5.14.1 Numerical Functions

Among the numerical functions, abs, sin, cos, power, sqrt, etc. have quite obvious definitions. For example, sqrt computes the square root of its argument, while power(x, y) computes $x^y$.


Example 5.14.1 To illustrate some of the numerical functions we create a table POINTS whose rows represent labelled points in the plane:

create table POINTS(ptid varchar2(10), x integer, y integer,

primary key(ptid));

and populate this table using the commands:

insert into points(ptid, x, y) values ('a',0,0);
insert into points(ptid, x, y) values ('b',0,1);
insert into points(ptid, x, y) values ('c',0,2);
insert into points(ptid, x, y) values ('d',1,0);
insert into points(ptid, x, y) values ('e',1,1);
insert into points(ptid, x, y) values ('f',1,2);
insert into points(ptid, x, y) values ('g',2,0);
insert into points(ptid, x, y) values ('h',2,1);
insert into points(ptid, x, y) values ('i',2,2);
insert into points(ptid, x, y) values ('j',3,0);
insert into points(ptid, x, y) values ('k',3,1);
insert into points(ptid, x, y) values ('l',3,2);

To determine the distances from the point a to every point of the table (including a itself), we write

select p.ptid,
       sqrt(power(a.x - p.x,2)+power(a.y - p.y,2)) as dist
from points a, points p
where a.ptid = 'a';

This returns:

PTID DIST

---------- ----------

a 0

b 1

c 2

d 1

e 1.41421356

f 2.23606798

g 2

h 2.23606798

i 2.82842712

j 3

k 3.16227766

l 3.60555128

To compute the distance between a, having the coordinates $(x_a, y_a)$, and a point p with coordinates $(x_p, y_p)$, we use the formula

$d(a, p) = \sqrt{(x_a - x_p)^2 + (y_a - y_p)^2}$.

The formula appears in the target list of the select and is written with the numerical functions sqrt and power.
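The formula can be checked outside the database. This Python sketch copies the POINTS rows from the insert commands above and evaluates the same expression, with math.sqrt and the ** operator playing the roles of sqrt and power; the function name distances_from is ours, not part of any SQL dialect.

```python
import math

# Rows of POINTS, copied from the insert commands above: ptid -> (x, y).
POINTS = {
    "a": (0, 0), "b": (0, 1), "c": (0, 2), "d": (1, 0),
    "e": (1, 1), "f": (1, 2), "g": (2, 0), "h": (2, 1),
    "i": (2, 2), "j": (3, 0), "k": (3, 1), "l": (3, 2),
}

def distances_from(origin_id, points):
    # Mirrors sqrt(power(a.x - p.x, 2) + power(a.y - p.y, 2)) for every p.
    xa, ya = points[origin_id]
    return {
        ptid: math.sqrt((xa - xp) ** 2 + (ya - yp) ** 2)
        for ptid, (xp, yp) in points.items()
    }

# Rounding to 8 decimals only imitates the way the query output is displayed.
for ptid, dist in distances_from("a", POINTS).items():
    print(ptid, round(dist, 8))
```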

In Oracle we can perform computations unrelated to any table by using a dummy one-row table named DUAL.

Example 5.14.2 To compute sin(30°), sin(45°), and sin(60°) in Oracle, we write:

select sin(30*3.14159265359/180) as sin30,

sin(45*3.14159265359/180) as sin45,

sin(60*3.14159265359/180) as sin60

from dual;

We need to convert the angles to radians before sin is applied. This will return:

SIN30 SIN45 SIN60

---------- ---------- ----------

.5 .707106781 .866025404
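The conversion step can be mirrored in Python, where math.radians performs the multiplication by π/180 that the query writes out with the constant 3.14159265359. The helper name sin_degrees is hypothetical.

```python
import math

def sin_degrees(angle):
    # Convert degrees to radians first, as the query does, then apply sin.
    return math.sin(math.radians(angle))

for angle in (30, 45, 60):
    print(angle, round(sin_degrees(angle), 9))
```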

Microsoft SQL Server has a simpler way of performing this type of computation, in that it does not require the dummy table.

Example 5.14.3 In SQL Server we can simply write:

select sin(30*3.14159265359/180) as sin30,

sin(45*3.14159265359/180) as sin45,

sin(60*3.14159265359/180) as sin60;

to obtain the same result as the one obtained in ORACLE.

5.14.2 String Functions

String functions can be used to change the case of strings, replace parts of strings, concatenate strings, extract substrings, pad strings, and so on.

The functions upper and lower convert strings to uppercase and lowercase characters, respectively.

Example 5.14.4 To print names of students in capital letters and course titles in small letters we can write:

select distinct upper(STUDENTS.name) as STNAME,
       lower(COURSES.cname) as course
from STUDENTS, GRADES, COURSES
where STUDENTS.stno = GRADES.stno and
      GRADES.cno = COURSES.cno;

This returns the following result:

STNAME COURSE

-----------------------------------------------

EDWARDS P. DAVID computer architecture

EDWARDS P. DAVID computer programming

EDWARDS P. DAVID introduction to computing

GROGAN A. MARY computer architecture

.

.

.


PRIOR LORRAINE data structures

PRIOR LORRAINE introduction to computing

RAWLINGS JERRY computer architecture

RAWLINGS JERRY introduction to computing

These functions are particularly useful for performing string comparisons that ignore case. Thus,

upper('stephany') like 'STE%'

is true, because the pattern is matched against the uppercase string STEPHANY.

Example 5.14.5 The string function replace substitutes its third argument for every occurrence of its second argument in the value(s) specified by its first argument. In the select written below, the string 'Computer' is replaced by the string 'Comp.':

select replace(cname,'Computer','Comp.') from COURSES;

This yields the following result:

REPLACE(CNAME,'COMPUTER','COMP.')

----------------------------------

Introduction to Computing

Comp. Programming

Comp. Architecture

Data Structures

Higher Level Languages

Software Engineering

Graphics
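SQLite, reached through Python's standard sqlite3 module, also provides replace(string, search, replacement) with the same argument order, so the example can be reproduced there as a sketch. The COURSES table below holds only the course titles, reconstructed from the output above; ORACLE itself is not involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table COURSES(cname text)")
# Course titles as they appear before the replacement.
titles = ["Introduction to Computing", "Computer Programming",
          "Computer Architecture", "Data Structures",
          "Higher Level Languages", "Software Engineering", "Graphics"]
conn.executemany("insert into COURSES values (?)", [(t,) for t in titles])

# Every occurrence of 'Computer' is replaced; 'Introduction to Computing'
# is unchanged because it does not contain the substring 'Computer'.
rows = conn.execute(
    "select replace(cname, 'Computer', 'Comp.') from COURSES order by rowid"
).fetchall()
for (title,) in rows:
    print(title)
```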

Example 5.14.6 The function concat computes the concatenation of the two strings that form its arguments. Its effect is identical to that of the concatenation operator || that we discussed in Example 5.6.11. The phrase below prints the state and zip code of each student as a single string:

select name, addr, concat(state,zip) as state_zip from STUDENTS;

This returns:

NAME ADDR STATE_ZIP

----------------------------------------------

Edwards P. David 10 Red Rd. MA02159

Grogan A. Mary Walnut St. MA02148

Mixon Leatha 100 School St. MA02146

McLane Sandy 30 Cass Rd. MA02122

Novak Roland 42 Beacon St. NH03060

Pierce Richard 70 Park St. MA02146

Prior Lorraine 8 Beacon St. MA02125

Rawlings Jerry 15 Pleasant Dr. MA02115

Lewis Jerry 1 Main Rd RI02904


Example 5.14.7 To extract substrings of strings we can use the function substr, whose syntax is:

substr(string, integer [, integer])

A typical call such as substr(s, n, m) returns the substring of length m of the string s that starts with the nth character of s. If m is omitted, as in substr(s, n), then the function returns all characters of s from the nth character to the end of s. If n is negative, then the characters are counted backwards from the end of s.

The select phrase

select substr('Oracle',2,3) from dual;

will return:

SUB

---

rac

The next select, which omits the third argument of substr:

select substr('Oracle',2) from dual;

yields:

SUBST

-----

racle

which is the string that begins with the second character of 'Oracle' and ends with the last character of this string.

Since the second argument of the function call in

select substr('Oracle',-4,3) from dual;

is negative, the starting position of the substring is the 4th character counted from the end (that is, the character a), and thus the query returns:

SUB

---

acl
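The position rules of substr can be summarized as a small Python function. This is a sketch that follows the behavior described above (1-based positions, optional length, negative start counted from the end) plus two assumptions taken from ORACLE's documentation: a start of 0 behaves like 1, and a start outside the string or a length below 1 yields null, represented here by None. The name oracle_substr is hypothetical.

```python
def oracle_substr(s, n, m=None):
    """Sketch of substr(s, n [, m]) with 1-based positions."""
    if n == 0:
        n = 1                               # position 0 is treated like 1
    start = n - 1 if n > 0 else len(s) + n  # convert to a 0-based index
    if start < 0 or start >= len(s):
        return None                         # start lies outside s: null
    if m is None:
        return s[start:]                    # no length: up to the end of s
    if m < 1:
        return None                         # length below 1: null
    return s[start:start + m]

print(oracle_substr("Oracle", 2, 3))   # rac
print(oracle_substr("Oracle", 2))      # racle
print(oracle_substr("Oracle", -4, 3))  # acl
```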

The functions lpad and rpad can be used to enhance the presentation of query results. The syntax of lpad is:

lpad(s, integer [, string])

The effect is to pad s on the left with spaces to bring the total length of the string to the length specified by the second argument of the function. If the third argument is present, then this string is repeated on the left to fill up the padded string. If s is already longer than the specified length, it is truncated to that length.

The function rpad has a similar syntax; however, the padding is done on the right of s.

Example 5.14.8 To print a list of all employees and their salaries (using the tabular variables EMPHIST and PERSINFO), we can use the query:

select name, lpad(salary,7,'$') as ann_salary
from persinfo, emphist
where persinfo.empno = emphist.empno;

This returns:
NAME ANN_SAL

----------------------------------- -------

Natalia Martins $150000

Laura Schwartz $120000

John Soriano $120000

Kendall MacRae $100000

Rachel Anderson $$70000

Richard Laughlin $$70000

Danielle Craig $$90000

Abby Walsh $$75000

Bailey Burns $$70000
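A Python sketch may make the padding rule concrete. The functions lpad and rpad below are hypothetical re-implementations, not ORACLE code; they assume a non-empty pad string and, like ORACLE, truncate s when it is already longer than the requested length.

```python
def lpad(s, n, pad=" "):
    # Left-pad str(s) to length n by repeating pad; truncate if too long.
    s = str(s)                       # salary values are numbers; convert first
    if len(s) >= n:
        return s[:n]
    fill = n - len(s)
    return (pad * fill)[:fill] + s   # repeat pad, then cut to the exact width

def rpad(s, n, pad=" "):
    # Same rule, but the padding goes on the right of s.
    s = str(s)
    if len(s) >= n:
        return s[:n]
    fill = n - len(s)
    return s + (pad * fill)[:fill]

print(lpad(150000, 7, "$"))  # $150000
print(lpad(70000, 7, "$"))   # $$70000
```

Calling lpad(70000, 7, '$') reproduces the $$70000 entries of the table, while a two-character pad such as lpad('x', 5, 'ab') yields 'ababx'.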

5.14.3 Date functions

SQL*Plus contains a class of functions that apply to the DATE type: extract, months_between, etc.

Example 5.14.9 The function extract computes a part of a date value. It is called as extract(part from date), where part is the desired date part (such as year or month) and date is the date value. For instance, to obtain the year part of the appt_date attribute of the table EMPHIST we write:

select empno, extract(year from appt_date) as start_y

from emphist;

This returns:

EMPNO START_Y

---------- ----------

1000 1999

1005 1999

1010 2000

1015 1999

1020 1999

1025 2000

1030 2000

1035 2000

1040 2000

Similarly, we can obtain the month part of a date by writing

select empno, extract(month from appt_date) as start_m
from emphist;

This returns:
EMPNO START_M

---------- ----------

1000 10

1005 10

1010 1

1015 10

1020 11

1025 3

1030 1

1035 2

1040 3

Example 5.14.10 To compute the number of months an employee has worked we can use the function months_between. This computes the number of months between the current date (designated by the system-provided constant SYSDATE) and the date of hire:

select empno, months_between(SYSDATE,appt_date) as month_served
from emphist;

The table returned by this query is:

EMPNO MONTH_SERVED

---------- ------------

1000 35.8877397

1005 35.532901

1010 32.8877397

1015 35.1135461

1020 34.8877397

1025 30.5974171

1030 32.5974171

1035 31.2748365

1040 30.8877397
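The fractional values above reflect how ORACLE counts partial months. According to ORACLE's SQL reference, months_between returns an integer when both dates fall on the same day of the month or both fall on the last day of a month; otherwise the fractional part is based on a 31-day month and includes the difference in time components. The Python function below is a sketch of that rule written for illustration (it ignores time zones and fractional seconds), and the dates used are arbitrary.

```python
import calendar
from datetime import datetime

def months_between(d1, d2):
    # Whole months from the year and month fields of the two dates.
    whole = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    last1 = d1.day == calendar.monthrange(d1.year, d1.month)[1]
    last2 = d2.day == calendar.monthrange(d2.year, d2.month)[1]
    # Same day of month, or both last days of their months: an integer.
    if d1.day == d2.day or (last1 and last2):
        return float(whole)

    def seconds(d):
        return d.hour * 3600 + d.minute * 60 + d.second

    # Otherwise the day and time difference is divided by a 31-day month.
    day_diff = (d1.day - d2.day) + (seconds(d1) - seconds(d2)) / 86400
    return whole + day_diff / 31

print(months_between(datetime(1995, 2, 2), datetime(1995, 1, 1)))  # about 1.03225806
```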

Arithmetic computations can be performed in the target list of any select.

Example 5.14.11 Suppose that a bonus is to be paid to the current employees, that is, to the employees whose termination date is null. The bonus equals 10% of the weekly salary (salary/52) multiplied by the number of months employed. This is computed by

select empno, 0.1 * months_between(SYSDATE,appt_date) * salary/52 as bonus

from emphist

where term_date is null;

This query returns:


EMPNO BONUS

------------------

1000 10430.7253

1005 8262.69438

1010 7652.27254

1015 6804.93348

1020 4733.05642

1025 4155.51299

1030 5688.95627

1035 4550.04006

1040 4194.59488

5.15 Aggregate Functions in SQL

Aggregate functions are those functions that operate on sets of values. Typical examples include sum, avg, max, min, and count.

The first four functions operate on columns of tables and ignore null values. The function count returns the number of elements of the set that is its argument.

Example 5.15.1 The following select construct determines the largest grade obtained by the student whose student number is 1011. The function max is applied to the set of grades of this student and returns the largest value in this set:

select max(grade) as highgr from GRADES

where stno = 1011;


HIGHGR

------

90

In general, sum(A) returns the sum of all nonnull values among the A-components of the selected tuples. Similarly, avg(A) returns the average of these values. The expressions max(A) and min(A) yield the largest and the smallest values, respectively, in the set of A-components of the tuples selected by a query.

The functions sum and avg apply to attributes whose domains are numerical (such as integer or float); max and min apply to every kind of attribute.

If we wish to discard duplicate values from the sequences of values before applying these functions, we need to use the word distinct. For instance, sum(distinct A) considers only the distinct nonnull values that occur in the sequence of components.
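The effect of nulls and of distinct on these functions can be observed in a quick experiment. The sketch below uses SQLite through Python's sqlite3 module rather than ORACLE, and a one-column table T with hypothetical values: 10 twice, 40, and one null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T(a integer)")
# Hypothetical values, including a duplicate (10) and a null.
conn.executemany("insert into T values (?)", [(10,), (10,), (40,), (None,)])

row = conn.execute("""
    select sum(a), avg(a), max(a), min(a),
           sum(distinct a), avg(distinct a)
    from T
""").fetchone()
print(row)  # (60, 20.0, 40, 10, 50, 25.0)
```

The null is skipped everywhere, so avg(a) divides 60 by 3 rather than by 4, and the distinct versions see only the values 10 and 40.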

Example 5.15.2 We mentioned that the built-in functions max and min apply to string domains as well as to numerical domains. We use this feature of these functions to determine the first and the last student in alphabetical order:


select min(name) as first, max(name) as last

from STUDENTS;

This query yields the table:

FIRST LAST

---------------- --------------

Edwards P. David Rawlings Jerry

Next, we show a select construct where the same functions are applied to a numerical domain:

select min(grade) as lowgr,

max(grade) as highgr from GRADES

where stno = 1011;

This generates the answer:

LOWGR HIGHGR

-----------------

40 90

The query

select avg(grade) as avggr from GRADES
where stno = 1011;

returns the table

AVGGR

-----

73.75

If we discard duplicate values as in

select avg(distinct grade) as avggr from GRADES
where stno = 1011;

then the average grade is lower, because the grade that occurs twice (90) is one of the higher grades of this student:

AVGGR

-----

68.33
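The two averages can be reproduced from the grades of student 1011 that appear in the GRADES listing of Section 5.16 (90, 90, 75, and 40). The sketch below uses SQLite through Python's sqlite3 module in place of ORACLE and keeps only the columns the queries need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno integer, cno text, grade integer)")
# Grades of student 1011, as listed in the GRADES output of Section 5.16.
conn.executemany("insert into GRADES values (1011, ?, ?)",
                 [("cs210", 90), ("cs240", 90), ("cs110", 75), ("cs110", 40)])

plain, distinct = conn.execute(
    "select avg(grade), avg(distinct grade) from GRADES where stno = 1011"
).fetchone()
print(plain, round(distinct, 2))  # 73.75 68.33
```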

Built-in functions can be used in subqueries. This is illustrated by the next example.

Example 5.15.3 To retrieve the students who obtained a grade in cs110 higher than the average grade in cs110 we write:

select stno from GRADES
where cno = 'cs110'
  and grade > all(select avg(grade) from GRADES
                  where cno = 'cs110');

This returns:


STNO

----

2661

3566

5544
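The query can be replayed on the six cs110 rows of the GRADES listing in Section 5.16, whose average grade is 450/6 = 75. This sketch uses SQLite through Python's sqlite3 module instead of ORACLE. SQLite does not accept quantified comparisons such as > all (...), so the sketch compares grade with the scalar subquery directly; the two forms agree here because the subquery returns a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno integer, cno text, grade integer)")
# The cs110 rows of GRADES, as listed in Section 5.16.
conn.executemany("insert into GRADES values (?, 'cs110', ?)",
                 [(1011, 40), (2661, 80), (3566, 95), (5544, 100),
                  (1011, 75), (4022, 60)])

# Scalar subquery in place of "> all (...)", which SQLite lacks.
query = """
    select stno from GRADES
    where cno = 'cs110'
      and grade > (select avg(grade) from GRADES where cno = 'cs110')
    order by stno
"""
above_average = [stno for (stno,) in conn.execute(query)]
print(above_average)  # [2661, 3566, 5544]
```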

The count function can be used in several ways:

count(A) determines the number of non-null entries under the attribute A;

count(distinct A) computes the number of distinct non-null values that occur under A;

count(*) determines how many rows exist in a table.

Note that count(distinct *) cannot be used in SQL.
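The three forms of count differ only in how they treat nulls and duplicates. The sketch below uses SQLite through Python's sqlite3 module, with a hypothetical column holding x, x, y, and a null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T(a text)")
# Hypothetical column with a duplicate and a null.
conn.executemany("insert into T values (?)", [("x",), ("x",), ("y",), (None,)])

counts = conn.execute(
    "select count(*), count(a), count(distinct a) from T").fetchone()
print(counts)  # (4, 3, 2)
```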

Example 5.15.4 Here are several examples of the use of the count function. To find how many students took cs110 in the fall semester of 2003, we write:

select count(cno) from GRADES
where cno = 'cs110' and
      sem = 'FALL' and
      year = 2003;

Since no records exist for any grades given during that semester in cs110, we obtain the answer:

COUNT(CNO)

----------

0

Observe that this table has a system-supplied column name COUNT(CNO). This happens because we did not provide a name using as.

Let us determine how many students have ever registered for any course. We have to retrieve this result from GRADES, and we must use distinct to avoid counting the same student several times (if the student took several courses):

select count(distinct stno) as nost

from GRADES;

This query returns the one-entry table:

NOST

----

8


Finally, let us determine the names of instructors who have taught more than one course. For every instructor, we determine in a subquery the number of courses taught. Then, we retain those instructors who taught more than one course:

select name from INSTRUCTORS
where 1 < any (select count(distinct cno) from GRADES
               where empno = INSTRUCTORS.empno);

We obtain the table:

NAME

------------

Evans Robert

Will Samuel
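The same counting idea can be checked against the GRADES listing of Section 5.16, whose rows involve four instructors: 019 (cs110, cs210, cs240), 023 (cs110), 056 (cs240), and 234 (cs310, cs410). Since the INSTRUCTORS rows are not listed in this section, the sketch below is a variant that correlates the subquery with GRADES itself and returns employee numbers instead of names. It uses SQLite through Python's sqlite3 module, which lacks < any (...), so 1 is compared with the single count directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(empno text, cno text)")
# Distinct (empno, cno) pairs taken from the GRADES rows in Section 5.16.
conn.executemany("insert into GRADES values (?, ?)",
                 [("019", "cs110"), ("019", "cs210"), ("019", "cs240"),
                  ("023", "cs110"), ("056", "cs240"),
                  ("234", "cs310"), ("234", "cs410")])

# Correlated subquery: count the courses of the instructor in the outer row.
query = """
    select distinct g.empno from GRADES g
    where 1 < (select count(distinct cno) from GRADES
               where empno = g.empno)
    order by g.empno
"""
multi = [empno for (empno,) in conn.execute(query)]
print(multi)  # ['019', '234']
```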

5.16 Sorting Results

Data obtained from a select construct may be sorted on one or several columns using the order by clause. This clause also gives the user the possibility of opting for an ascending or descending sorting order on each of the columns. By default, the ascending order is chosen.

Example 5.16.1 Suppose that we need to sort the GRADES tuples on the student number. For each student, we sort the grades in descending order. This can be done with the query:

select * from GRADES

order by stno, grade desc;

This results in the output shown next:


STNO       EMPNO       CNO   SEM    YEAR       GRADE
---------- ----------- ----- ------ ---------- ----------

1011 019 cs210 FALL 2003 90

1011 056 cs240 SPRING 2004 90

1011 023 cs110 SPRING 2003 75

1011 019 cs110 FALL 2002 40

2415 019 cs240 SPRING 2003 100

2661 234 cs310 SPRING 2004 100

2661 019 cs110 FALL 2002 80

2661 019 cs210 FALL 2003 70

3442 234 cs410 SPRING 2003 60

3566 019 cs240 SPRING 2003 100

3566 019 cs110 FALL 2002 95

3566 019 cs210 FALL 2003 90

4022 056 cs240 SPRING 2004 80

4022 234 cs310 SPRING 2004 75


4022 019 cs210 SPRING 2004 70

4022 023 cs110 SPRING 2003 60

5544 019 cs110 FALL 2002 100

5544 056 cs240 SPRING 2004 70

5571 019 cs210 SPRING 2004 85

5571 234 cs410 SPRING 2003 80

5571 019 cs240 SPRING 2003 50
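The equivalence of the two order by forms can be checked on a few GRADES rows copied from the output above. The sketch uses SQLite through Python's sqlite3 module instead of ORACLE; the rows were chosen with no ties on (stno, grade), so both queries must return them in exactly the same order.

```python
import sqlite3

# GRADES rows copied from the output above.
rows = [(1011, "019", "cs110", "FALL", 2002, 40),
        (1011, "023", "cs110", "SPRING", 2003, 75),
        (2415, "019", "cs240", "SPRING", 2003, 100),
        (1011, "019", "cs210", "FALL", 2003, 90)]
conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno, empno, cno, sem, year, grade)")
conn.executemany("insert into GRADES values (?, ?, ?, ?, ?, ?)", rows)

# Sort by column names, then by ordinal positions in the select list.
by_name = conn.execute(
    "select * from GRADES order by stno, grade desc").fetchall()
by_position = conn.execute(
    "select stno, empno, cno, sem, year, grade from GRADES order by 1, 6 desc"
).fetchall()
print([(r[0], r[5]) for r in by_name])  # [(1011, 90), (1011, 75), (1011, 40), (2415, 100)]
```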

Instead of using the names of the columns, one could use their ordinal positions in the select phrase.

Example 5.16.2 An equivalent form of the query from Example 5.16.1 is

select stno, empno, cno, sem, year, grade

from GRADES

order by 1, 6 desc;

Ordering of the results can also be achieved by using expressions.

Example 5.16.3 To sort the grades based on the second digit of the course number, and then on the first digit of the course number (which are the fourth and the third characters of course numbers, respectively), we write:

select * from grades

order by substr(cno,4,1), substr(cno,3,1);

This will return the following result:


STNO       EMPNO       CNO   SEM    YEAR       GRADE
---------- ----------- ----- ------ ---------- ----------

1011 019 cs110 FALL 2002 40

2661 019 cs110 FALL 2002 80

3566 019 cs110 FALL 2002 95

5544 019 cs110 FALL 2002 100

1011 023 cs110 SPRING 2003 75

4022 023 cs110 SPRING 2003 60

1011 019 cs210 FALL 2003 90

3566 019 cs210 FALL 2003 90

4022 019 cs210 SPRING 2004 70

5571 019 cs210 SPRING 2004 85

2661 019 cs210 FALL 2003 70
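Sorting on expressions works the same way in SQLite, whose substr also takes 1-based positions. In the sketch below, run through Python's sqlite3 module, the course numbers are a hypothetical mix with distinct sort keys, so the resulting order is fully determined; Python's sorted with an equivalent key gives the same sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno integer, cno text)")
# Hypothetical course numbers, inserted in no particular order.
conn.executemany("insert into GRADES values (?, ?)",
                 [(1, "cs410"), (2, "cs240"), (3, "cs110"),
                  (4, "cs310"), (5, "cs210")])

# Sort on the 4th character, then on the 3rd character of cno.
query = "select cno from GRADES order by substr(cno, 4, 1), substr(cno, 3, 1)"
ordered = [cno for (cno,) in conn.execute(query)]
print(ordered)  # ['cs110', 'cs210', 'cs310', 'cs410', 'cs240']

# Same ordering in Python: Python strings are 0-based, so index 3 is the 4th character.
same = sorted(ordered, key=lambda c: (c[3], c[2]))
```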
