Chapter 5
SQL The Relational Language
5 SQL The Relational Language
5.1 Introduction
5.2 Tabular Variables in SQL
    5.2.1 Creation of Tables
5.3 Referential Integrity in SQL
5.4 Basic Data Types
    5.4.1 String Domains
    5.4.2 Numeric Domains
    5.4.3 Special Domains
    5.4.4 Basic Domains Supported by ORACLE
5.5 SELECT Phrases
5.6 The WHERE Option
5.7 Union, Intersection, and Difference in SQL
5.8 Table Product in SQL
5.9 Join in SQL
5.10 Sets and subqueries
5.11 Parametrized subqueries
5.12 Subqueries and division
5.13 Relational Completeness of SQL
5.14 Scalar Functions of SQL
    5.14.1 Numerical Functions
    5.14.2 String Functions
    5.14.3 Date functions
5.15 Aggregate Functions in SQL
5.16 Sorting Results
5.17 The Group-by Option
    5.17.1 The decode and case Functions
    5.17.2 The rollup and cube Extensions of group by
5.18 Analytical Capabilities of SQL Plus
    5.18.1 Ranking Functions
    5.18.2 Top-n Queries
    5.18.3 Windowing functions in SQL Plus
5.19 Statistics in SQL
    5.19.1 Variance and Correlation
    5.19.2 Linear Regression
5.20 Graphs and SQL in SQL Plus
5.21 Updates
5.22 Access Rights
5.23 Views in SQL
5.24 Accessing metadata in SQLPlus
5.25 Exercises
5.26 Bibliographical Comments
5.1 Introduction
SQL is an acronym for Structured Query Language and is the name of the most important tool for defining and manipulating relational databases. The development of SQL began in the mid-1970s at the IBM San Jose Research Laboratory. The success of an experimental IBM database system (known as System R) that incorporated SQL compelled a number of software manufacturers to join IBM in developing relational database systems that incorporated SQL. In 1982, the American National Standards Institute (ANSI) initiated the development of a standard for a query language for relational database systems and opted for SQL as its prototype. The resulting ANSI standard, issued in 1986, was adopted as an International Standard by the International Organization for Standardization (ISO) in 1987.
In the late 1980s, embedded SQL was standardized by ANSI, and work on expanding SQL continues. A much extended version of the original standard, known as SQL92, was adopted by ISO/IEC at the end of 1992. To reflect current trends in the database field towards object-relational technology, a new standard ISO/IEC 9075-1, known as SQL99, was published in July 1999. As we shall see, SQL99 is a superset of SQL92. New features incorporated by this standard include object-relational extensions (user-defined data types, reference types, collections, large object support, table hierarchies), active database features (triggers), stored procedures and functions, on-line analytic processing extensions, etc. More recently, in 2003, a new standard was issued. This new edition of the standard includes a new chapter that deals with the interaction between SQL and XML (which we discuss in Chapter 10), corrections to SQL99, and several new features.
Our presentation concentrates initially on common SQL features, applicable to a wide range of SQL implementations.
SQL is a nonprocedural language. This means that a query formulated in SQL need not specify how a problem is to be solved nor how data should be accessed by the computing system; instead, an SQL query states what the query is, i.e., what data are sought.
This leaves the user free to focus on the logic of the query. Because the DBMS makes use of its internal knowledge, in most cases the DBMS generates retrieval procedures that are faster than equivalent retrieval procedures built directly by the user.
The SQL language consists of three components: the data definition language (DDL), the data manipulation language (DML), and the data control language (DCL). The first component allows the user to define the structure of the tables of the database. The second contains retrieval and update directives. The last component allows the database administrator to define the access rights to the database for various categories of users.
SQL syntax is format-free: tabs, carriage returns, and spaces can be included anywhere a space occurs in the definition of an SQL construct. Also, case is insignificant in table names, reserved words, and keywords. However, case is significant in character string literals.
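The following sketch illustrates these two rules. The table FORMAT_DEMO and its rows are hypothetical and serve only this illustration; we also assume a system, such as ORACLE, in which unquoted names are not case-sensitive and string comparisons are.

```sql
-- Hypothetical table used only for this illustration.
create table FORMAT_DEMO (city varchar(25));
insert into FORMAT_DEMO (city) values ('Natick');
insert into FORMAT_DEMO (city) values ('NATICK');

-- The next two statements express the same query: the layout and the
-- case of reserved words, table names, and column names do not matter.
select city from FORMAT_DEMO where city = 'Natick';

SELECT
    CITY
FROM
    format_demo
WHERE
    City = 'Natick';

-- Case is significant inside a string literal, so this statement asks
-- for a different value than the two statements above.
select city from FORMAT_DEMO where city = 'NATICK';
```

Some systems can be configured to compare strings without regard to case; on such a system the last query would not distinguish the two rows.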
5.2 Tabular Variables in SQL
When we introduced tables in Chapter 3, we assumed that the contents of a table is a relation, that is, it is a set of tuples. To conform to the reality of databases we need to define the content of a table as a sequence of tuples. Thus, a table may contain several copies of the same tuple. If a table is allowed to contain duplicates, then even if we know all components of a tuple, we may be unable to identify the corresponding row in the table uniquely. As a consequence, not every table has a key.
In this section we present a topic that we refer to informally as table creation. In reality, we create an object similar to a variable in a programming language that we call a tabular variable. The values of a tabular variable are tables and these values change in time. Tabular variables are created using the construction create table.
Example 5.2.1 To create a tabular variable called PATRONS having the heading
name addr city zip telno date_of_birth
we write:
create table PATRONS (name varchar(35) not null,
addr varchar(50),
city varchar(25),
zip char(9),
telno char(12),
date_of_birth date);
As we shall see, each attribute is followed by a description of its domain. The effect of this command is to create a tabular variable whose initial value is a table whose content is the empty set of tuples:
PATRONS
name addr city zip telno date of birth
After inserting a first row, the next value of the tabular variable PATRONS is the table:
PATRONS
name addr city zip telno date of birth
Ann Richards 56 Green Ln Natick 02170 508-561-0987 02/15/78
A second insertion yields a new table as the value for the tabular variable:
PATRONS
name addr city zip telno date of birth
Ann Richards 56 Green Ln Natick 02170 508-561-0987 02/15/78
Ron Scott 50 Cider Hill Framingham 01160 608-663-0211 11/4/80
If the first patron moves to a new address, the first row is modified and the tabular variable assumes a third value:
PATRONS
name addr city zip telno date of birth
Ann Richards 77 Lake St. Milton 02186 617-364-0606 02/15/78
Ron Scott 50 Cider Hill Framingham 01160 608-663-0211 11/4/80
The values that the tabular variable PATRONS may assume are the actual tables that have the name and the heading specified at the creation of the tabular variable. In addition, we can specify several types of constraints that any value of the tabular variable must satisfy.
Before it is possible to create tabular variables and form queries, it is necessary to create an empty database in which to work. In practice, this is generally done at the level of the operating system, usually with a command that is provided by the vendor of the DBMS.
To start, we assume that we have created an empty database. In this section we begin to discuss a part of the data definition component of SQL, namely, the creation of tabular variables, or informally, the creation of database tables.
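The three successive values of PATRONS shown above could be produced by directives such as the ones below. This is only a sketch: insert and update are discussed later in the chapter, the create table of Example 5.2.1 is repeated so that the fragment can be run on its own, and the date literals use the ANSI form date 'yyyy-mm-dd', which is an assumption because the text does not say how the dates were entered.

```sql
-- Repeats Example 5.2.1; the initial value of PATRONS is the empty table.
create table PATRONS (name varchar(35) not null,
                      addr varchar(50),
                      city varchar(25),
                      zip char(9),
                      telno char(12),
                      date_of_birth date);

-- First insertion: PATRONS now has one row.
insert into PATRONS (name, addr, city, zip, telno, date_of_birth)
values ('Ann Richards', '56 Green Ln', 'Natick', '02170',
        '508-561-0987', date '1978-02-15');

-- Second insertion: PATRONS now has two rows.
insert into PATRONS (name, addr, city, zip, telno, date_of_birth)
values ('Ron Scott', '50 Cider Hill', 'Framingham', '01160',
        '608-663-0211', date '1980-11-04');

-- The first patron moves: the first row is modified in place.
update PATRONS
set addr = '77 Lake St.', city = 'Milton', zip = '02186',
    telno = '617-364-0606'
where name = 'Ann Richards';
```

Each directive replaces the current value of the tabular variable by a new table with the same name and heading.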
5.2.1 Table Creation
We refer to the components of the Data Definition Language (DDL) as directives. The SQL directive for adding tables to a database is create table.
At a minimum, as we saw in Example 5.2.1, creating a tabular variable in SQL requires that we specify its name and its attributes along with their domains. The syntax for this is:
create table table_name
[(attr_def {, attr_def})],
where the attribute definition attr_def has the syntax:
attribute_name domain
In a slightly more general form (one that ignores certain details related to the physical design of databases), the create table directive has the following syntax:
create table [schema.]table_name
[(attr_def | table_constraint | table_ref_clause {, attr_def | table_constraint | table_ref_clause})],
where the attribute definition attr_def has the syntax:
attribute_name domain [default expr] [column_ref_clause] {column_constraint}
As a result of the execution of this directive, an initial amount of space is reserved in secondary memory to accommodate future values of the tabular variable, and the metadata are modified to reflect the addition of the new tabular variable. Specialized SQL constructions, discussed later (insert, delete, and update), can be used to modify the value of this variable.
Creation of tabular variables permits placing restrictions, called constraints, on the contents of any value that the tabular variable may assume. The constraints that follow have a global character (which means that they apply to the contents of a table in its entirety) and apply to any value that the tabular variable may assume.
Definition 5.2.2 A primary key constraint has the form
[constraint constraint_name] primary key(list_of_attributes)
when the primary key consists of the attributes of the list.
Alternate keys of tables can be specified using unique constraints. The syntax of this type of constraint is:
[constraint constraint_name] unique(list_of_attributes)
This indicates that no two rows of a table that is a value of the tabular variable may have the same values for the attributes specified in the list.
A constraint that involves a condition C that is a Boolean combination of conditions involving only components of tuples and constants is denoted by:
[constraint constraint_name] check(C)
When a constraint involves more than one attribute it is considered a table constraint; otherwise, it is a column constraint. Referential integrity can be imposed by using the column constraint references in the definition of an attribute. To prevent certain components of tuples from assuming a null value we can impose the column constraint not null.
Example 5.2.3 To create the tabular variable INSTRUCTORS of the college database we use the following create table directive:
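The kinds of constraints introduced so far can be seen together in the sketch below. The table BOOKS, its attributes, and the constraint names are hypothetical and are not part of the databases used in this book; the comments state which insertions we would expect a DBMS to accept or reject.

```sql
-- Hypothetical table; all names are illustrative only.
create table BOOKS (
    isbn   varchar(13) not null,                    -- column constraint
    title  varchar(80) not null,
    shelf  varchar(10),
    copies integer,
    loaned integer,
    constraint pk_books  primary key (isbn),
    constraint uq_shelf  unique (shelf),            -- alternate key
    constraint ck_copies check (copies >= 0),
    constraint ck_loaned check (loaned <= copies)   -- involves two attributes
);

-- Satisfies every constraint.
insert into BOOKS (isbn, title, shelf, copies, loaned)
values ('9780000000001', 'Databases', 'A-1', 3, 1);

-- Expected to be rejected: repeats the value of the primary key.
insert into BOOKS (isbn, title, shelf, copies, loaned)
values ('9780000000001', 'Algorithms', 'A-2', 2, 0);

-- Expected to be rejected: violates ck_loaned, since 5 > 2.
insert into BOOKS (isbn, title, shelf, copies, loaned)
values ('9780000000002', 'Networks', 'A-3', 2, 5);
```

Naming a constraint, as in ck_loaned, is optional, but a name makes it easier to recognize which constraint caused a rejection.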
create table INSTRUCTORS(empno varchar(11) not null,
name varchar(35),
rank varchar(25),
roomno integer,
telno varchar(4), primary key(empno));
The domain of empno is defined to be the set of strings of length at most 11. In addition, we have the column constraint not null, which means that null cannot be used as a value of the attribute empno. The domains of the other attributes have similar, obvious definitions that are discussed below. Note that in the definition of INSTRUCTORS we impose a table constraint, namely primary key(empno).
Similarly, the tabular variables STUDENTS and COURSES are created by:
create table STUDENTS(stno varchar2(10) not null,
name varchar2(35) not null,
addr varchar2(35),
city varchar2(20),
state varchar2(2),
zip varchar2(10), primary key(stno));
create table COURSES(cno varchar2(5) not null,
cname varchar2(30),
cr smallint, primary key(cno));
A script that creates all tabular variables of the college database is contained in Appendix A.
Example 5.2.4 To express that the primary key of the table GRADES consists of the attributes stno, cno, sem, year we can say that this table satisfies the primary key constraint:
constraint pkg primary key (stno, cno, sem, year)
Example 5.2.5 For the table EMPHIST, introduced in Example 3.3.5, we could introduce the tuple conditions:
constraint pos_sal check(salary > 0)
and
constraint suf_sal check(position != 'Programmer' or salary > 65000)
These conditions express that the salary must be a positive number and that somebody who is a programmer must be paid more than 65000 dollars, respectively.
Thus, the creation of the table EMPHIST can be achieved by:
create table EMPHIST(empno integer not null references PERSINFO(empno),
position varchar2(30),
dept varchar2(20),
appt_date date,
term_date date,
salary float,
check(position != 'Programmer' or salary > 65000),
constraint pos_sal check(salary > 0));
A script that creates the tables PERSINFO, EMPHIST, and REPORTING is contained in Appendix C.
Example 5.2.6 In the directives below we state that stno is both a foreign key for ADVISING and also its primary key. In addition, empno is a foreign key for this table (being the primary key for the table INSTRUCTORS).
create table ADVISING(stno varchar2(10) not null
references STUDENTS(stno),
empno varchar2(11)
references INSTRUCTORS(empno),
primary key(stno));
create table GRADES(stno varchar2(10)
not null references STUDENTS(stno),
empno varchar2(11)
not null references INSTRUCTORS(empno),
cno varchar2(5)
not null references COURSES(cno),
sem varchar2(6) not null,
year smallint not null,
grade integer,
primary key(stno,cno,sem,year),
check (grade
If you wish to examine the headings of the tables you created you can issue, for example, the SQL Plus directive
describe INSTRUCTORS;
Then, SQL Plus will print:
Name Null? Type
-------------------------- -------- ------------
EMPNO NOT NULL VARCHAR2(11)
NAME VARCHAR2(35)
RANK VARCHAR2(25)
ROOMNO NUMBER(38)
TELNO VARCHAR2(4)
The directive alter table is used for modifying the structure of an existing table. Columns may be added or dropped, the names of the columns or their data types can be modified, etc. A simplified syntax of this directive is:
alter table table_name modification_specification
In turn, the modification specification depends on the particular change we need to impose on the table. Examples of such modification specifications include
add column_name column_type,
drop column column_name,
modify column_name column_type,
rename column column_name to new_column_name,
as well as many other choices.
Example 5.2.7 To add a new year column to the table ADVISING we use the directive:
alter table advising add year varchar2(4);
The entries of the new column year will initially have null values.
Column types can be modified using the modify option. For instance, to increase the maximum length of the values of stno to 12 characters we write:
alter table advising modify stno varchar(12);
Column renaming is executed using the option rename column. Below we rename the column stno to studentno:
alter table advising rename column stno to studentno;
Finally, to drop the column year that we just added we write:
alter table advising drop column year;
5.3 Referential Integrity in SQL
We saw that referential integrity can be imposed in SQL using the column constraint references. An alternative method is to impose the table constraint foreign key. Its syntax is:
foreign key(attribute_name {, attribute_name})
references table_name(attribute_name {, attribute_name})
[on delete cascade]
The foreign key construction contains the option on delete cascade. The role of this option is to define the behavior of the tables when deletions occur in the table where the primary key occurs. Namely, when a row is removed from the table containing the primary key and the clause on delete cascade is specified, then all rows of the table containing the corresponding foreign key that match the removed row are also removed.
Example 5.3.1 Suppose that the tabular variable CITIES is created by:
create table CITIES (city varchar(40),
state char(2),
primary key (city,state));
A second tabular variable, STORES, records the stores that a retailer has in the covered territory, and is created by
create table STORES (storeno integer not null,
addr varchar(40) not null,
city varchar(40),
state char(2),
tel char(12),
primary key(storeno),
foreign key(city,state) references CITIES(city,state)
on delete cascade);
To populate the tables we execute the following directives:
insert into CITIES(city, state) values('Boston','MA');
insert into CITIES(city, state) values('Springfield','MA');
insert into CITIES(city, state) values('Providence','RI');
insert into CITIES(city, state) values('Hartford','CT');
insert into CITIES(city, state) values('Bayonne','NJ');
insert into STORES(storeno, addr, city, state, tel)
values(1,'125 Harvard St.','Boston','MA','617-287-0991');
insert into STORES(storeno, addr, city, state, tel)
values(2,'50 Storrow Drive','Boston','MA','617-566-7629');
insert into STORES(storeno, addr, city, state, tel)
values(3,'85 Manton Av.','Providence','RI','401-453-1234');
insert into STORES(storeno, addr, city, state, tel)
values(4,'40 West Street','Hartford','CT','860-232-4484');
insert into STORES(storeno, addr, city, state, tel)
values(5,'5 Finley Av.','Bayonne','NJ','908-221-0094');
insert into STORES(storeno, addr, city, state, tel)
values(6,'10 Linton Plaza','Hartford','CT','860-660-2220');
insert into STORES(storeno, addr, city, state, tel)
values(7,'30 Stilson Rd.','Providence','RI','401-861-5249');
The values of the tabular variables CITIES and STORES are
CITY ST
---------------
Boston MA
Springfield MA
Providence RI
Hartford CT
Bayonne NJ
and
STORENO ADDR CITY ST TEL
------------------------------------------------------
1 125 Harvard St. Boston MA 617-287-0991
2 50 Storrow Drive Boston MA 617-566-7629
3 85 Manton Av. Providence RI 401-453-1234
4 40 West Street Hartford CT 860-232-4484
5 5 Finley Av. Bayonne NJ 908-221-0094
6 10 Linton Plaza Hartford CT 860-660-2220
7 30 Stilson Rd. Providence RI 401-861-5249
Since referential integrity was imposed between the tabular variables CITIES and STORES, we need to insert the tuples of CITIES before we can insert the tuples of STORES. Otherwise, the cities mentioned in the values of STORES cannot reference a city in a value of CITIES and the insertion in STORES will be rejected.
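The effect of the insertion order can be tried with a sketch like the one below. It assumes that CITIES and STORES were created and populated as in Example 5.3.1; store number 8, its address, and its telephone number are hypothetical.

```sql
-- Expected to be rejected: ('Albany','NY') does not occur in CITIES,
-- so the foreign key (city, state) of the new row references nothing.
insert into STORES(storeno, addr, city, state, tel)
values(8,'12 Main St.','Albany','NY','518-555-0100');

-- Inserting the city first makes the same insertion acceptable.
insert into CITIES(city, state) values('Albany','NY');
insert into STORES(storeno, addr, city, state, tel)
values(8,'12 Main St.','Albany','NY','518-555-0100');
```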
The presence of on delete cascade means that if a row is removed from the table CITIES, then the rows of STORES corresponding to that city are also removed. For example, if the company closes its business in Hartford and we execute
delete from CITIES where
city = 'Hartford' and state = 'CT';
then the rows of STORES corresponding to the stores in Hartford, CT will be deleted automatically.
Removal of the tabular variables is also constrained by referential integrity. It would be impossible to remove the tabular variable CITIES before we remove the table STORES because STORES references CITIES. Thus, the correct order of removal is
drop table STORES;
drop table CITIES;
If the clause on delete cascade is absent, then the deletion of a row from CITIES is impossible unless we first delete the rows of STORES that correspond to the city that is removed from CITIES.
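As a sketch of this situation, suppose that STORES had been created as in Example 5.3.1 but without the clause on delete cascade (an assumption made only for this illustration). Closing the business in Hartford would then require two directives, in this order:

```sql
-- First remove the rows that reference the city ...
delete from STORES
where city = 'Hartford' and state = 'CT';

-- ... and only then the referenced row itself.
delete from CITIES
where city = 'Hartford' and state = 'CT';
```

Executing the second directive first would be expected to fail, because rows of STORES would still reference the row of CITIES being deleted.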
5.4 Basic Data Types
SQL makes use of a collection of domains that, in general, varies from one implementation to another. Not all domains of the standard exist in every implementation, and not all domains of implementations exist in the standard.
Basic domains supported by virtually all implementations of SQL can be classified as string domains, numerical domains, and special domains.
5.4.1 String Domains
String domains represent sets of fixed-length or variable-length sequences of characters. In this category, we have char(n), which represents the set of strings of characters (from a given basic set of characters) that have fixed length n. Similarly, varchar(n) represents the set of variable-length strings whose maximal length is n, for n > 0.
5.4.2 Numeric Domains
The SQL standard prescribes two kinds of numeric domains: exact numeric data types (numeric, decimal, integer, and smallint) and approximate numeric data types (float, double precision, and real). Their respective syntax is:
numeric [(p[, s])]
decimal [(p[, s])]
integer
smallint
float [(p)]
double precision
real
Here, p stands for precision and s stands for scale (both of which are non-negative integers). The precision parameter refers to the total number of digits, while the scale indicates the number of digits to the right of the decimal point. The difference between numeric and decimal is that in the latter case an implementation may provide a precision larger than p, while in the former case p is the exact total number of digits.
The domains smallint and integer have a number of digits dependent on the implementation; however, the precision of integer is required to be equal to or larger than the precision of smallint.
The float domain includes approximate representations of real numbers having precision at least p. Also, real and double precision have implementation-dependent precision, where the precision of double precision is never smaller than that of real.
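A declaration that uses each of the standard numeric domains might look as follows. The table MEASUREMENTS and its attributes are hypothetical, and the ranges that are actually available depend on the implementation.

```sql
-- Hypothetical table; all names are illustrative only.
create table MEASUREMENTS (
    id       smallint,
    quantity integer,
    price    numeric(7,2),      -- 7 digits in all, 2 after the decimal point
    weight   decimal(9,3),      -- precision 9 or more, scale 3
    ratio    real,
    total    double precision,
    reading  float(24)          -- approximate, precision at least 24
);

insert into MEASUREMENTS (id, quantity, price, weight, ratio, total, reading)
values (1, 250000, 12345.67, 123456.789, 0.5, 1.0E10, 3.14);
```

The value 12345.67 uses exactly the five digits before and the two digits after the decimal point that numeric(7,2) allows.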
5.4.3 Special Domains
Specific DBMSs have their own domains. For instance, ORACLE has the long domain that contains strings of characters of variable length that may be as large as 65,535 characters.
To allow us to begin working with actual examples as quickly as possible, we introduce some basic domains for ORACLE. Other databases are quite similar, and the reader can obtain the relevant details by consulting product-specific manuals.
5.4.4 Basic Domains Supported by ORACLE
We review briefly a few of the more important domains supported by ORACLE:
In ORACLE, char[(n)] represents variable strings of characters of length n, where 1 ≤ n ≤ 32767; the default value of n is 1. The domain character is the same as char. The characters and their order are determined by the system during the installation of the DBMS.
The domain varchar(n) requires n to be specified and also represents variable-length strings of characters. It is the intention of ORACLE to separate char(n) from varchar(n) in future releases: char(n) will represent fixed-length strings while varchar(n) will represent variable-length strings.
The varchar2 data type stores variable-length character strings and is currently synonymous with the varchar data type. However, in a future version of Oracle, varchar might store variable-length character strings compared with different comparison semantics. Currently there are two types of comparison semantics for strings in Oracle: blank-padded comparison semantics and non-padded comparison semantics.
When blank-padded comparison semantics is used, if the two values have different lengths, Oracle first adds blanks to the end of the shorter one so their lengths are equal. Oracle then compares the values character by character up to the first character that differs. The value with the greater character in the first differing position is considered greater. If two values have no differing characters, then they are considered equal. This rule means that two values are equal if they differ only in the number of trailing blanks. Oracle uses blank-padded comparison semantics only when both values in the comparison are either expressions of data type char, text literals, or values returned by the user function.
In the case of non-padded comparison semantics two values are compared character by character up to the first character that differs. The value with the greater character in that position is considered greater. If two values of different length are identical up to the end of the shorter one, the longer value is considered greater. If two values of equal length have no differing characters, then the values are considered equal. Oracle uses non-padded comparison semantics when one or both values in the comparison have the data type varchar or varchar2.
In either of the two comparison semantics we have 'ab' > 'aa' and 'ab' > 'a '. However, in the blank-padded comparison semantics we have 'a ' = 'a', while in the non-padded semantics we have 'a ' > 'a'.
The domain date represents dates in the format dd-mmm-yy.
The domain long (also denoted by long varchar) represents variable-length strings of characters with no more than 65,535 characters. At most one attribute may have this domain in any table.
The number domain in ORACLE can be used in several forms as specified by the following syntax:
number [(p[, s])],
where p is the precision and s is the scale.
The maximum precision of number is 38. The scale can vary between -84 and 127. If the scale is negative, the number is rounded to the specified number of places to the left of the decimal point.
The following cases may occur when we insert a value in a column whose domain is number:
Data           Domain         Stored as
---------------------------------------------------
1,234,567.89   number         1234567.89
1,234,567.89   number(9)      1234568
1,234,567.89   number(9,2)    1234567.89
1,234,567.89   number(9,1)    1234567.9
1,234,567.8    number(6)      error: exceeds precision
1,234,567.89   number(10,1)   1234567.9
1,234,567.89   number(7,-2)   1234600
1,234,567.89   number(7,2)    error: exceeds precision
If s > p, then s specifies the maximum number of valid digits after the decimal point. For instance, number(4,5) requires at least one zero immediately after the decimal point and rounds the digits after the fifth decimal digit. The number 0.012358 is stored as 0.01236.
Numbers may also be entered in exponential form, that is, including an exponent preceded by E. For example, 1234567 can be represented as 1.234567E+6, that is, as 1.234567 × 10^6.
Floating point domains are supported as float, float(*), and float(b), where b is the binary precision, that is, the number of significant binary digits. The domains float and float(*) are equivalent, and they consist of floating point numbers that can be represented by 126 binary digits (or, equivalently, by about 38 decimal digits).
To provide compatibility with other systems, ORACLE supports such domains as decimal, integer, smallint, real, and double precision. However, their internal representation is defined by the format of the number domain.
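The two comparison semantics for strings described earlier in this section can be observed with the ORACLE-specific sketch below. It uses dual, the one-row table that ORACLE provides for such tests, and the standard cast function to force the data type varchar2; the column name result is arbitrary.

```sql
-- Both operands are text literals, so blank-padded semantics applies:
-- 'a ' and 'a' differ only in trailing blanks and compare as equal.
-- One row is expected.
select 'equal' as result from dual
where 'a ' = 'a';

-- Both operands are of type varchar2, so non-padded semantics applies:
-- the values are not equal. No rows are expected.
select 'equal' as result from dual
where cast('a ' as varchar2(2)) = cast('a' as varchar2(1));

-- Under non-padded semantics the longer value is the greater one.
-- One row is expected.
select 'greater' as result from dual
where cast('a ' as varchar2(2)) > cast('a' as varchar2(1));
```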
5.5 SELECT Phrases
Queries must be written based on the names and headings of the tabular variables and not on the tables that represent their values at any given moment. This is similar to writing programs. A program should work for all legal inputs and not just the ones on which it was tested. In both cases, it is important to focus on the abstract structure and not on specific examples. The way we write SQL constructs must be directed only by the logic of the query and not by the content of a particular database instance. Just because the query generated the right answer for a particular instance of the database does not mean that it is correct.
The main retrieval construction is the select phrase. Consider a query that we solved previously using relational algebra. Recall that in Example 4.1.25 we found the names of all instructors who have taught any student who lives in Brookline. The solution involved using product, selection, and projection:
T1 := (STUDENTS × GRADES × INSTRUCTORS)
T2 := T1 where STUDENTS.stno = GRADES.stno and
GRADES.empno = INSTRUCTORS.empno and
STUDENTS.city = 'Brookline'
ANS := T2[INSTRUCTORS.name].
In SQL the same problem can be resolved using a single select phrase as in:
select INSTRUCTORS.name from STUDENTS, GRADES, INSTRUCTORS
where STUDENTS.stno = GRADES.stno and
GRADES.empno = INSTRUCTORS.empno and
STUDENTS.city = 'Brookline';
We can conceptualize the execution of this typical select using the operations of relational algebra as follows:
1. The execution begins by performing the product of the tables listed after the reserved word from. In our case, this involves computing the product
STUDENTS × GRADES × INSTRUCTORS
2. The selection specified after the reserved word where is executed next, if the where part is present (we shall see that this may or may not be present in a select). In our case, this amounts to retaining that part of the table product that satisfies the condition:
STUDENTS.stno = GRADES.stno and GRADES.empno = INSTRUCTORS.empno
and STUDENTS.city = 'Brookline'
3. Finally, the result of the second phase is projected on the attributes listed between select and from, that is, in our case, on the attribute INSTRUCTORS.name.
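The three phases can be made visible in the text of the query itself. The sketch below assumes the tables of the college database and a DBMS that accepts the cross join notation of SQL92, which writes the product explicitly; it is meant to return the same rows as the select above.

```sql
select INSTRUCTORS.name                           -- phase 3: projection
from STUDENTS cross join GRADES
              cross join INSTRUCTORS              -- phase 1: product
where STUDENTS.stno = GRADES.stno and             -- phase 2: selection
      GRADES.empno = INSTRUCTORS.empno and
      STUDENTS.city = 'Brookline';
```

This ordering is only a way of understanding what the query means. The DBMS is free to evaluate it in any way that yields the same result, and it normally avoids building the full product.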
We use a string constant (also known as a literal) in the above select, namely 'Brookline'. A string constant must begin and end with a single quote.
SQL is not case-sensitive. This means that you may or may not use capital letters in any place in an SQL construction (except for string comparisons) without any effect on the value returned by the query.
As we mentioned above, the where part of a select (also known as the where clause) is optional. This allows us to compute table projections in SQL as we show next.
Example 5.5.1 In Example 4.1.16, we obtained a list of instructors' names and the room numbers of their offices by projecting the table INSTRUCTORS on name roomno.
In SQL this can be done by writing
select name, roomno from INSTRUCTORS;
The select construct used above requires the table name for the table involved in the retrieval and the list of attributes that we need to extract.
In general, if we need to compute the projection of a table T on a set of attributes A1 . . . An of the heading of T, we use the construct:
select A1, . . . , An from T ;
Example 5.5.2 To find out the states where the students originate we project the table STUDENTS on the attribute state. This is done by
select state from STUDENTS;
The system returns the result:
ST
--
MA
MA
MA
MA
NH
MA
MA
MA
RI
The value MA is repeated 7 times because there are seven students who live in Massachusetts.
Duplicate values can be eliminated from a query by using the option distinct as in
select distinct state from STUDENTS;
This will yield the answer:
ST
--
MA
NH
RI
where duplicate values have been dropped.
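When several attributes are listed, distinct applies to the entire row of the result and not to each attribute separately. A small sketch, assuming the table STUDENTS of the college database:

```sql
-- One row for each distinct (city, state) pair: two students from the
-- same city and state contribute a single row, while the same state
-- paired with different cities still appears once per city.
select distinct city, state from STUDENTS;
```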
5.6 The WHERE Option
The where clause allows us to extract tuples that satisfy certain conditions; in other words, using the where clause we can perform selections.
Example 5.6.1 To find students who live in Boston we write:
select stno, name, addr, city, state, zip
from STUDENTS
where city = 'Boston';
This select will return the result:
STNO NAME ADDR CITY ST ZIP
---------------------------------------------------------------
2890 McLane Sandy 30 Cass Rd. Boston MA 02122
4022 Prior Lorraine 8 Beacon St. Boston MA 02125
5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115
If we want to extract all columns of a table instance, we can use the wildcard character, *, instead of listing all columns. Thus, we can write the equivalent select:
select * from STUDENTS
where city = 'Boston';
Here the symbol * replaces the full attribute list.
Starting from simple conditions (which we called atomic conditions in Chapter 4) we can write queries involving more complicated conditions built by using and, or, and not.
Example 5.6.2 In Example 4.1.14 we retrieved the students who live in Boston or Brookline. In SQL this can be done by:
select * from STUDENTS
where city = 'Boston' or city = 'Brookline';
This yields the result:
STNO NAME ADDR CITY ST ZIP
---------------------------------------------------------------
2661 Mixon Leatha 100 School St. Brookline MA 02146
2890 McLane Sandy 30 Cass Rd. Boston MA 02122
3566 Pierce Richard 70 Park St. Brookline MA 02146
4022 Prior Lorraine 8 Beacon St. Boston MA 02125
5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115
Example 5.6.3 To retrieve the grade records obtained in cs110 during the Spring of 2000 we can write in SQL:
select * from GRADES
where cno = 'cs110' and sem = 'SPRING'
and year = 2000;
This returns the result:
STNO EMPNO CNO SEM YEAR GRADE
---------- ----------- ----- ------ ---------- ----------
1011 023 cs110 SPRING 2000 75
4022 023 cs110 SPRING 2000 60
Selections can be combined with projections in a single SQL phrase.
Example 5.6.4 In the select phrase:
select stno, empno from GRADES
where cno = 'cs110';
the selection specified by where cno = 'cs110' is followed by the projection on the attributes stno, empno that are listed after the word select. The result is:
STNO EMPNO
---------- -----
1011 019
2661 019
3566 019
5544 019
1011 023
4022 023
In SQL we can use conditions that implement limited pattern matching. Certain patterns can be specified using the symbol % to replace zero or more characters, and the underscore to replace exactly one character. As mentioned earlier, SQL is generally not case-sensitive; however, comparisons involving strings are case-sensitive. Thus, 'Jerry' and 'JERRY' are distinct strings, that is, 'Jerry' != 'JERRY'. Pattern comparison is realized using the operator like.
Example 5.6.5 If we need to find the names and the addresses of students whose name includes Jerry, we can use the following select construct:
select name, addr from STUDENTS
where name like '%Jerry%';
This returns the table:
NAME ADDR
--------------- ---------------
Rawlings Jerry 15 Pleasant Dr.
Lewis Jerry 1 Main Rd.
Example 5.6.6 Suppose the computer science course numbers were carefully assigned so that all fundamental programming courses have a 1 as their second digit. Then the following select construct lists all fundamental programming courses.
select * from COURSES
where cno like 'cs_1%';
The corresponding result is:
CNO CNAME CR CAP
----- ------------------------- -- ---
cs110 Introduction to Computing 4 120
cs210 Computer Programming 4 100
cs310 Data Structures 3 60
cs410 Software Engineering 3 40
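The two wildcards can be checked on a handful of rows. The sketch below assumes Python's sqlite3 module and a made-up four-row COURSES table. One caveat: unlike the case-sensitive comparisons described above, SQLite's like ignores the case of ASCII letters by default, so only the behaviour of % and _ is illustrated here.

```python
import sqlite3

# Hypothetical miniature COURSES table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table COURSES(cno text primary key, cname text)")
conn.executemany("insert into COURSES values (?, ?)", [
    ("cs110", "Introduction to Computing"),
    ("cs210", "Computer Programming"),
    ("cs240", "Computer Architecture"),
    ("cs310", "Data Structures"),
])

# '_' matches exactly one character, '%' matches zero or more characters.
second_digit_one = [row[0] for row in conn.execute(
    "select cno from COURSES where cno like 'cs_1%' order by cno")]

# '%' on both sides finds the pattern anywhere inside the string.
mentions_comput = [row[0] for row in conn.execute(
    "select cno from COURSES where cname like '%Comput%' order by cno")]

print(second_digit_one)
print(mentions_comput)
```

Here cs240 is rejected by the first pattern because its second digit is 4, and cs310 is rejected by the second pattern because its name does not contain Comput.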
Using the reserved word between, we can ensure that certain values are limited to prescribed intervals (including the endpoints of these intervals).
Example 5.6.7 To find the students who obtained some grade between 65 and 85 in 2003, we apply the following query:
select distinct stno from GRADES
where year = 2003 and
grade between 65 and 85;
This select construct returns the table:
STNO
----
1011
2661
5571
The previous select is simply a shorthand for
select distinct stno from GRADES
where year = 2003 and
grade >= 65 and
grade <= 85;
3566
4022
5571
We can test if certain components of tuples belong to a certain list of values by using a condition of the form:
A in (v1, ..., vn)
This condition is satisfied by those tuples t such that t[A] has one of the values v1, ..., vn.
Example 5.6.9 Let us find the names of students who live in Boston or Brookline, a query that we already discussed in Example 5.6.2. Using the previous condition we write:
select name from STUDENTS
where city in ('Boston','Brookline');
Then, the desired list is:
NAME
--------------
Mixon Leatha
McLane Sandy
Pierce Richard
Prior Lorraine
Rawlings Jerry
On the other hand, we can test for the negation of a condition using not. To list the names of students who live outside those two cities, we write:
select name from STUDENTS
where not(city in ('Boston','Brookline'));
which has the same effect as:
select name from STUDENTS
where city not in ('Boston','Brookline');
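The equivalences just described (in versus a chain of or conditions, and not(... in ...) versus not in) can be verified mechanically. This is a minimal sketch assuming Python's sqlite3 module; the helper names and the four-row STUDENTS table are hypothetical.

```python
import sqlite3

# Hypothetical miniature STUDENTS table.
conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS(stno integer primary key, name text, city text)")
conn.executemany("insert into STUDENTS values (?, ?, ?)", [
    (1011, "Edwards P. David", "Newton"),
    (2661, "Mixon Leatha", "Brookline"),
    (2890, "McLane Sandy", "Boston"),
    (3442, "Novak Roland", "Nashua"),
])

def names(condition):
    # Run the same projection with a different where clause.
    sql = "select name from STUDENTS where " + condition + " order by stno"
    return [row[0] for row in conn.execute(sql)]

with_in = names("city in ('Boston','Brookline')")
with_or = names("city = 'Boston' or city = 'Brookline'")
outside_a = names("not(city in ('Boston','Brookline'))")
outside_b = names("city not in ('Boston','Brookline')")

print(with_in, with_or)
print(outside_a, outside_b)
```

Each pair of lists printed on one line should be identical, which is the sense in which the two formulations have the same effect.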
We can insert strings of characters in the list of fields of a select phrase to improve the presentation of the results.
Example 5.6.10 To insert the string 'Student name: ' in front of a student name we write:
select 'Student name: ', name from STUDENTS;
This yields the result:
STUDENTNAME: NAME
--------------- -----------------
Student name: Edwards P. David
Student name: Grogan A. Mary
Student name: Mixon Leatha
Student name: McLane Sandy
Student name: Novak Roland
Student name: Pierce Richard
Student name: Prior Lorraine
Student name: Rawlings Jerry
Student name: Lewis Jerry
In SQL*Plus concatenation of strings can be achieved with the concatenation operator ||.
Example 5.6.11 In the next select phrase we concatenate the string 'Student ' with a student's name, then with the string ' lives in ' and the student's state:
select 'Student ' || name || ' lives in ' || state
from STUDENTS;
This returns the result:
STUDENT||NAME||LIVESIN||STATE
-------------------------------------
Student Edwards P. David lives in MA
Student Grogan A. Mary lives in MA
Student Mixon Leatha lives in MA
Student McLane Sandy lives in MA
Student Novak Roland lives in NH
Student Pierce Richard lives in MA
Student Prior Lorraine lives in MA
Student Rawlings Jerry lives in MA
Student Lewis Jerry lives in RI
In Microsoft SQL Server concatenation is obtained using the + operator.
Example 5.6.12 The query shown in Example 5.6.11 can be executed in Microsoft SQL Server by
select 'Student ' + name + ' lives in ' + state
from STUDENTS;
5.7 Union, Intersection, and Difference in SQL
Recall that union, intersection, and difference as defined in relational algebra may occur only between tables that have identical headings. To execute these operations in SQL, we need to use compound select phrases. Compound selects are constructed from simple select phrases using the reserved words union, intersect, and minus. As we shall see, SQL treats union, intersection, and difference as operations between sets of tuples, and therefore, it removes duplicate values from the results of the queries.
Example 5.7.1 To determine the student numbers of students who took cs210 we write:
select stno from GRADES
where cno = 'cs210';
This returns the result:
STNO
----
1011
2661
3566
5571
4022
Similarly, we find the student numbers of students who took cs240:
select stno from GRADES
where cno = 'cs240';
In turn, this yields:
STNO
----
3566
5571
2415
5544
1011
4022
To find the students who took both cs210 and cs240 we use intersect to link the two previous select phrases into a compound select:
select stno from grades where cno = 'cs210'
intersect
select stno from grades where cno = 'cs240';
This gives:
STNO
----
1011
3566
4022
5571
Older versions of SQL Server and MySQL do not support the intersect operation.
The union of the two sets is computed by the following compound select:
select stno from grades where cno = 'cs210'
union
select stno from grades where cno = 'cs240';
Note that the tuples of the result are sorted:
STNO
----
1011
2415
2661
3566
4022
5544
5571
If we wish to retain all values in the result, then we need to use union all to link the select phrases as in:
select stno from grades where cno = 'cs210'
union all
select stno from grades where cno = 'cs240';
The result now contains all values retrieved by the individual selects:
STNO
----
1011
2661
3566
5571
4022
3566
5571
2415
5544
1011
4022
The set difference is computed in ORACLE's SQL*Plus using minus. To find the students who took cs210 but did not take cs240 we write:
select stno from grades where cno = 'cs210'
minus
select stno from grades where cno = 'cs240';
which returns the result:
STNO
----
2661
The reverse difference allows us to find students who took cs240 but did not take cs210:
select stno from grades where cno = 'cs240'
minus
select stno from grades where cno = 'cs210';
Now we obtain:
STNO
----
2415
5544
Neither SQL Server nor MySQL supports the minus keyword; the name that the SQL standard gives to the difference operation is except.
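The compound selects of Example 5.7.1 can be replayed with the student numbers listed above. The sketch below assumes Python's sqlite3 module; SQLite spells the difference except rather than minus, and the two-column GRADES table and the helper compound are simplifications introduced only for this illustration.

```python
import sqlite3

# Simplified GRADES table holding only the columns needed here; the student
# numbers are the ones returned by the two simple selects of Example 5.7.1.
conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno integer, cno text)")
cs210 = [1011, 2661, 3566, 5571, 4022]
cs240 = [3566, 5571, 2415, 5544, 1011, 4022]
conn.executemany("insert into GRADES values (?, 'cs210')", [(s,) for s in cs210])
conn.executemany("insert into GRADES values (?, 'cs240')", [(s,) for s in cs240])

took210 = "select stno from GRADES where cno = 'cs210'"
took240 = "select stno from GRADES where cno = 'cs240'"

def compound(op):
    # Link the two simple selects with the given set operator.
    sql = f"{took210} {op} {took240} order by stno"
    return [row[0] for row in conn.execute(sql)]

both = compound("intersect")
either = compound("union")
with_duplicates = compound("union all")
only210 = compound("except")   # SQLite's name for Oracle's minus

print(both, either, only210, len(with_duplicates))
```

Only union all keeps the duplicates, so its result has as many rows as the two simple selects together.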
5.8 Table Product in SQL
A select phrase that lists several distinct table names after the reserved word from computes the product of these tables.
Example 5.8.1 To examine all possible pairs of students/instructors we could write the following select:
select STUDENTS.name, INSTRUCTORS.name
from STUDENTS, INSTRUCTORS;
Since our database is in a state that contains nine students and five instructors, this will result in 45 rows retrieved:
NAME NAME
---------------------------------
Edwards P. David Evans Robert
Grogan A. Mary Evans Robert
Mixon Leatha Evans Robert
.
.
.
Pierce Richard Will Samuel
Prior Lorraine Will Samuel
Rawlings Jerry Will Samuel
Lewis Jerry Will Samuel
Observe that the tables are not linked by any where condition; as expected in the definition of the product, all combinations of rows are considered. After computing the product, a projection eliminates all attributes except STUDENTS.name and INSTRUCTORS.name.
Also, note that we use qualified attributes as required by the definition of table product (see Definition 4.1.7).
The result produced by the query shown in Example 5.8.1 does not differentiate between the attributes STUDENTS.name and INSTRUCTORS.name, and this may confuse the user. Therefore, it is preferable to rename the columns of the result using the option as:
select STUDENTS.name as stname, INSTRUCTORS.name as instname
from STUDENTS, INSTRUCTORS;
This will generate:
STNAME INSTNAME
---------------------------------
Edwards P. David Evans Robert
Grogan A. Mary Evans Robert
Mixon Leatha Evans Robert
.
.
.
Pierce Richard Will Samuel
Prior Lorraine Will Samuel
Rawlings Jerry Will Samuel
Lewis Jerry Will Samuel
SQL allows for computations of products of several copies of the same table through the creation of aliases; the solution proceeds using the logic discussed in Example 4.1.18. To create an alias S of a table named T we write the name of the alias after the name of the table in the list of tables, making sure that at least one space (and no comma) exists between the name of the table and its alias. For example, in the select phrase of Example 5.8.2 we create the alias I by writing
INSTRUCTORS I
Table aliases are also known as correlation names of tables.
Example 5.8.2 Let us solve the query shown in Example 4.1.18: finding all pairs of instructors' names for instructors who share the same office. This can be done by writing:
select I.name as firstname, INSTRUCTORS.name as secname
from INSTRUCTORS I, INSTRUCTORS
where I.roomno = INSTRUCTORS.roomno and
I.empno < INSTRUCTORS.empno;
The result of this query is:
FIRSTNAME SECNAME
------------------------------
Exxon George Will Samuel
Conceptually, we create an alias I of the table INSTRUCTORS, compute the product between this alias and INSTRUCTORS, and retain those pairs that share the same room and consist of distinct individuals.
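The product of a table with a copy of itself can be tried on four of the instructors. This is a minimal sketch assuming Python's sqlite3 module; to keep it independent of how a particular system resolves the unaliased table name, both copies receive an alias here (I and J), which is a small departure from the query of Example 5.8.2.

```python
import sqlite3

# Four INSTRUCTORS rows (empno, name, roomno); two of them share room 90.
conn = sqlite3.connect(":memory:")
conn.execute("create table INSTRUCTORS(empno text primary key, name text, roomno integer)")
conn.executemany("insert into INSTRUCTORS values (?, ?, ?)", [
    ("019", "Evans Robert", 82),
    ("023", "Exxon George", 90),
    ("056", "Sawyer Kathy", 91),
    ("234", "Will Samuel", 90),
])

# I.empno < J.empno removes pairs of an instructor with himself or herself
# and keeps each unordered pair only once.
pairs = conn.execute("""
    select I.name as firstname, J.name as secname
    from INSTRUCTORS I, INSTRUCTORS J
    where I.roomno = J.roomno and
          I.empno < J.empno
""").fetchall()

print(pairs)
```

Replacing < by != would list every qualifying pair twice, once in each order.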
Example 5.8.3 Suppose that we need to find all triples of student names for students who live in the same city and state. Now we need to operate with three distinct copies of the table STUDENTS. This is accomplished by:
select S1.name as name1, S2.name as name2,
S3.name as name3
from STUDENTS S1, STUDENTS S2,
STUDENTS S3
where S1.state = S2.state and
S2.state = S3.state and
S1.city = S2.city and
S2.city = S3.city and
S1.stno < S2.stno and
S2.stno < S3.stno
which gives the result:
NAME1 NAME2 NAME3
----------------------------------------------------
McLane Sandy Prior Lorraine Rawlings Jerry
5.9 Join in SQL
Earlier versions of SQL (at the level of SQL 1) dealt with the join operation indirectly, using operations like product, selection, and projection, which are already available in SQL. The blueprint of this treatment of the join operation was outlined in Section 4.2.
Example 5.9.1 The query considered in Example 4.2.2, in which we seek to find the names of instructors who have taught any four-credit course, is solved in SQL by writing:
select distinct INSTRUCTORS.name
from COURSES, GRADES, INSTRUCTORS
where COURSES.cr = 4
and COURSES.cno = GRADES.cno
and GRADES.empno = INSTRUCTORS.empno;
The steps that we applied in relational algebra can be easily reconstituted in SQL. The first step, which consists of computing the product
T1 = COURSES × GRADES × INSTRUCTORS
corresponds to the list of tables that follows the word from. Then, the selection specified by
T2 = (T1 where COURSES.cr = 4 and
COURSES.cno = GRADES.cno and
GRADES.empno = INSTRUCTORS.empno)
is executed using the condition of the where clause.
Finally, the projection
T3(name) = T2[INSTRUCTORS.name]
corresponds to the list that follows select. In this case, this list consists of one attribute, INSTRUCTORS.name.
We give one more example that shows a typical query that uses a join.
Example 5.9.2 To list all pairs of student names and course names such that the student takes the course, the relational algebra solution would require that we join the tables STUDENTS, GRADES, and COURSES. In SQL we write:
select distinct STUDENTS.name, COURSES.cname
from STUDENTS, GRADES, COURSES
where STUDENTS.stno = GRADES.stno and
GRADES.cno = COURSES.cno
This query will return:
NAME CNAME
--------------------------------------------------
Edwards P. David Computer Architecture
Edwards P. David Computer Programming
Edwards P. David Introduction to Computing
Grogan A. Mary Computer Architecture
.
.
.
Prior Lorraine Data Structures
Prior Lorraine Introduction to Computing
Rawlings Jerry Computer Architecture
Rawlings Jerry Introduction to Computing
SQL dialects that conform to the SQL-2 standard (e.g., SQL*Plus of Oracle 9i and 10g, and Microsoft SQL Server) allow the use of the constructions inner join and on. For example, the query discussed in Example 5.9.1 has the alternate solution:
select distinct INSTRUCTORS.name
from INSTRUCTORS, COURSES INNER JOIN GRADES
on COURSES.cno = GRADES.cno
where INSTRUCTORS.empno = GRADES.empno
and COURSES.cr = 4;
This query should be viewed as computing the natural join of COURSES and GRADES based on the equality of the attributes they share (as specified by the on clause). Then, the join of INSTRUCTORS with the result of the previous join is computed using the simulation by product and selection.
In SQL*Plus, queries involving natural joins among tables whose attributes are identically named can be further simplified by applying the using clause, which lists the attributes involved in the joining.
Example 5.9.3 To retrieve the names of instructors who taught cs110 we can execute in SQL*Plus the query:
select distinct INSTRUCTORS.name
from INSTRUCTORS inner join GRADES
using(empno)
where GRADES.cno = 'cs110';
The inner join can be used for joins that involve more than two tables.
Example 5.9.4 An alternative solution to the query of Example 5.9.1 that makes use of the inner join operation is:
select distinct INSTRUCTORS.name
from
INSTRUCTORS inner join GRADES
using(empno)
inner join COURSES
using(cno)
where COURSES.cr = 4
It is possible to involve several attributes in an inner join either explicitly, using the clause on, or implicitly, employing the clause using.
Example 5.9.5 To find the pairs of names of students and instructors such that the student takes a course with the instructor who is also his or her advisor, we can write either:
select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname
from GRADES inner join ADVISING
on GRADES.stno = ADVISING.stno and
GRADES.empno = ADVISING.empno
inner join STUDENTS
on ADVISING.stno = STUDENTS.stno
inner join INSTRUCTORS
on ADVISING.empno = INSTRUCTORS.empno
or, equivalently,
select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname
from GRADES inner join ADVISING
using(stno,empno)
inner join STUDENTS
using(stno)
inner join INSTRUCTORS
using(empno)
The Cartesian product of two tables can be computed, alternatively, using the cross join operation.
Example 5.9.6 The query that we wrote in Example 5.8.1, which generates all possible pairs of students/instructors, can also be written as:
select STUDENTS.name, INSTRUCTORS.name
from STUDENTS cross join INSTRUCTORS;
which is equivalent to
select STUDENTS.name, INSTRUCTORS.name
from STUDENTS, INSTRUCTORS;
We saw that when joining two tables not all tuples are joinable; tuples that belong to one table and are not joinable with any tuple of the other table leave no trace in the join, a situation that is often inconvenient. As we saw in Section 4.3, the outer join operation and its variants, the left outer join and the right outer join, can rectify this situation.
Let us assume that the tabular variables STUDENTS and INSTRUCTORS contain the tuples shown in Figure 5.1.
The tabular variable ADVISING has the same content as the one shown in Figure 3.1.
Example 5.9.7 Oracle's own syntax for left outer join is to designate the component that may be null by (+), as in
select students.name, ADVISING.empno from STUDENTS, ADVISING
where STUDENTS.stno = ADVISING.stno(+)
This is equivalent to using the operator left outer join as specified by SQL2:
select STUDENTS.name, ADVISING.empno
from STUDENTS left outer join ADVISING
on STUDENTS.stno = ADVISING.stno
Either phrase will return:
name empno
-----------------------------------------
Edwards P. David 019
Grogan A. Mary 019
Mixon Leatha 023
McLane Sandy 023
Novak Roland 056
Pierce Richard 126
Prior Lorraine 234
Rawlings Jerry 023
Lewis Jerry 234
Davis Richard
Chu Martin
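The difference between an inner join and a left outer join shows up as soon as one student has no advisor. The following is a minimal sketch assuming Python's sqlite3 module, which accepts the standard left outer join syntax but not Oracle's (+) notation; the three students and two ADVISING rows are a made-up subset.

```python
import sqlite3

# Hypothetical miniature tables: Davis Richard has no ADVISING row.
conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS(stno integer primary key, name text)")
conn.execute("create table ADVISING(stno integer, empno text)")
conn.executemany("insert into STUDENTS values (?, ?)", [
    (1011, "Edwards P. David"),
    (2415, "Grogan A. Mary"),
    (6410, "Davis Richard"),
])
conn.executemany("insert into ADVISING values (?, ?)", [
    (1011, "019"),
    (2415, "019"),
])

def advisors(join_kind):
    # Same query with a different join operator.
    sql = f"""
        select STUDENTS.name, ADVISING.empno
        from STUDENTS {join_kind} ADVISING
        on STUDENTS.stno = ADVISING.stno
        order by STUDENTS.stno
    """
    return conn.execute(sql).fetchall()

inner_rows = advisors("inner join")
left_rows = advisors("left outer join")

print(inner_rows)
print(left_rows)
```

The unadvised student leaves no trace in the inner join, while the left outer join keeps the row and reports the missing empno as a null (None in Python).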
The computation of the right outer join is similar. We can use either Oracle's syntax, as in
select ADVISING.stno, INSTRUCTORS.name from ADVISING, INSTRUCTORS
where ADVISING.empno(+) = INSTRUCTORS.empno;
or the standard syntax:
select ADVISING.stno, INSTRUCTORS.name
from ADVISING right outer join INSTRUCTORS
on ADVISING.empno = INSTRUCTORS.empno;
In either case we shall obtain:
STUDENTS
stno  name              addr              city        state  zip
----  ----------------  ----------------  ----------  -----  -----
1011  Edwards P. David  10 Red Rd.        Newton      MA     02159
2415  Grogan A. Mary    8 Walnut St.      Malden      MA     02148
2661  Mixon Leatha      100 School St.    Brookline   MA     02146
2890  McLane Sandy      30 Cass Rd.       Boston      MA     02122
3442  Novak Roland      42 Beacon St.     Nashua      NH     03060
3566  Pierce Richard    70 Park St.       Brookline   MA     02146
4022  Prior Lorraine    8 Beacon St.      Boston      MA     02125
5544  Rawlings Jerry    15 Pleasant Dr.   Boston      MA     02115
5571  Lewis Jerry       1 Main Rd         Providence  RI     02904
6410  Davis Richard     45 Algonquin Rd.  Natick      MA     01760
7209  Chu Martin        90 Rye Dr.        Ayer        MA     01290

INSTRUCTORS
empno  name              rank          roomno  telno
-----  ----------------  ------------  ------  -----
019    Evans Robert      Professor     82      7122
023    Exxon George      Professor     90      9101
056    Sawyer Kathy      Assoc. Prof.  91      5110
126    Davis William     Assoc. Prof.  72      5411
234    Will Samuel       Assist.Prof.  90      7024
323    Campbell Kenneth  Professor     102     7077

Figure 5.1: Tables with tuples with null components
stno name
---------------------------
1011 Evans Robert
2415 Evans Robert
2661 Exxon George
2890 Exxon George
5544 Exxon George
3442 Sawyer Kathy
3566 Davis William
4022 Will Samuel
5571 Will Samuel
Campbell Kenneth
Finally, the outer join itself can be computed using the operator full outer join:
select STUDENTS.name as sname, INSTRUCTORS.name as iname
from students full outer join advising
using(stno)
full outer join instructors
using(empno);
This will result in
sname iname
-----------------------------------------------------
Grogan A. Mary Evans Robert
Edwards P. David Evans Robert
Rawlings Jerry Exxon George
McLane Sandy Exxon George
Mixon Leatha Exxon George
Novak Roland Sawyer Kathy
Pierce Richard Davis William
Lewis Jerry Will Samuel
Prior Lorraine Will Samuel
Chu Martin
Davis Richard
Campbell Kenneth
5.10 Sets and subqueries
Subqueries are select phrases that return sets rather than tables. Their main use is in conditions that involve sets. As we shall see, they are useful in implementing difference and division in SQL. Syntactically, a subquery is written by placing a select phrase between a pair of parentheses. For example,
(select empno from INSTRUCTORS where rank = 'Professor');
is a subquery that computes the employee numbers of full professors. To find the student numbers of students who take a course with a full professor, we need to select those GRADES tuples whose empno belongs to this set. This can be accomplished by writing:
select distinct stno from GRADES where
empno in (select empno from INSTRUCTORS
where rank = 'Professor');
This will return the result:
STNO
----
1011
2415
2661
3566
4022
5544
5571
We refer to the first select as the calling select, or the main select, or the outer select; the select of the subquery is the inner select.
As we saw in the introductory example, membership can be tested using in. Here is another example.
Example 5.10.1 Let us find the names of students who took cs310. We determine the student numbers of those students using a subquery. Then, in the main select, we retrieve those students whose student number is in this set. This can be accomplished using the query:
select name from STUDENTS where
stno in (select stno from GRADES
where cno = 'cs310');
which returns the table:
NAME
--------------
Mixon Leatha
Prior Lorraine
It is possible to test membership of a tuple in a set of tuples computed by a subquery using a condition of the form
(x1, ..., xn) in (select A1, ..., An from ...)
This type of test is included in SQL99, but it is not implemented in many SQL dialects. However, it is implemented in ORACLE and DB2.
Example 5.10.2 Suppose that we need to find the pairs of names of students and instructors such that the student took some course with the instructor but no four-credit course. This is computed by the following query:
select STUDENTS.name as sname,
INSTRUCTORS.name as iname
from STUDENTS, INSTRUCTORS where
(STUDENTS.stno, INSTRUCTORS.empno) in
(select stno, empno from grades
minus
select stno, empno from grades
where cno in (select cno
from courses
where cr=4));
This will return the following table:
SNAME INAME
------------------ -------------
Edwards P. David Sawyer Kathy
Grogan A. Mary Evans Robert
Mixon Leatha Will Samuel
Novak Roland Will Samuel
Prior Lorraine Sawyer Kathy
Prior Lorraine Will Samuel
Rawlings Jerry Sawyer Kathy
Lewis Jerry Will Samuel
If oper is one of the operators =, !=, <, >, <=, >=, then we can use conditions of the form
v oper any (select ...)
or
v oper all (select ...)
in comparisons that involve some elements of the set computed by the subquery (select ...) or all elements of the same set, respectively. Here != stands for inequality.
Example 5.10.3 To find the names of the courses taken by the student whose student number is 1011, we can use the following query:
select cname from COURSES where
cno = any (select cno from GRADES where stno= 1011);
The construct = any is synonymous with in, and the same query could be written as:
select cname from COURSES
where cno in (select cno from GRADES where stno= 1011);
Also, instead of = any we could use = some, and so we have a third way of writing the same query:
select cname from COURSES where
cno = some (select cno from GRADES where stno= 1011);
All three queries result in the table:
CNAME
-------------------------
Introduction to Computing
Computer Programming
Computer Architecture
Example 5.10.4 Let us find the students who obtained the highest grade in cs110. Although there are methods that we explain later that yield much simpler solutions for this type of query, for the moment we want to illustrate the oper all condition. We operate on two copies of GRADES. The copy used in the inner select is intended for computing the grades obtained in cs110:
select stno from GRADES where cno = 'cs110'
and grade >= all(select grade from GRADES
where cno = 'cs110');
We obtain the table:
STNO
----
5544
Example 5.10.5 Let us find the students who obtained a grade at least as high as every grade given by a certain instructor, say Prof. Will. Using the all subquery we can write:
select stno from GRADES
where grade >= all(select grade from GRADES
where empno in (select empno from INSTRUCTORS
where name like 'Will%'));
If we alter this query and replace the instructor with Prof. Davis, who teaches no courses, then in the query
select stno from GRADES
where grade >= all(select grade from GRADES
where empno in (select empno from INSTRUCTORS
where name like 'Davis%'));
the set computed by the subquery that follows all is empty. Therefore, every grade satisfies the inequality, and we obtain the student numbers of all students who took any course!
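The behaviour of all over an empty set can be reproduced with a not exists rewriting. The sketch below assumes Python's sqlite3 module; SQLite does not implement the any/all quantifiers, so the condition grade >= all(S) is expressed as "there is no element of S larger than grade", which is equivalent when no grade is null. The three GRADES rows and the function at_least_all are hypothetical.

```python
import sqlite3

# Hypothetical miniature GRADES table (stno, empno, grade).
conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno integer, empno text, grade integer)")
conn.executemany("insert into GRADES values (?, ?, ?)", [
    (1011, "019", 90),
    (2661, "019", 70),
    (4022, "234", 80),
])

def at_least_all(empno):
    # grade >= all(grades given by empno), rewritten with not exists.
    sql = """
        select distinct stno from GRADES G
        where not exists (select * from GRADES H
                          where H.empno = ? and H.grade > G.grade)
        order by stno
    """
    return [row[0] for row in conn.execute(sql, (empno,))]

beats_234 = at_least_all("234")     # instructor 234 gave the single grade 80
beats_nobody = at_least_all("126")  # instructor 126 gave no grades at all

print(beats_234)
print(beats_nobody)
```

For the instructor without grades the inner set is empty, so the condition holds for every row and all student numbers are returned, exactly the effect described above.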
5.11 Parametrized subqueries
Often the retrieval performed in a subquery depends on a value provided by the calling select. A typical situation is described in the following example.
Example 5.11.1 Suppose that we need to retrieve the course numbers of courses taken by the student whose student number is STUDENTS.stno. Ignore (for the moment) the origin of this piece of data. Then, the retrieval is done by the select construct:
select cno from GRADES
where stno = STUDENTS.stno;
Next, we transform this select into a subquery. The student number STUDENTS.stno is provided by the outer select of the following construct:
select name from STUDENTS where 'cs310' in
(select cno from GRADES
where stno = STUDENTS.stno);
Observe that this provides an alternate solution to the query discussed in Example 5.10.1. Namely, we use a subquery to compute the courses taken by each student. Then, we test if cs310 is one of these courses. We use the qualified attribute STUDENTS.stno inside the subquery to differentiate between this input parameter and the attribute stno of the table GRADES.
Sets of tuples produced by subqueries can be tested for emptiness using the exists condition. Namely, the condition
exists (select ... from ...)
is true if the set returned by the subquery is not empty; similarly,
not exists (select ... from ...)
is true if the set returned by the subquery is empty.
Example 5.11.2 Let us give yet another solution to the query we solved in Example 5.10.1. This time, to find the names of students who took cs310 we determine the student numbers of those students for whom their set of grades in cs310 is not empty. This can be done as follows:
select name from STUDENTS where
exists (select * from GRADES where
stno = STUDENTS.stno and
cno = 'cs310');
As a result, we have the table:
NAME
--------------
Mixon Leatha
Prior Lorraine
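The three formulations of this query (Examples 5.10.1, 5.11.1, and 5.11.2) can be run side by side. This is a minimal sketch assuming Python's sqlite3 module and made-up miniature STUDENTS and GRADES tables; the dictionary keys are labels introduced only for this illustration.

```python
import sqlite3

# Hypothetical miniature tables: two of the three students took cs310.
conn = sqlite3.connect(":memory:")
conn.execute("create table STUDENTS(stno integer primary key, name text)")
conn.execute("create table GRADES(stno integer, cno text)")
conn.executemany("insert into STUDENTS values (?, ?)", [
    (1011, "Edwards P. David"),
    (2661, "Mixon Leatha"),
    (4022, "Prior Lorraine"),
])
conn.executemany("insert into GRADES values (?, ?)", [
    (1011, "cs110"), (2661, "cs110"), (2661, "cs310"), (4022, "cs310"),
])

queries = {
    # membership of stno in a set computed once
    "in": """select name from STUDENTS where
             stno in (select stno from GRADES where cno = 'cs310')
             order by stno""",
    # subquery parametrized by STUDENTS.stno
    "parametrized": """select name from STUDENTS where 'cs310' in
                       (select cno from GRADES where stno = STUDENTS.stno)
                       order by stno""",
    # non-emptiness test of a parametrized subquery
    "exists": """select name from STUDENTS where
                 exists (select * from GRADES where
                         stno = STUDENTS.stno and cno = 'cs310')
                 order by stno""",
}

results = {label: [row[0] for row in conn.execute(sql)]
           for label, sql in queries.items()}
print(results)
```

Inside the last two subqueries the unqualified stno refers to GRADES, the table of the inner select, while STUDENTS.stno is the value supplied by the outer select.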
Example 5.11.3 To find instructors who never taught cs110, we search for instructors for whom there is no GRADES record involving cs110 and these instructors. This can be done by
select name from INSTRUCTORS where
not exists(select * from GRADES where
empno = INSTRUCTORS.empno and
cno = 'cs110');
which results in the table:
NAME
-------------
Sawyer Kathy
Davis William
Will Samuel
If both the main query and the subquery deal with the same table and the subquery requires input parameters from the outer query, then we use an alias of the table in the outer query.
Example 5.11.4 Let us find the student numbers of students whose advisor is advising at least one other student. The information is contained in the ADVISING table, and the following select construct uses both ADVISING (in the subquery) and its alias A in the main query:
select distinct stno from ADVISING A
where exists (select * from ADVISING where
empno = A.empno and stno != A.stno);
This query returns the table:
STNO
----
1011
2415
2661
2890
4022
5544
5571
Subqueries can be used in the list that follows from in exactly the same manner that tables are used. This is shown in the next example:
Example 5.11.5 To find the pairs of names of students and instructors such that the student took some course with the instructor we could write:
select STUDENTS.name as sname, INSTRUCTORS.name as iname
from STUDENTS, INSTRUCTORS,
(select stno, empno from GRADES) PN
where STUDENTS.stno = PN.stno and
INSTRUCTORS.empno = PN.empno;
The difference of the tables T and S can be computed by looking for each tuple of T for which there is no matching tuple in S. This can be done by:
select * from T where
not exists (select * from S where
A1 = T.A1 and ... and An = T.An)
Example 5.11.6 Courses offered by the continuing education program but not by the regular program can be found by writing:
select * from CED_COURSES where
not exists (select * from COURSES where
cno = CED_COURSES.cno)
which takes advantage of the fact that cno is a key for both COURSES and CED_COURSES.
5.12 Subqueries and division
SQL does not have a division operation. However, as we saw in Examples 4.1.27 and 4.2.3, we can perform division using product, projection, and difference. Of course, we could apply the prescription offered by relational algebra. This type of solution is discussed in the next example.
Example 5.12.1 To determine the courses taught by every full professor, the solution envisioned here is
select cno from grades
minus
select GI.cno from (select grades.cno,
instructors.empno
from grades, instructors
where rank='Professor') GI
where (GI.cno,GI.empno) not in (select cno,empno from grades)
Note that the query
select grades.cno, instructors.empno
from grades, instructors
where rank='Professor'
computes all pairs of course numbers and employee numbers of full professors using the product of the tables GRADES and INSTRUCTORS. Then, the query
select GI.cno from (select grades.cno,
instructors.empno
from grades, instructors
where rank='Professor') GI
where (GI.cno,GI.empno) not in (select cno, empno from grades)
extracts the courses that are part of the pairs of the previous table that do not appear in the GRADES table, that is, the courses for which there exists a full professor who did not teach these courses. These are the courses that we need to exclude from the answer. Thus, the query presented at the beginning of this example yields the solution of the problem:
CNO
-----
cs110
The solution presented in Example 5.12.1 is not applicable in SQL dialects that do not have all the facilities of SQL*Plus. Therefore, we need to examine an alternate way of solving this problem that is almost universally usable. To understand the technique used we examine the solution of the query formulated in the next example.
Example 5.12.2 Again, suppose that we need to determine the courses taught by every full professor. Let us formulate the same query in a way that is easier to translate into SQL. Namely, we find the courses for which there are no full professors who have not taught these courses. The reader should realize immediately that this is simply a new formulation of the same problem. We show the solution in steps, moving gradually from plain English to SQL:
Phase I:
select cno from GRADES G where
not exists (instructors who are full professors and
have not taught the course G.cno)
Phase II:
select cno from GRADES G where
not exists (select * from INSTRUCTORS
where rank = 'Professor' and
these instructors have not taught
the course G.cno)
Phase III:
select cno from GRADES G where
not exists (select * from INSTRUCTORS
where rank = 'Professor' and
not exists (select * from GRADES
where empno = INSTRUCTORS.empno
and cno = G.cno));
In Phase I we determine in SQL the course numbers for which no full professor exists who has not taught these courses.
In Phase II we concentrate on preventing the existence of full professors who are not teaching these courses. Note that Phase II still contains an untranslated part.
Finally, in Phase III, we translate the part "have not taught the course" using not exists for the second time.
Example 5.12.3 Another query that requires division in relational algebra is: "Find names of instructors who have taught every 100-level course," that is, every course whose first digit of the course number is 1. The formulation that is better suited to SQL implementation is: "Find names of instructors for whom there is no 100-level course that they have not taught." This is solved by the following select construct:
select name from INSTRUCTORS where
    not exists (select * from COURSES
                where cno like 'cs1__' and
                not exists (select * from GRADES where
                            empno = INSTRUCTORS.empno
                            and cno = COURSES.cno));
The answer that results from our usual database instance is:
NAME
------------
Evans Robert
Exxon George
5.13 Relational Completeness of SQL
Between Chapter 4 and the current chapter, we have shown that SQL is capable of performing all operations of relational algebra. This fact is known as the relational completeness of SQL. As we shall see in subsequent chapters, the capabilities of SQL go well beyond the standard definition of relational algebra.
5.14 Scalar Functions of SQL
We now present capabilities of SQL that go beyond relational algebra. We begin by discussing built-in functions in SQL that act on individual values (scalar functions), functions that act on sets of values (aggregate functions), and also analytic functions that can be used for various statistical computations. Then, we continue with the group by option of select, and we discuss several on-line analytic processing functions of SQL.
Scalar functions are built-in functions of SQL that work on individual values. They are highly dependent on the particular implementation of SQL, and we limit our discussion to functions implemented by ORACLE's SQL Plus. There are several types of scalar functions, depending on the types of their arguments.
5.14.1 Numerical Functions
Among the numerical functions, abs, sin, cos, power, sqrt, etc. have quite obvious definitions. For example, sqrt computes the square root of its argument, while power(x, y) computes $x^y$.
Example 5.14.1 To illustrate some of the numerical functions we create a table POINTS whose rows represent labelled points in the plane:
create table POINTS(ptid varchar2(10), x integer, y integer,
primary key(ptid));
and populate this table using the commands:
insert into points(ptid, x, y) values ('a',0,0);
insert into points(ptid, x, y) values ('b',0,1);
insert into points(ptid, x, y) values ('c',0,2);
insert into points(ptid, x, y) values ('d',1,0);
insert into points(ptid, x, y) values ('e',1,1);
insert into points(ptid, x, y) values ('f',1,2);
insert into points(ptid, x, y) values ('g',2,0);
insert into points(ptid, x, y) values ('h',2,1);
insert into points(ptid, x, y) values ('i',2,2);
insert into points(ptid, x, y) values ('j',3,0);
insert into points(ptid, x, y) values ('k',3,1);
insert into points(ptid, x, y) values ('l',3,2);
To determine the distances from a to every point of the table (including a itself) we write:
select p.ptid,
       sqrt(power(a.x - p.x,2)+power(a.y - p.y,2))
       as dist
from points a, points p
where a.ptid = 'a';
This returns:
PTID DIST
---------- ----------
a 0
b 1
c 2
d 1
e 1.41421356
f 2.23606798
g 2
h 2.23606798
i 2.82842712
j 3
k 3.16227766
l 3.60555128
To compute the distance between the point a having the coordinates $(x_a, y_a)$ and a point p with coordinates $(x_p, y_p)$, we use the formula $d(a, p) = \sqrt{(x_a - x_p)^2 + (y_a - y_p)^2}$.
The formula appears in the target list of the select and is written with the numerical functions sqrt and power.
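The distance query can be checked outside ORACLE as well. The sketch below is written under several assumptions: it runs against Python's built-in sqlite3 module, it loads only four of the twelve points, and it registers sqrt and power as user-defined functions because not every SQLite build provides them. The final order by is added only to make the output order predictable.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register sqrt and power explicitly; some SQLite builds lack math functions.
conn.create_function("sqrt", 1, math.sqrt)
conn.create_function("power", 2, math.pow)

conn.execute("create table POINTS(ptid text primary key, x integer, y integer)")
conn.executemany("insert into POINTS values (?, ?, ?)",
                 [("a", 0, 0), ("e", 1, 1), ("j", 3, 0), ("l", 3, 2)])

# Self-product of POINTS: alias a is pinned to point 'a', alias p ranges over all rows.
rows = conn.execute("""
    select p.ptid,
           sqrt(power(a.x - p.x, 2) + power(a.y - p.y, 2)) as dist
    from POINTS a, POINTS p
    where a.ptid = 'a'
    order by p.ptid
""").fetchall()

distances = {ptid: round(dist, 8) for ptid, dist in rows}
print(distances)
```

For the points e, j, and l the rounded values agree with the DIST column shown above (1.41421356, 3, and 3.60555128).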
In Oracle we can perform computations unrelated to any table by using a fictitious tabular variable that is named DUAL.
Example 5.14.2 To compute the sine of the angles of 30, 45, and 60 degrees in Oracle, we write:
select sin(30*3.14159265359/180) as sin30,
sin(45*3.14159265359/180) as sin45,
sin(60*3.14159265359/180) as sin60
from dual;
We need to convert the angles to radians before sin is applied. This will return:
SIN30 SIN45 SIN60
---------- ---------- ----------
.5 .707106781 .866025404
Microsoft SQL Server has a simpler way of performing this type of computation in that it does not require the fictitious table.
Example 5.14.3 In SQL Server we can simply write:
select sin(30*3.14159265359/180) as sin30,
sin(45*3.14159265359/180) as sin45,
sin(60*3.14159265359/180) as sin60;
to obtain the same result as the one obtained in ORACLE.
5.14.2 String Functions
String functions can be used to transform strings, extract parts of strings, etc.
The functions upper and lower convert strings to uppercase and lowercase characters, respectively.
Example 5.14.4 To print names of students in capital characters and course titles in small letters we can write:
select distinct upper(STUDENTS.name) as STNAME,
lower(COURSES.cname) as course
from STUDENTS, GRADES, COURSES
where STUDENTS.stno = GRADES.stno and
GRADES.cno = COURSES.cno;
This generates the following return:
STNAME COURSE
-----------------------------------------------
EDWARDS P. DAVID computer architecture
EDWARDS P. DAVID computer programming
EDWARDS P. DAVID introduction to computing
GROGAN A. MARY computer architecture
.
.
.
PRIOR LORRAINE data structures
PRIOR LORRAINE introduction to computing
RAWLINGS JERRY computer architecture
RAWLINGS JERRY introduction to computing
These functions are particularly useful for performing string comparisons when ignoring case. Thus, the condition
    upper('stephany') like 'STE%'
is true.
Example 5.14.5 The string function replace substitutes every occurrence of its second argument in the value(s) specified by its first argument, by its third argument. In the select written below the string 'Computer' is replaced by the string 'Comp.':
select replace(cname,'Computer','Comp.') from COURSES;
This yields the following result:
REPLACE(CNAME,COMPUTER,COMP.)
----------------------------------
Introduction to Computing
Comp. Programming
Comp. Architecture
Data Structures
Higher Level Languages
Software Engineering
Graphics
Example 5.14.6 The function concat computes the concatenation of the two strings that form its arguments. Its effect is identical to the concatenation operator || that we discussed in Example 5.6.11. The phrase below prints the state and zip code of each student as a single string:
select name, addr, concat(state,zip) as state_zip from STUDENTS;
This returns:
NAME ADDR STATE_ZIP
----------------------------------------------
Edwards P. David 10 Red Rd. MA02159
Grogan A. Mary Walnut St. MA02148
Mixon Leatha 100 School St. MA02146
McLane Sandy 30 Cass Rd. MA02122
Novak Roland 42 Beacon St. NH03060
Pierce Richard 70 Park St. MA02146
Prior Lorraine 8 Beacon St. MA02125
Rawlings Jerry 15 Pleasant Dr. MA02115
Lewis Jerry 1 Main Rd RI02904
Example 5.14.7 To extract substrings of strings we can use the function substr. To call this function we need to use the following syntax:
    substr(string, integer [, integer])
A typical call such as substr(s, n, m) will return the substring of length m of the string s that starts with the nth character of s. If m is omitted, as in substr(s, n), then the function returns all characters of s starting from the nth character to the end of s. If n is negative, then the starting position is counted backwards from the end of s.
The select phrase
select substr('Oracle',2,3) from dual;
will return:
SUB
---
rac
The next select, which omits the third argument of substr,
select substr('Oracle',2) from dual;
yields:
SUBST
-----
racle
which is the string that begins with the second character of 'Oracle' and ends with the last character of this string.
Since the second argument of the function call in
select substr('Oracle',-4,3) from dual;
is negative, the starting position of the substring is the 4th character counted from the end (that is, the character a) and thus, the query returns:
SUB
---
acl
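The three calls of substr can be re-run without an ORACLE installation. The sketch below assumes Python's built-in sqlite3 module; SQLite is a different SQL dialect, but its substr follows the same positional rules for these three calls, and it needs no counterpart of the table dual. The helper substr_result is a hypothetical name introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def substr_result(sql_expr):
    """Evaluate one scalar SQL expression and return its value."""
    return conn.execute("select " + sql_expr).fetchone()[0]

middle = substr_result("substr('Oracle', 2, 3)")     # start at 2nd character, length 3
tail = substr_result("substr('Oracle', 2)")          # from 2nd character to the end
from_end = substr_result("substr('Oracle', -4, 3)")  # start 4 characters from the end
print(middle, tail, from_end)
```

The printed values are rac, racle, and acl, matching the three results shown in the example.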
The functions lpad and rpad can be used to enhance the presentation of query results. The syntax of lpad is:
    lpad(s, integer [, string])
The effect is to pad s on the left with spaces to bring the total length of the string to the length specified by the second argument of the function. If the third argument is present, then this string is repeated on the left, instead of spaces, to fill up the padded string.
The function rpad has a similar syntax; however, the padding is done at the right of s.
Example 5.14.8 To print a list of all employees and their salaries (using the tabular variables EMPHIST and PERSINFO) we can use the query:
select name, lpad(salary,7,'$') as ann_salary from
    persinfo, emphist
    where persinfo.empno = emphist.empno;
This will return the result:
NAME ANN_SAL
----------------------------------- -------
Natalia Martins $150000
Laura Schwartz $120000
John Soriano $120000
Kendall MacRae $100000
Rachel Anderson $$70000
Richard Laughlin $$70000
Danielle Craig $$90000
Abby Walsh $$75000
Bailey Burns $$70000
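The padding rule can be made concrete with a small emulation. The sketch below is written in Python and is not ORACLE's implementation; the functions lpad and rpad defined here are stand-ins that assume a non-empty padding string and that, as in ORACLE, a value longer than the requested length is cut down to that length.

```python
def lpad(s, n, pad=" "):
    """Left-pad s to length n by repeating pad; cut s down if it is longer than n."""
    s = str(s)
    if len(s) >= n:
        return s[:n]
    fill = n - len(s)
    # Repeat the padding string and keep exactly as many characters as are missing.
    return (pad * fill)[:fill] + s

def rpad(s, n, pad=" "):
    """Right-pad s to length n by repeating pad; cut s down if it is longer than n."""
    s = str(s)
    if len(s) >= n:
        return s[:n]
    fill = n - len(s)
    return s + (pad * fill)[:fill]

six_digits = lpad(150000, 7, "$")
five_digits = lpad(70000, 7, "$")
right_padded = rpad("cs110", 8, ".")
print(six_digits, five_digits, right_padded)
```

A six-digit salary receives one dollar sign and a five-digit salary receives two, which is why the listing above shows both $150000 and $$70000.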
5.14.3 Date functions
SQL Plus contains a class of functions that apply to the DATE type: extract, months_between, etc.
Example 5.14.9 The function extract computes a part of a date value. Its first argument gives the desired date part; the second argument is the date value. For instance, to obtain the year part of the appt_date attribute of the table EMPHIST we write:
select empno, extract(year from appt_date) as start_y
from emphist;
This returns:
EMPNO START_Y
---------- ----------
1000 1999
1005 1999
1010 2000
1015 1999
1020 1999
1025 2000
1030 2000
1035 2000
1040 2000
Similarly, we can obtain the month part of a date by writing:
select empno, extract(month from appt_date)
       as start_m
from emphist;
This will return the result:
EMPNO START_M
---------- ----------
1000 10
1005 10
1010 1
1015 10
1020 11
1025 3
1030 1
1035 2
1040 3
Example 5.14.10 To compute the number of months an employee has worked we can use the function months_between. This will compute the number of months between the current date (designated by the system-provided value SYSDATE) and the date of hire:
select empno, months_between(SYSDATE,appt_date)
as month_served
from emphist
The table returned by this query is:
EMPNO MONTH_SERVED
---------- ------------
1000 35.8877397
1005 35.532901
1010 32.8877397
1015 35.1135461
1020 34.8877397
1025 30.5974171
1030 32.5974171
1035 31.2748365
1040 30.8877397
Arithmetic computations can be performed in the target list of any select.
Example 5.14.11 Suppose that a bonus is to be paid to the current employees, that is, to those whose termination date is null. The bonus is computed as 10% of the weekly salary (salary/52), multiplied by the number of months employed. This is computed by:
select empno, 0.1 * months_between(SYSDATE,appt_date) * salary/52 as bonus
from emphist
where term_date is null;
This query returns:
EMPNO BONUS
------------------
1000 10430.7253
1005 8262.69438
1010 7652.27254
1015 6804.93348
1020 4733.05642
1025 4155.51299
1030 5688.95627
1035 4550.04006
1040 4194.59488
5.15 Aggregate Functions in SQL
Aggregate functions are those functions that operate on sets of values. Typical examples include: sum, avg, max, min, and count.
The first four functions operate on columns of tables and ignore null values. The count function returns the number of elements of the set that is its argument.
Example 5.15.1 The following select construct determines the largest grade obtained by the student whose student number is 1011. The function max is applied to the set of grades of the student whose number is 1011 and returns the largest value in this set:
select max(grade) as highgr from GRADES
where stno = 1011;
This returns the table:
HIGHGR
------
90
For instance, sum(A) returns the sum of all values of the selected non-null A-components of the tuples. Similarly, avg(A) returns the average value of the same sequence. The expressions max(A) and min(A) yield the largest and the smallest values in the set of A-components of the tuples selected by a query, respectively.
The functions sum and avg apply to attributes whose domains are numerical (such as integer or float); max and min apply to every kind of attribute.
If we wish to discard duplicate values from the sequences of values before applying these functions, we need to use the word distinct. For instance, sum(distinct A) considers only the distinct non-null values that occur in the sequence of components.
Example 5.15.2 We mentioned that the built-in functions max and min apply to string domains as well as to numerical domains. We use this feature of these functions to determine the first and the last student in alphabetical order:
select min(name) as first, max(name) as last
from STUDENTS;
This query yields the table:
FIRST LAST
---------------- --------------
Edwards P. David Rawlings Jerry
Next, we show a select construct where the same functions are applied to a numerical domain:
select min(grade) as lowgr,
max(grade) as highgr from GRADES
where stno = 1011;
This generates the answer:
LOWGR HIGHGR
-----------------
40 90
The query
select avg(grade) as avggr from GRADES
    where stno = 1011;
returns the table
AVGGR
-----
73.75
If we discard duplicate values as in
select avg(distinct grade) as avggr from GRADES
where stno = 1011
then the average grade is lower, indicating a preponderance of the higher grades for this student:
AVGGR
-----
68.33
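The two averages can be recomputed from the four grades of student 1011 that appear in our database instance (40, 75, 90, and 90). The sketch below assumes Python's built-in sqlite3 module in place of ORACLE and keeps only three columns of GRADES; it evaluates avg(grade) and avg(distinct grade) in one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno integer, cno text, grade integer)")
conn.executemany("insert into GRADES values (?, ?, ?)",
                 [(1011, "cs110", 40), (1011, "cs110", 75),
                  (1011, "cs210", 90), (1011, "cs240", 90)])

# avg(grade) uses all four values; avg(distinct grade) counts the grade 90 once.
avg_all, avg_distinct = conn.execute("""
    select avg(grade), avg(distinct grade) from GRADES where stno = 1011
""").fetchone()
print(round(avg_all, 2), round(avg_distinct, 2))
```

Without distinct the average is (40 + 75 + 90 + 90)/4 = 73.75; with distinct the duplicate 90 is dropped and the average becomes (40 + 75 + 90)/3, approximately 68.33.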
Built-in functions can be used in subqueries. This is illustrated by the next example.
Example 5.15.3 To retrieve the students who obtained a grade higher than the average grade in cs110 we write:
select stno from grades where cno = 'cs110'
    and grade > all(select avg(grade) from grades
                    where cno = 'cs110');
This returns the table:
STNO
----
2661
3566
5544
The count function can be used in several ways:
    count(A) can be used to determine the number of non-null entries under the attribute A;
    count(distinct A) computes the number of distinct non-null values that occur under A;
    count(*) determines how many rows exist in a table.
Note that count(distinct *) cannot be used in SQL.
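The difference between the three forms of count shows up as soon as a column contains duplicates and null values. The sketch below assumes Python's built-in sqlite3 module rather than ORACLE; the table T and its four rows are hypothetical and serve only this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T(A integer)")
# Four rows: the value 10 twice, the value 20 once, and one null.
conn.executemany("insert into T values (?)", [(10,), (10,), (20,), (None,)])

n_nonnull, n_distinct, n_rows = conn.execute(
    "select count(A), count(distinct A), count(*) from T").fetchone()
print(n_nonnull, n_distinct, n_rows)
```

Here count(A) is 3 because the null entry is skipped, count(distinct A) is 2 because the value 10 is counted once, and count(*) is 4 because every row is counted.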
Example 5.15.4 Here are several examples of the use of the count function. To find how many students took cs110 in the fall semester of 2003, we write:
select count(cno) from GRADES
    where cno = 'cs110' and
          sem = 'Fall' and
          year = 2003;
Since no records exist for any grades given during that semester in cs110, we obtain the answer:
COUNT(CNO)
----------
0
Observe that this table has a system-supplied column name COUNT(CNO). This happens because we did not provide a name using as.
Let us determine how many students have ever registered for any course. We have to retrieve this result from GRADES, and we must use distinct to avoid counting the same student several times (if the student took several courses):
select count(distinct stno) as nost
from GRADES;
This query returns the one-entry table:
NOST
----
8
Finally, let us determine the names of instructors who are teaching more than one subject. For every instructor, we determine in a subquery the number of courses taught. Then, we retain those instructors who taught more than one course:
select name from INSTRUCTORS where
1 < any (select count(distinct cno) from GRADES
where empno = INSTRUCTORS.empno);
We obtain the table:
NAME
------------
Evans Robert
Will Samuel
5.16 Sorting Results
Data obtained from a select construct may be sorted on one or several columns using the order by clause. This clause also gives the user the possibility of opting for an ascending or descending sorting order on each of the columns. By default, the ascending order is chosen.
Example 5.16.1 Suppose that we need to sort the GRADES tuples on the student number. For each student, we sort the grades in descending order. This can be done with the query:
select * from GRADES
order by stno, grade desc;
This results in the output shown next:
STNO EMPNO CNO SEM YEAR GRADE
---------- ----------- ----- ------ ---------- ----------
1011 019 cs210 FALL 2003 90
1011 056 cs240 SPRING 2004 90
1011 023 cs110 SPRING 2003 75
1011 019 cs110 FALL 2002 40
2415 019 cs240 SPRING 2003 100
2661 234 cs310 SPRING 2004 100
2661 019 cs110 FALL 2002 80
2661 019 cs210 FALL 2003 70
3442 234 cs410 SPRING 2003 60
3566 019 cs240 SPRING 2003 100
3566 019 cs110 FALL 2002 95
3566 019 cs210 FALL 2003 90
4022 056 cs240 SPRING 2004 80
4022 234 cs310 SPRING 2004 75
4022 019 cs210 SPRING 2004 70
4022 023 cs110 SPRING 2003 60
5544 019 cs110 FALL 2002 100
5544 056 cs240 SPRING 2004 70
5571 019 cs210 SPRING 2004 85
5571 234 cs410 SPRING 2003 80
5571 019 cs240 SPRING 2003 50
Instead of using the names of the columns one could use their ordinal positions in the select phrase.
Example 5.16.2 An equivalent form of the query from Example 5.16.1 is
select stno, empno, cno, sem, year, grade
from GRADES
order by 1, 6 desc;
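The equivalence of the two forms of order by can be verified on a few rows. The sketch below assumes Python's built-in sqlite3 module in place of ORACLE and loads only five of the GRADES rows listed in Example 5.16.1, inserted in a scrambled order; it runs the query once with column names and once with ordinal positions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table GRADES(stno integer, empno text, cno text, "
             "sem text, year integer, grade integer)")
conn.executemany("insert into GRADES values (?, ?, ?, ?, ?, ?)", [
    (2661, "019", "cs110", "FALL", 2002, 80),
    (1011, "023", "cs110", "SPRING", 2003, 75),
    (2661, "234", "cs310", "SPRING", 2004, 100),
    (1011, "019", "cs210", "FALL", 2003, 90),
    (1011, "019", "cs110", "FALL", 2002, 40),
])

# Sort on stno ascending (the default), then on grade descending.
by_name = conn.execute(
    "select stno, empno, cno, sem, year, grade from GRADES "
    "order by stno, grade desc").fetchall()
# Same ordering, written with the positions of stno (1) and grade (6).
by_position = conn.execute(
    "select stno, empno, cno, sem, year, grade from GRADES "
    "order by 1, 6 desc").fetchall()

stno_grade_pairs = [(row[0], row[5]) for row in by_name]
print(stno_grade_pairs)
print(by_name == by_position)
```

Note that desc applies only to the column it follows; stno keeps the default ascending order in both forms.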
Ordering of the results can also be achieved by using expressions.
Example 5.16.3 To sort the grades based on the second digit of the course number, and then on the first digit of the course number (which are the fourth and the third characters of course numbers) we write:
select * from grades
    order by substr(cno,4,1), substr(cno,3,1);
This will return the following result:
STNO EMPNO CNO SEM YEAR GRADE
---------- ----------- ----- ------ ---------- ----------
1011 019 cs110 FALL 2002 40
2661 019 cs110 FALL 2002 80
3566 019 cs110 FALL 2002 95
5544 019 cs110 FALL 2002 100
1011 023 cs110 SPRING 2003 75
4022 023 cs110 SPRING 2003 60
1011 019 cs210 FALL 2003 90
3566 019 cs210 FALL 2003 90
4022 019 cs210 SPRING 2004 70
5571 019 cs210 SPRING 2004 85
2661 019 cs210 FALL 2003 70