    Chapter 5

    SQL: The Relational Language

    5.1 Introduction
    5.2 Tabular Variables in SQL
        5.2.1 Creation of Tables
    5.3 Referential Integrity in SQL
    5.4 Basic Data Types
        5.4.1 String Domains
        5.4.2 Numeric Domains
        5.4.3 Special Domains
        5.4.4 Basic Domains Supported by ORACLE
    5.5 SELECT Phrases
    5.6 The WHERE Option
    5.7 Union, Intersection, and Difference in SQL
    5.8 Table Product in SQL
    5.9 Join in SQL
    5.10 Sets and subqueries
    5.11 Parametrized subqueries
    5.12 Subqueries and division
    5.13 Relational Completeness of SQL
    5.14 Scalar Functions of SQL
        5.14.1 Numerical Functions
        5.14.2 String Functions
        5.14.3 Date functions
    5.15 Aggregate Functions in SQL
    5.16 Sorting Results
    5.17 The Group-by Option
        5.17.1 The decode and case Functions
        5.17.2 The rollup and cube Extensions of group by
    5.18 Analytical Capabilities of SQL Plus
        5.18.1 Ranking Functions
        5.18.2 Top-n Queries
        5.18.3 Windowing functions in SQL Plus
    5.19 Statistics in SQL
        5.19.1 Variance and Correlation
        5.19.2 Linear Regression
    5.20 Graphs and SQL in SQL Plus
    5.21 Updates
    5.22 Access Rights
    5.23 Views in SQL
    5.24 Accessing metadata in SQLPlus
    5.25 Exercises
    5.26 Bibliographical Comments

    5.1 Introduction

    SQL is an acronym for Structured Query Language and is the name of the most important tool for defining and manipulating relational databases. The development of SQL began in the mid-1970s at the IBM San Jose Research Laboratory. The success of an experimental IBM database system (known as System R) that incorporated SQL compelled a number of software manufacturers to join IBM in developing relational database systems that incorporated SQL. In 1982, the American National Standards Institute (ANSI) initiated the development of a standard for a query language for relational database systems and opted for SQL as its prototype. The resulting ANSI standard, issued in 1986, was adopted as an International Standard by the International Organization for Standardization (ISO) in 1987.

    In the late 1980s, embedded SQL was standardized by ANSI, and work on expanding SQL continues. A much extended version of the original standard, known as SQL92, was adopted by ISO/IEC at the end of 1992. To reflect current trends in the database field towards object-relational technology, a new standard ISO/IEC 9075-1, known as SQL99, was published in July 1999. As we shall see, SQL99 is a superset of SQL92. New features incorporated by this standard include object-relational extensions (user-defined data types, reference types, collections, large object support, table hierarchies), active database features (triggers), stored procedures and functions, on-line analytic processing extensions, etc. More recently, in 2003, a new standard was issued. This new edition of the standard includes a new chapter that deals with the interaction between SQL and XML (which we discuss in Chapter 10), corrections to SQL99, and several new features.

    Our presentation concentrates initially on common SQL features, applicable to a wide range of SQL implementations.

    SQL is a nonprocedural language. This means that a query formulated in SQL need not specify how a problem is to be solved nor how data should be accessed by the computing system; instead, an SQL query states what the query is, i.e., what data are sought.

    This leaves the user free to focus on the logic of the query. Because the DBMS makes use of its internal knowledge, in most cases the DBMS generates retrieval procedures that are faster than equivalent retrieval procedures built directly by the user.

    The SQL language consists of three components: the data definition language (DDL), the data manipulation language (DML), and the data control language (DCL). The first component allows the user to define the structure of the tables of the database. The second contains retrieval and update directives. The last component allows the database administrator to define the access rights to the database for various categories of users.

    SQL syntax is format-free: tabs, carriage returns, and spaces can be included anywhere a space occurs in the definition of an SQL construct. Also, case is insignificant in table names, reserved words, and keywords. However, case is significant in character string literals.
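The two rules just stated can be tried out with a minimal sketch. It assumes SQLite, driven from Python's standard sqlite3 module, as a stand-in for the DBMS; the table T and its single row are hypothetical, and other systems may differ in details.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("create table T (name varchar(10))")
conn.execute("insert into T values ('Ann')")

# Keywords and table names may be written in any case ...
rows_lower = conn.execute("select name from t where name = 'Ann'").fetchall()

# ... and line breaks or extra spaces may replace any single space.
rows_upper = conn.execute("""SELECT
        NAME
    FROM   T
    WHERE  name = 'Ann'""").fetchall()

# Inside a string literal case matters: 'ANN' is a different string.
rows_literal = conn.execute("select name from T where name = 'ANN'").fetchall()

print(rows_lower, rows_upper, rows_literal)
```

The first two queries differ only in layout and letter case and return the same row; the third returns no rows because the literal differs in case.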

    5.2 Tabular Variables in SQL

    When we introduced tables in Chapter 3, we assumed that the contents of a table is a relation, that is, a set of tuples. To conform to the reality of databases we need to define the content of a table as a sequence of tuples. Thus, a table may contain several copies of the same tuple. If a table is allowed to contain duplicates, then even if we know all components of a tuple, we may be unable to identify the corresponding row in the table uniquely. As a consequence, not every table has a key.
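A small sketch of a table that holds two copies of the same tuple is given below. It assumes SQLite through Python's sqlite3 module; the table LOANS and its rows are hypothetical and serve only to show that, without a key, nothing prevents duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table declared without any key.
conn.execute("create table LOANS (name varchar(35), city varchar(25))")

# The same tuple is inserted twice; both copies are kept.
for _ in range(2):
    conn.execute("insert into LOANS values ('Ann Richards', 'Natick')")

rows = conn.execute("select name, city from LOANS").fetchall()
as_set = set(rows)  # the relation (set of tuples) underlying the table

print(len(rows), len(as_set))
```

The table has two rows, while the corresponding set of tuples has only one element, so the components of the tuple do not identify a row uniquely.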

    In this section we present a topic that we refer to informally as table creation. In reality, we create an object similar to a variable in a programming language that we call a tabular variable. The values of a tabular variable are tables and these values change in time. Tabular variables are created using the construction create table.

    Example 5.2.1 To create a tabular variable called PATRONS having the heading

        name addr city zip telno date_of_birth

    we write:

        create table PATRONS (name varchar(35) not null,
                              addr varchar(50),
                              city varchar(25),
                              zip char(9),
                              telno char(12),
                              date_of_birth date);

    As we shall see, each attribute is followed by a description of its domain. The effect of this command is to create a tabular variable whose initial value is a table whose content is the empty set of tuples:

        PATRONS
        name          addr           city        zip    telno         date_of_birth
        ---------------------------------------------------------------------------

    After inserting a first row, the next value of the tabular variable PATRONS is the table:

        PATRONS
        name          addr           city        zip    telno         date_of_birth
        ---------------------------------------------------------------------------
        Ann Richards  56 Green Ln    Natick      02170  508-561-0987  02/15/78

    A second insertion yields a new table as the value for the tabular variable:

        PATRONS
        name          addr           city        zip    telno         date_of_birth
        ---------------------------------------------------------------------------
        Ann Richards  56 Green Ln    Natick      02170  508-561-0987  02/15/78
        Ron Scott     50 Cider Hill  Framingham  01160  608-663-0211  11/4/80

    If the first patron moves to a new address, the first row is modified and the tabular variable assumes a third value:

        PATRONS
        name          addr           city        zip    telno         date_of_birth
        ---------------------------------------------------------------------------
        Ann Richards  77 Lake St.    Milton      02186  617-364-0606  02/15/78
        Ron Scott     50 Cider Hill  Framingham  01160  608-663-0211  11/4/80

    The values that the tabular variable PATRONS may assume are the actual tables that have the name and the heading specified at the creation of the tabular variable. In addition, we can specify several types of constraints that any value of the tabular variable must satisfy.
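The succession of values taken by a tabular variable can be traced with the sketch below. It assumes SQLite through Python's sqlite3 module rather than ORACLE, stores the birth dates as ISO text (an assumption, since SQLite has no dedicated date domain), and uses the insert and update constructions that are discussed later in the chapter. The helper current_value is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table PATRONS (name varchar(35) not null,
                          addr varchar(50),
                          city varchar(25),
                          zip char(9),
                          telno char(12),
                          date_of_birth date)""")


def current_value():
    """Return the table that is the present value of the tabular variable."""
    return conn.execute(
        "select name, addr, city from PATRONS order by name").fetchall()


v0 = current_value()  # initial value: the empty table

conn.execute("""insert into PATRONS values
    ('Ann Richards', '56 Green Ln', 'Natick', '02170',
     '508-561-0987', '1978-02-15')""")
v1 = current_value()

conn.execute("""insert into PATRONS values
    ('Ron Scott', '50 Cider Hill', 'Framingham', '01160',
     '608-663-0211', '1980-11-04')""")
v2 = current_value()

# The first patron moves: one row is modified, the other is untouched.
conn.execute("""update PATRONS
                set addr = '77 Lake St.', city = 'Milton',
                    zip = '02186', telno = '617-364-0606'
                where name = 'Ann Richards'""")
v3 = current_value()

for value in (v0, v1, v2, v3):
    print(value)
```

Each of v0, v1, v2, and v3 is a different table with the same name and heading, which is exactly the sense in which PATRONS is a variable.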

    Before it is possible to create tabular variables and form queries, it is necessary to create an empty database in which to work. In practice, this is generally done at the level of the operating system, usually with a command that is provided by the vendor of the DBMS.

    To start, we assume that we have created an empty database. In this section we begin to discuss a part of the data definition component of SQL, namely, the creation of tabular variables, or informally, the creation of database tables.

    5.2.1 Table Creation

    We refer to the components of the Data Definition Language (DDL) as directives. The SQL directive for adding tables to a database is create table.

    At a minimum, as we saw in Example 5.2.1, creating a tabular variable in SQL requires that we specify its name and its attributes along with their domains. The syntax for this is:

        create table table_name
            [(attr_def {, attr_def})],

    where the attribute definition attr_def has the syntax:

        attribute_name domain

    In a slightly more general form (that ignores certain details related to the physical design of databases), the create table directive has the following syntax:

        create table [schema.]table_name
            [(attr_def | table_constraint | table_ref_clause
              {, attr_def | table_constraint | table_ref_clause})],

    where the attribute definition attr_def has the syntax:

        attribute_name domain [default expr] [column_ref_clause] {column_constraint}

    As a result of the execution of this directive, an initial amount of space is reserved in secondary memory to accommodate future values of the tabular variable, and the metadata are modified to reflect the addition of the new tabular variable. Specialized SQL constructions, discussed later (insert, delete, and update), can be used to modify the value of this variable.

    Creation of tabular variables permits placing restrictions, called constraints, on the contents of any value that the tabular variable may assume. The constraints that follow have a global character (which means that they apply to the contents of a table in its entirety) and apply to any value that the tabular variable may assume.

    Definition 5.2.2 A primary key constraint has the form

        [constraint constraint_name] primary key(list_of_attributes)

    when the primary key consists of the attributes of the list.

    Alternate keys of tables can be specified using unique constraints. The syntax of this type of constraint is:

        [constraint constraint_name] unique(list_of_attributes)

    This indicates that no two rows of a table that is a value of the tabular variable may have the same values for the attributes specified in the list.

    A constraint that involves a condition C, where C is a Boolean combination of conditions involving only components of tuples and constants, is denoted by:

        [constraint constraint_name] check(C)

    When a constraint involves more than one attribute it is considered a table constraint; otherwise, it is a column constraint. Referential integrity can be imposed by using the column constraint references in the definition of an attribute. To prevent certain components of tuples from assuming a null value we can impose the column constraint not null.
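The effect of the primary key, unique, and not null constraints can be observed with the following sketch. It assumes SQLite through Python's sqlite3 module; the table OFFICES, its rows, and the helper rejected are hypothetical. A DBMS reports a violated constraint as an error, which sqlite3 surfaces as IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table OFFICES (
        empno varchar(11) not null,
        telno varchar(4),
        constraint pk_offices primary key (empno),
        constraint uq_telno unique (telno))""")
conn.execute("insert into OFFICES values ('019', '5501')")


def rejected(statement):
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError:
        return True
    return False


dup_key = rejected("insert into OFFICES values ('019', '5502')")    # same empno
dup_telno = rejected("insert into OFFICES values ('023', '5501')")  # same telno
null_key = rejected("insert into OFFICES values (null, '5503')")    # null empno
valid_row = not rejected("insert into OFFICES values ('023', '5502')")

print(dup_key, dup_telno, null_key, valid_row)
```

Only the last insertion is accepted; the other three would produce a table value that violates one of the declared constraints.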

    Example 5.2.3 To create the tabular variable INSTRUCTORS of the college database we use the following create table directive:

        create table INSTRUCTORS(empno varchar(11) not null,
                                 name varchar(35),
                                 rank varchar(25),
                                 roomno integer,
                                 telno varchar(4), primary key(empno));

    The domain of empno is defined to be the set of strings of length at most 11. In addition, we have the column constraint not null, which means that null cannot be used as a value of the attribute empno. The domains of the other attributes have similar, obvious definitions that are discussed below. Note that in the definition of INSTRUCTORS we impose a table constraint, namely primary key(empno).

    Similarly, the tabular variables STUDENTS and COURSES are created by:

        create table STUDENTS(stno varchar2(10) not null,
                              name varchar2(35) not null,
                              addr varchar2(35),
                              city varchar2(20),
                              state varchar2(2),
                              zip varchar2(10), primary key(stno));

        create table COURSES(cno varchar2(5) not null,
                             cname varchar2(30),
                             cr smallint, primary key(cno));

    A script that creates all tabular variables of the college database is contained in Appendix A.

    Example 5.2.4 To express that the primary key of the table GRADES consists of the attributes stno, cno, sem, and year we can say that this table satisfies the primary key constraint:

        constraint pkg primary key (stno, cno, sem, year)

    Example 5.2.5 For the table EMPHIST, introduced in Example 3.3.5, we could introduce the tuple conditions:

        constraint pos_sal check(salary > 0)

    and

        constraint suf_sal check(position != 'Programmer' or salary > 65000).

    They express that the salary must be a positive number and that somebody who is a programmer must be paid more than 65000 dollars, respectively.

    Thus, the creation of the table EMPHIST can be achieved by:

        create table EMPHIST(empno integer not null references PERSINFO(empno),
                             position varchar2(30),
                             dept varchar2(20),
                             appt_date date,
                             term_date date,
                             salary float,
                             check(position != 'Programmer' or salary > 65000),
                             constraint pos_sal check(salary > 0));

    A script that creates the tables PERSINFO, EMPHIST, and REPORTING is contained in Appendix C.
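The salary condition for programmers is an implication ("if the position is programmer, then the salary exceeds 65000") written as a disjunction. The sketch below shows which rows such check constraints accept. It assumes SQLite through Python's sqlite3 module; the table EMPHIST_DEMO is a hypothetical, reduced version of EMPHIST (the reference to PERSINFO is left out so that the block is self-contained), and the rows and the position 'Clerk' are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table EMPHIST_DEMO (
        empno integer not null,
        position varchar(30),
        salary float,
        constraint suf_sal check (position != 'Programmer' or salary > 65000),
        constraint pos_sal check (salary > 0))""")


def accepted(empno, position, salary):
    """Return True if the row satisfies every check constraint."""
    try:
        conn.execute("insert into EMPHIST_DEMO values (?, ?, ?)",
                     (empno, position, salary))
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "well-paid programmer": accepted(1, "Programmer", 70000),
    "underpaid programmer": accepted(2, "Programmer", 50000),
    "clerk below 65000": accepted(3, "Clerk", 30000),
    "negative salary": accepted(4, "Clerk", -1),
}
print(results)
```

The clerk's row is accepted although the salary is below 65000, because the first disjunct of suf_sal is already true; only programmers are subject to the salary threshold.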

    Example 5.2.6 In the directives enclosed below we state that stno is both a foreign key for ADVISING and, also, its primary key. In addition, empno is a foreign key for this table (being the primary key for the table INSTRUCTORS).

        create table ADVISING(stno varchar2(10) not null
                                   references STUDENTS(stno),
                              empno varchar2(11)
                                   references INSTRUCTORS(empno),
                              primary key(stno));

        create table GRADES(stno varchar2(10)
                                 not null references STUDENTS(stno),
                            empno varchar2(11)
                                 not null references INSTRUCTORS(empno),
                            cno varchar2(5)
                                 not null references COURSES(cno),
                            sem varchar2(6) not null,
                            year smallint not null,
                            grade integer,
                            primary key(stno,cno,sem,year),
                            check (grade

    If you wish to examine the headings of the tables you created you can issue, for example, the SQL Plus directive

        describe INSTRUCTORS;

    Then, SQL Plus will print:

        Name                       Null?    Type
        -------------------------- -------- ------------
        EMPNO                      NOT NULL VARCHAR2(11)
        NAME                                VARCHAR2(35)
        RANK                                VARCHAR2(25)
        ROOMNO                              NUMBER(38)
        TELNO                               VARCHAR2(4)

    The directive alter table is used for modifying the structure of an existing table. Columns may be added or dropped, the names of the columns or their data types can be modified, etc. A simplified syntax of this directive is:

        alter table table_name modification_specification

    In turn, the modification specification depends on the particular change we need to impose on the table. Examples of such modification specifications include

        add column_name column_type,
        drop column column_name,
        modify column_name column_type,
        rename column column_name to new_column_name,

    as well as many other choices.

    Example 5.2.7 To add a new year column to the table ADVISING we use the directive:

        alter table advising add year varchar2(4);

    The entries of the new column year will initially have null values.

    Column types can be modified using the modify option. For instance, to increase the maximum length of the values of stno to 12 characters we write:

        alter table advising modify stno varchar(12);

    Column renaming is executed using the option rename column. Below we rename the column stno to studentno:

        alter table advising rename column stno to studentno;

    Finally, to drop the column year that we just added we write:

        alter table advising drop column year;
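The first step of this example, adding a column whose entries start out as null, can be checked with the sketch below. It assumes SQLite through Python's sqlite3 module; only the add option is shown because the syntax of the other options differs between systems. The row inserted into ADVISING is hypothetical, and pragma table_info is an SQLite facility, not part of SQL Plus.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table ADVISING (stno varchar(10) not null,
                           empno varchar(11),
                           primary key (stno))""")
conn.execute("insert into ADVISING values ('1011', '019')")  # hypothetical row

# Add the new column to the existing table.
conn.execute("alter table ADVISING add year varchar(4)")

# SQLite-specific way of listing the columns of a table.
columns = [info[1] for info in conn.execute("pragma table_info(ADVISING)")]
row = conn.execute("select stno, empno, year from ADVISING").fetchone()

print(columns, row)
```

The existing row now has a third component, and Python shows its null value as None.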

    5.3 Referential Integrity in SQL

    We saw that referential integrity can be imposed in SQL using the column constraint references. An alternative method is to impose the table constraint foreign key. Its syntax is:

        foreign key(attr_name {, attr_name})
            references table_name (attr_name {, attr_name})
            [on delete cascade]

    The foreign key construction contains the option on delete cascade. The role of this option is to define the behavior of the tables when deletions occur in the table where the primary key occurs. Namely, when a row is removed from the table containing the primary key and the clause on delete cascade is specified, then all rows from the table that contains the corresponding foreign key that match the removed row are also removed.

    Example 5.3.1 Suppose that the tabular variable CITIES is created by:

        create table CITIES (city varchar(40),
                             state char(2),
                             primary key (city,state));

    A second tabular variable, STORES, records the stores that a retailer has in the covered territory, and is created by

        create table STORES (storeno integer not null,
                             addr varchar(40) not null,
                             city varchar(40),
                             state char(2),
                             tel char(12),
                             primary key (storeno),
                             foreign key(city,state) references CITIES(city,state)
                                 on delete cascade);

    To populate the tables we execute the following directives:

        insert into CITIES(city, state) values('Boston','MA');
        insert into CITIES(city, state) values('Springfield','MA');
        insert into CITIES(city, state) values('Providence','RI');
        insert into CITIES(city, state) values('Hartford','CT');
        insert into CITIES(city, state) values('Bayonne','NJ');

        insert into STORES(storeno, addr, city, state, tel)
            values(1,'125 Harvard St.','Boston','MA','617-287-0991');
        insert into STORES(storeno, addr, city, state, tel)
            values(2,'50 Storrow Drive','Boston','MA','617-566-7629');
        insert into STORES(storeno, addr, city, state, tel)
            values(3,'85 Manton Av.','Providence','RI','401-453-1234');
        insert into STORES(storeno, addr, city, state, tel)
            values(4,'40 West Street','Hartford','CT','860-232-4484');
        insert into STORES(storeno, addr, city, state, tel)
            values(5,'5 Finley Av.','Bayonne','NJ','908-221-0094');
        insert into STORES(storeno, addr, city, state, tel)
            values(6,'10 Linton Plaza','Hartford','CT','860-660-2220');
        insert into STORES(storeno, addr, city, state, tel)
            values(7,'30 Stilson Rd.','Providence','RI','401-861-5249');

    The values of the tabular variables CITIES and STORES are

        CITY        ST
        ---------------
        Boston      MA
        Springfield MA
        Providence  RI
        Hartford    CT
        Bayonne     NJ

    and

        STORENO ADDR             CITY       ST TEL
        ------------------------------------------------------
              1 125 Harvard St.  Boston     MA 617-287-0991
              2 50 Storrow Drive Boston     MA 617-566-7629
              3 85 Manton Av.    Providence RI 401-453-1234
              4 40 West Street   Hartford   CT 860-232-4484
              5 5 Finley Av.     Bayonne    NJ 908-221-0094
              6 10 Linton Plaza  Hartford   CT 860-660-2220
              7 30 Stilson Rd.   Providence RI 401-861-5249

    Since referential integrity was imposed between the tabular variables CITIES and STORES, we need to insert the tuples of CITIES before we can insert the tuples of STORES. Otherwise, the cities mentioned in the values of STORES cannot reference a city in a value of CITIES and the insertion in STORES will be rejected.

    The presence of on delete cascade means that if a row is removed from the table CITIES, then the rows of STORES corresponding to that city are also removed. For example, if the company closes its business in Hartford and we execute

        delete from CITIES where
            city = 'Hartford' and state = 'CT';

    then the rows of STORES corresponding to the stores in Hartford, CT will be deleted automatically.

    Removal of the tabular variables is also constrained by referential integrity. It would be impossible to remove the tabular variable CITIES before we remove the table STORES because STORES references CITIES. Thus, the correct order of removal is

        drop table STORES;
        drop table CITIES;

    If the clause on delete cascade is absent, then the deletion of a row from CITIES is impossible unless we first delete the rows of STORES that correspond to the city that is removed from CITIES.
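The behavior described in this example can be reproduced with the sketch below. It assumes SQLite through Python's sqlite3 module instead of ORACLE; note that SQLite enforces foreign keys only after the pragma shown in the code is issued. Only a few of the rows of the example are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite-specific: enable enforcement

conn.execute("""
    create table CITIES (city varchar(40),
                         state char(2),
                         primary key (city, state))""")
conn.execute("""
    create table STORES (storeno integer not null,
                         addr varchar(40) not null,
                         city varchar(40),
                         state char(2),
                         tel char(12),
                         primary key (storeno),
                         foreign key (city, state)
                             references CITIES(city, state)
                             on delete cascade)""")

# A store cannot be inserted before the city it references.
try:
    conn.execute("""insert into STORES values
        (4, '40 West Street', 'Hartford', 'CT', '860-232-4484')""")
    premature_insert_rejected = False
except sqlite3.IntegrityError:
    premature_insert_rejected = True

conn.executemany("insert into CITIES values (?, ?)",
                 [("Boston", "MA"), ("Hartford", "CT")])
conn.executemany("insert into STORES values (?, ?, ?, ?, ?)", [
    (1, "125 Harvard St.", "Boston", "MA", "617-287-0991"),
    (4, "40 West Street", "Hartford", "CT", "860-232-4484"),
    (6, "10 Linton Plaza", "Hartford", "CT", "860-660-2220"),
])

# Removing the city removes its stores as well.
conn.execute("delete from CITIES where city = 'Hartford' and state = 'CT'")
remaining = [r[0] for r in
             conn.execute("select storeno from STORES order by storeno")]

print(premature_insert_rejected, remaining)
```

After the deletion only the Boston store is left, although no delete directive was ever issued against STORES.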


    5.4 Basic Data Types

    SQL makes use of a collection of domains that, in general, varies from one implementation to another. Not all domains of the standard exist in every implementation, and not all domains of implementations exist in the standard.

    Basic domains supported by virtually all implementations of SQL can be classified as string domains, numerical domains, and special domains.

    5.4.1 String Domains

    String domains represent sets of fixed-length or variable-length sequences of characters. In this category, we have char(n), which represents the set of strings of characters (from a given basic set of characters) that have fixed length n. Similarly, varchar(n) represents the set of variable-length strings whose maximal length is n for n > 0.

    5.4.2 Numeric Domains

    The SQL standard prescribes two kinds of numeric domains: exact numeric data types (numeric, decimal, integer, and smallint) and approximate numeric data types (float, double precision, and real). Their respective syntax is:

        numeric [(p[, s])]
        decimal [(p[, s])]
        integer
        smallint
        float [(p)]
        double precision
        real

    Here, p stands for precision and s stands for scale (both of which are non-negative integers). The precision parameter refers to the total number of digits, while the scale indicates the number of digits to the right of the decimal point. The difference between numeric and decimal is that in the former case p is the exact total number of digits, while in the latter case p is only a lower bound: an implementation may provide a decimal domain with more than p digits.

    The domains smallint and integer have a number of digits dependent on the implementation; however, the precision of integer is required to be equal to or larger than the precision of smallint.

    The float domain includes approximate representations of real numbers having precision at least p. Also, real and double precision have implementation-dependent precision, where the precision of double precision is never smaller than that of real.

    5.4.3 Special Domains

    Specific DBMSs have their own domains. For instance, ORACLE has the long domain that contains strings of characters of variable length that may be as large as 65,535 characters.

    To allow us to begin working with actual examples as quickly as possible, we introduce some basic domains for ORACLE. Other databases are quite similar, and the reader can obtain the relevant details by consulting product-specific manuals.

    5.4.4 Basic Domains Supported by ORACLE

    We review briefly a few of the more important domains supported by ORACLE:

    In ORACLE, char[(n)] represents variable strings of characters of length n, where 1 ≤ n ≤ 32767; the default value of n is 1. The domain character is the same as char. The characters and their order are determined by the system during the installation of the DBMS.

    The domain varchar(n) requires n to be specified and also represents variable-length strings of characters. It is the intention of ORACLE to separate char(n) from varchar(n) in future releases: char(n) will represent fixed-length strings while varchar(n) will represent variable-length strings.

    The varchar2 data type stores variable-length character strings and is currently synonymous with the varchar data type. However, in a future version of Oracle, varchar might store variable-length character strings compared with different comparison semantics. Currently there are two types of comparison semantics for strings in Oracle: blank-padded comparison semantics and non-padded comparison semantics.

    When blank-padded comparison semantics is used, if the two values have different lengths, Oracle first adds blanks to the end of the shorter one so their lengths are equal. Oracle then compares the values character by character up to the first character that differs. The value with the greater character in the first differing position is considered greater. If two values have no differing characters, then they are considered equal. This rule means that two values are equal if they differ only in the number of trailing blanks. Oracle uses blank-padded comparison semantics only when both values in the comparison are either expressions of data type char, text literals, or values returned by the user function.

    In the case of non-padded comparison semantics two values are compared character by character up to the first character that differs. The value with the greater character in that position is considered greater. If two values of different length are identical up to the end of the shorter one, the longer value is considered greater. If two values of equal length have no differing characters, then the values are considered equal. Oracle uses non-padded comparison semantics when one or both values in the comparison have the data type varchar or varchar2.

    In either of the two comparison semantics we have 'ab' > 'aa' and 'ab' > 'a '. However, in the blank-padded comparison semantics we have 'a ' = 'a', while in the non-padded semantics we have 'a ' > 'a'.
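The two comparison rules can be modeled in a few lines of plain Python, as sketched below. This is not ORACLE code: the function names are hypothetical, and the order of individual characters is assumed to be Python's (Unicode code points), whereas in ORACLE it is determined by the character set of the installation.

```python
def compare_nonpadded(a, b):
    """Return -1, 0, or 1 under non-padded comparison semantics."""
    # Python compares strings character by character, and a string that
    # extends an equal prefix is the greater one, which matches the rule.
    return (a > b) - (a < b)


def compare_blank_padded(a, b):
    """Return -1, 0, or 1 under blank-padded comparison semantics."""
    # Pad the shorter value with trailing blanks, then compare as above.
    width = max(len(a), len(b))
    return compare_nonpadded(a.ljust(width), b.ljust(width))


for left, right in [("ab", "aa"), ("ab", "a "), ("a ", "a")]:
    print(repr(left), repr(right),
          compare_blank_padded(left, right),
          compare_nonpadded(left, right))
```

The two functions agree on the first two pairs and differ only on the last one, where the values differ solely in a trailing blank.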

    The domain date represents dates in the format dd-mmm-yy.

    The domain long (also denoted by long varchar) represents variable-length strings of characters with no more than 65,535 characters. At most one attribute may have this domain in any table.

    The number domain in ORACLE can be used in several forms as specified by the following syntax:

        number [(p[, s])],

    where p is the precision and s is the scale. The maximum precision of number is 38. The scale can vary between -84 and 127. If the scale is negative, the number is rounded to the specified number of places to the left of the decimal point.

    The following cases may occur when we insert a value in a column whose domain is number:

        Data          Domain        Stored as
        ------------------------------------------------------
        1,234,567.89  number        1234567.89
        1,234,567.89  number(9)     1234568
        1,234,567.89  number(9,2)   1234567.89
        1,234,567.89  number(9,1)   1234567.9
        1,234,567.8   number(6)     error: exceeds precision
        1,234,567.89  number(10,1)  1234567.9
        1,234,567.89  number(7,-2)  1234600
        1,234,567.89  number(7,2)   error: exceeds precision

    If s > p, then s specifies the maximum number of valid digits after the decimal point. For instance, number(4,5) requires a zero as the first digit after the decimal point and rounds the digits after the fifth decimal digit. The number 0.012358 is stored as 0.01236.

    Numbers may also be entered in exponential form, that is, including an exponent preceded by E. For example, 1234567 can be represented as 1.234567E+6, that is, as 1.234567 × 10^6.
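The interplay of precision and scale can be imitated with Python's decimal module, as in the sketch below. This is a model, not ORACLE's implementation: the function store_as_number is hypothetical, and it assumes that a value is first rounded to s places (half away from zero) and is then rejected when the result does not fit in p digits, that is, when its absolute value is at least 10^(p-s).

```python
from decimal import Decimal, ROUND_HALF_UP


def store_as_number(value, p, s=0):
    """Sketch of storing `value` in a number(p, s) column.

    Round to s digits after the decimal point (s may be negative), then
    reject the result if it no longer fits in p digits.
    """
    unit = Decimal(1).scaleb(-s)  # 10**(-s), e.g. 0.01 for s = 2
    rounded = Decimal(value).quantize(unit, rounding=ROUND_HALF_UP)
    if abs(rounded) >= Decimal(10) ** (p - s):
        raise ValueError("exceeds precision")
    return rounded


print(store_as_number("1234567.89", 9, 2))   # two decimals are kept
print(store_as_number("1234567.89", 9, 1))   # rounded to one decimal
print(store_as_number("0.012358", 4, 5))     # scale larger than precision
print(int(store_as_number("987654", 7, -2)))  # rounded to hundreds

try:
    store_as_number("1234567.89", 7, 2)
except ValueError as error:
    print("error:", error)
```

Under this model number(7,2) leaves room for only five digits before the decimal point, which is why the seven-digit integer part is rejected, while a negative scale rounds to the left of the decimal point.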

    Floating point domains are supported as float, float(*), and float(b), where b is the binary precision, that is, the number of significant binary digits. The domains float and float(*) are equivalent, and they consist of floating point numbers that can be represented by 126 binary digits (or, equivalently, by about 38 decimal digits).

    To provide compatibility with other systems, ORACLE supports such domains as decimal, integer, smallint, real, and double precision. However, their internal representation is defined by the format of the number domain.

    5.5 SELECT Phrases

    Queries must be written based on the names and headings of the tabular variables and not on the tables that represent their values at any given moment. This is similar to writing programs. A program should work for all legal inputs and not just the ones on which it was tested. In both cases, it is important to focus on the abstract structure and not on specific examples. The way we write SQL constructs must be directed only by the logic of the query and not by the content of a particular database instance. Just because the query generated the right answer for a particular instance of the database does not mean that it is correct.

    The main retrieval construction is the select phrase. Consider a query thatwe solved previously using relational algebra. Recall that in Example 4.1.25 wefound the names of all instructors who have taught any student who lives inBrookline. The solution involved using product, selection, and projection:

    T1 := (STUDENTS GRADES INSTRUCTORS)

    T2 := T1where STUDENTS.stno = GRADES.stnoand

    GRADES.empno = INSTRUCTORS.empnoand

    STUDENTS.city = Brookline

    ANS := T2[INSTRUCTORS.name].In SQL the same problem can be resolved using a single select phrase as in:

    select INSTRUCTORS.name from STUDENTS, GRADES, INSTRUCTORS
    where STUDENTS.stno = GRADES.stno and
          GRADES.empno = INSTRUCTORS.empno and
          STUDENTS.city = 'Brookline';

    We can conceptualize the execution of this typical select using the operations of relational algebra as follows:

    1. The execution begins by performing the product of the tables listed after the reserved word from. In our case, this involves computing the product

       STUDENTS × GRADES × INSTRUCTORS

    2. The selection specified after the reserved word where is executed next, if the where part is present (we shall see that this may or may not be present in a select). In our case, this amounts to retaining that part of the table product that satisfies the condition:

       STUDENTS.stno = GRADES.stno and GRADES.empno = INSTRUCTORS.empno
       and STUDENTS.city = 'Brookline'

    3. Finally, the result of the second phase is projected on the attributes listed between select and from, that is, in our case, on the attribute INSTRUCTORS.name.
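The three conceptual phases can be replayed by hand and compared with what a database engine returns. The sketch below uses SQLite through Python's sqlite3 module rather than ORACLE, and a tiny made-up instance (two students, two instructors, two grade records) that is an assumption of this illustration, not the database of the text.

```python
import sqlite3
from itertools import product

con = sqlite3.connect(":memory:")
con.executescript("""
create table STUDENTS(stno text, name text, city text);
create table INSTRUCTORS(empno text, name text);
create table GRADES(stno text, empno text, cno text);
insert into STUDENTS values ('2661', 'Mixon Leatha', 'Brookline'),
                            ('2890', 'McLane Sandy', 'Boston');
insert into INSTRUCTORS values ('019', 'Evans Robert'),
                               ('023', 'Exxon George');
insert into GRADES values ('2661', '019', 'cs110'),
                          ('2890', '023', 'cs110');
""")

# The select phrase, executed by the database engine.
by_sql = con.execute("""
    select INSTRUCTORS.name from STUDENTS, GRADES, INSTRUCTORS
    where STUDENTS.stno = GRADES.stno and
          GRADES.empno = INSTRUCTORS.empno and
          STUDENTS.city = 'Brookline'
""").fetchall()

# The same three phases, carried out by hand on the raw rows.
students = con.execute("select stno, name, city from STUDENTS").fetchall()
grades = con.execute("select stno, empno, cno from GRADES").fetchall()
instructors = con.execute("select empno, name from INSTRUCTORS").fetchall()

t1 = list(product(students, grades, instructors))       # 1. product
t2 = [(s, g, i) for (s, g, i) in t1                      # 2. selection
      if s[0] == g[0] and g[1] == i[0] and s[2] == "Brookline"]
by_hand = [(i[1],) for (s, g, i) in t2]                  # 3. projection

print(len(t1), by_sql, by_hand)
```

This is a conceptual model only; a real engine is free to evaluate the query in any order that yields the same result, and usually avoids building the full product.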

    We use a string constant (also known as a literal) in the above select, namely 'Brookline'. A string constant must begin and end with a single quote.

    SQL is not case-sensitive. This means that you may or may not use capital letters in any place in an SQL construction (except for string comparisons) without any effect on the value returned by the query.

    As we mentioned above, the where part of a select (also known as the where clause) is optional. This allows us to compute table projections in SQL as we show next.

    Example 5.5.1 In Example 4.1.16, we obtained a list of instructors' names and the room numbers of their offices by projecting the table INSTRUCTORS on name roomno.

    In SQL this can be done by writing

    select name, roomno from INSTRUCTORS;

    The select construct used above requires the name of the table involved in the retrieval and the list of attributes that we need to extract.

    In general, if we need to compute the projection of a table T on a set of attributes A1 . . . An of the heading of T, we use the construct:

    select A1, . . . , An from T;

    Example 5.5.2 To find out the states where the students originate we project the table STUDENTS on the attribute state. This is done by

    select state from STUDENTS;

    The system returns the result:

    ST

    --

    MA

    MA

    MA

    MA

    NH

    MA

    MA

    MA

    RI

    The value MA is repeated 7 times because there are seven students who live in Massachusetts.

    Duplicate values can be eliminated from the result of a query by using the option distinct, as in

    select distinct state from STUDENTS;

    This will yield the answer:

    ST

    --

    MA

    NH

    RI

    where duplicate values have been dropped.
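The effect of distinct can be reproduced on the state column listed above. This is a minimal sketch run in SQLite through Python's sqlite3 module, not in SQL Plus; the table holds only the two columns needed here, and the pairing of student numbers with states follows the listings of this chapter.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table STUDENTS(stno text, state text)")
con.executemany("insert into STUDENTS values (?, ?)", [
    ("1011", "MA"), ("2415", "MA"), ("2661", "MA"), ("2890", "MA"),
    ("3442", "NH"), ("3566", "MA"), ("4022", "MA"), ("5544", "MA"),
    ("5571", "RI"),
])

# Plain projection: one output row per student, duplicates included.
all_states = [row[0] for row in con.execute("select state from STUDENTS")]

# Projection with distinct: each state appears once.
distinct_states = sorted(
    row[0] for row in con.execute("select distinct state from STUDENTS"))

print(len(all_states), all_states.count("MA"), distinct_states)
```

The result of distinct is sorted in Python here because, without an order by option, the order of the returned rows is not something a query should rely on.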


    5.6 The WHERE Option

    The where clause allows us to extract tuples that satisfy certain conditions; in other words, using the where clause we can perform selections.

    Example 5.6.1 To find students who live in Boston we write:

    select stno, name, addr, city, state, zip
    from STUDENTS
    where city = 'Boston';

    This select will return the result:

    STNO NAME ADDR CITY ST ZIP

    ---------------------------------------------------------------

    2890 McLane Sandy 30 Cass Rd. Boston MA 02122

    4022 Prior Lorraine 8 Beacon St. Boston MA 02125

    5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115

    If we want to extract all columns of a table instance, we can use the wildcard character, *, instead of listing all columns. Thus, we can write the equivalent select:

    select * from STUDENTS
    where city = 'Boston';

    Here the symbol * replaces the full attribute list.

    Starting from simple conditions (which we called atomic conditions in Chapter 4) we can write queries involving more complicated conditions built by using and, or, and not.

    Example 5.6.2 In Example 4.1.14 we retrieved the students who live in Boston or Brookline. In SQL this can be done by:

    select * from STUDENTS
    where city = 'Boston' or city = 'Brookline';

    This yields the result:

    STNO NAME ADDR CITY ST ZIP

    ---------------------------------------------------------------

    2661 Mixon Leatha 100 School St. Brookline MA 02146

    2890 McLane Sandy 30 Cass Rd. Boston MA 02122

    3566 Pierce Richard 70 Park St. Brookline MA 02146

    4022 Prior Lorraine 8 Beacon St. Boston MA 02125

    5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115

    Example 5.6.3 To retrieve the grade records obtained in cs110 during the Spring of 2000 we can write in SQL:

    select * from GRADES
    where cno = 'cs110' and sem = 'SPRING'
          and year = 2000;

    This returns the result:

    STNO EMPNO CNO SEM YEAR GRADE

    ---------- ----------- ----- ------ ---------- ----------

    1011 023 cs110 SPRING 2000 75

    4022 023 cs110 SPRING 2000 60

    Selections can be combined with projections in a single SQL phrase.

    Example 5.6.4 In the select phrase:

    select stno, empno from GRADES
    where cno = 'cs110';

    the selection specified by where cno = 'cs110' is followed by the projection on the attributes stno, empno that are listed after the word select. The result is:

    STNO EMPNO

    ---------- -----

    1011 019

    2661 019

    3566 019

    5544 019

    1011 023

    4022 023

    In SQL we can use conditions that implement limited pattern matching. Certain patterns can be specified using the symbol % to replace 0 or more characters, and the underscore _ to replace exactly one character. As mentioned earlier, SQL is generally not case-sensitive; however, comparisons involving strings are case-sensitive. Thus, 'Jerry' and 'JERRY' are distinct strings, that is, 'Jerry' ≠ 'JERRY'. The comparison is realized using the operator like.

    Example 5.6.5 If we need to find the names and the addresses of students whose name includes Jerry, we can use the following select construct:

    select name, addr from STUDENTS
    where name like '%Jerry%';

    This returns the table:

    NAME ADDR

    --------------- ---------------

    Rawlings Jerry 15 Pleasant Dr.

    Lewis Jerry 1 Main Rd.

    Example 5.6.6 Suppose the computer science course numbers were carefully assigned so that all fundamental programming courses have a 1 as their second digit. Then the following select construct lists all fundamental programming courses.

    select * from COURSES
    where cno like 'cs_1%';

    The corresponding result is:

    CNO CNAME CR CAP

    ----- ------------------------- -- ---

    cs110 Introduction to Computing 4 120

    cs210 Computer Programming 4 100

    cs310 Data Structures 3 60

    cs410 Software Engineering 3 40
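The behavior of % and _ can be explored on the course numbers above. The following sketch runs the patterns in SQLite through Python's sqlite3 module; the helper matching is a hypothetical name, and the row for cs240 (with an assumed course name) is added only to have a course number that the pattern rejects.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table COURSES(cno text, cname text)")
con.executemany("insert into COURSES values (?, ?)", [
    ("cs110", "Introduction to Computing"),
    ("cs210", "Computer Programming"),
    ("cs240", "Computer Architecture"),   # assumed pairing, for illustration
    ("cs310", "Data Structures"),
    ("cs410", "Software Engineering"),
])

def matching(pattern):
    """Return the course numbers that satisfy `cno like pattern`."""
    cur = con.execute(
        "select cno from COURSES where cno like ? order by cno", (pattern,))
    return [row[0] for row in cur]

second_digit_1 = matching("cs_1%")   # 'cs', any one character, '1', anything
starts_cs2 = matching("cs2%")        # 'cs2' followed by zero or more characters
too_short = matching("cs_1")         # exactly four characters, so no match

print(second_digit_1, starts_cs2, too_short)
```

One caveat about the tool used for this sketch: by default, like in SQLite ignores the case of ASCII letters, which differs from the case-sensitive comparisons described in the text, so the sketch says nothing about case sensitivity.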

    Using the reserved word between, we can ensure that certain values are limited to prescribed intervals (including the endpoints of these intervals).

    Example 5.6.7 To find the students who obtained some grade between 65 and 85 in 2003, we apply the following query:

    select distinct stno from GRADES
    where year = 2003 and
          grade between 65 and 85;

    This select construct returns the table:

    STNO

    ----

    1011

    2661

    5571

    The previous select is simply a shorthand for

    select distinct stno from GRADES
    where year = 2003 and
          grade >= 65 and
          grade <= 85;

    3566

    4022

    5571

    We can test if certain components of tuples belong to a certain list of values by using a condition of the form:

    A in (v1, . . . , vn)

    This condition is satisfied by those tuples t such that t[A] has one of the values v1, . . . , vn.

    Example 5.6.9 Let us find the names of students who live in Boston or Brookline, a query that we already discussed in Example 5.6.2. Using the previous condition we write:

    select name from STUDENTS
    where city in ('Boston', 'Brookline');

    Then, the desired list is:

    NAME

    --------------

    Mixon Leatha

    McLane Sandy

    Pierce Richard

    Prior Lorraine

    Rawlings Jerry

    On the other hand, we can test the negation of a condition using not. To list the names of students who live outside those two cities, we write:

    select name from STUDENTS
    where not(city in ('Boston', 'Brookline'));

    which has the same effect as:

    select name from STUDENTS
    where city not in ('Boston', 'Brookline');
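The equivalence of the two negated forms can be checked mechanically. This is a sketch in SQLite through Python's sqlite3 module, using the names and cities of the nine students listed in this chapter; the helper names is hypothetical and simply pastes a condition into the where clause.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table STUDENTS(stno text, name text, city text)")
con.executemany("insert into STUDENTS values (?, ?, ?)", [
    ("1011", "Edwards P. David", "Newton"),
    ("2415", "Grogan A. Mary", "Malden"),
    ("2661", "Mixon Leatha", "Brookline"),
    ("2890", "McLane Sandy", "Boston"),
    ("3442", "Novak Roland", "Nashua"),
    ("3566", "Pierce Richard", "Brookline"),
    ("4022", "Prior Lorraine", "Boston"),
    ("5544", "Rawlings Jerry", "Boston"),
    ("5571", "Lewis Jerry", "Providence"),
])

def names(condition):
    """Names of the students satisfying the given where condition."""
    sql = "select name from STUDENTS where " + condition + " order by stno"
    return [row[0] for row in con.execute(sql)]

inside = names("city in ('Boston', 'Brookline')")
outside_a = names("not(city in ('Boston', 'Brookline'))")
outside_b = names("city not in ('Boston', 'Brookline')")

print(inside)
print(outside_a == outside_b, outside_a)
```

Every student lands in exactly one of the two lists here because no city is null in this instance.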

    We can insert strings of characters in the list of fields of a select phrase to improve the presentation of the results.

    Example 5.6.10 To insert the string 'Student name: ' in front of a student name we write:

    select 'Student name: ', name from STUDENTS;

    This yields the result:

    STUDENTNAME: NAME

    --------------- -----------------

    Student name: Edwards P. David

    Student name: Grogan A. Mary

    Student name: Mixon Leatha


    Student name: McLane Sandy

    Student name: Novak Roland

    Student name: Pierce Richard

    Student name: Prior Lorraine

    Student name: Rawlings Jerry

    Student name: Lewis Jerry

    In SQL Plus concatenation of strings can be achieved with the concatenation operator ||.

    Example 5.6.11 In the next select phrase we concatenate the string 'Student ' with a student's name, then with the string ' lives in ' and the student's state:

    select 'Student ' || name || ' lives in ' || state
    from STUDENTS;

    This returns the result:

    STUDENT||NAME||LIVESIN||STATE

    -------------------------------------

    Student Edwards P. David lives in MA

    Student Grogan A. Mary lives in MA

    Student Mixon Leatha lives in MA

    Student McLane Sandy lives in MA

    Student Novak Roland lives in NH

    Student Pierce Richard lives in MA

    Student Prior Lorraine lives in MA

    Student Rawlings Jerry lives in MA

    Student Lewis Jerry lives in RI

    In Microsoft SQL Server concatenation is obtained using the + operator.

    Example 5.6.12 The query shown in Example 5.6.11 can be executed in Microsoft SQL Server by

    select 'Student ' + name + ' lives in ' + state
    from STUDENTS;
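The || form can be tried without an ORACLE installation, since SQLite also accepts the || operator. The sketch below uses Python's sqlite3 module and three of the students listed earlier; note that the blanks inside the string constants are what separates the words in the output.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table STUDENTS(stno text, name text, state text)")
con.executemany("insert into STUDENTS values (?, ?, ?)", [
    ("1011", "Edwards P. David", "MA"),
    ("3442", "Novak Roland", "NH"),
    ("5571", "Lewis Jerry", "RI"),
])

# Each output row is a single string built from constants and attributes.
sentences = [row[0] for row in con.execute("""
    select 'Student ' || name || ' lives in ' || state
    from STUDENTS
    order by stno
""")]

for sentence in sentences:
    print(sentence)
```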

    5.7 Union, Intersection, and Difference in SQL

    Recall that union, intersection, and difference as defined in relational algebra may occur only between tables that have identical headings. To execute these operations in SQL, we need to use compound select phrases. Compound selects are constructed from simple select phrases using the reserved words union, intersect, and minus. As we shall see, SQL treats union, intersection and difference as operations between sets of tuples, and therefore, it removes duplicate values from the results of the queries.

    Example 5.7.1 To determine the student numbers of students who took cs210 we write:

    select stno from GRADES
    where cno = 'cs210';

    This returns the result:

    STNO

    ----

    1011

    2661

    3566

    5571

    4022

    Similarly, we find the student numbers of students who took cs240:

    select stno from GRADES
    where cno = 'cs240';

    In turn, this yields:

    STNO

    ----

    3566

    5571

    2415

    5544

    1011

    4022

    To find the students who took both cs210 and cs240 we use the reserved word intersect to link the two previous select phrases into a compound select:

    select stno from grades where cno = 'cs210'
    intersect
    select stno from grades where cno = 'cs240';

    This gives:

    STNO

    ----

    1011

    3566

    4022

    5571

    Older releases of Microsoft SQL Server (before SQL Server 2005) and of MySQL (before version 8.0.31) do not support the intersect operation.

    The union of the two sets is computed by the following compound select:

    select stno from grades where cno = 'cs210'
    union
    select stno from grades where cno = 'cs240';

    Note that the tuples of the result are sorted.

    STNO

    ----

    1011

    2415

    2661

    3566

    4022

    5544

    5571

    If we wish to retain all values in the result, then we need to use union all to link the select phrases as in:

    select stno from grades where cno = 'cs210'
    union all
    select stno from grades where cno = 'cs240';

    The result now contains all values retrieved by the individual selects:

    STNO

    ----

    1011

    2661

    3566

    5571

    4022

    3566

    5571

    2415

    5544

    1011

    4022

    The set difference is computed in ORACLE's SQL Plus using minus. To find the students who took cs210 but did not take cs240 we write:

    select stno from grades where cno = 'cs210'
    minus
    select stno from grades where cno = 'cs240';

    which returns the result:

    STNO

    ----

    2661

    The reverse difference allows us to find students who took cs240 but did not take cs210:

    select stno from grades where cno = 'cs240'
    minus
    select stno from grades where cno = 'cs210';

    Now we obtain:

    STNO

    ----

    2415

    5544

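    Neither SQL Server nor MySQL supports the reserved word minus. The SQL standard names this operation except, which is the spelling used by SQL Server and by recent MySQL releases.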
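The compound selects of this section can be replayed on the two lists of student numbers shown above. The sketch runs in SQLite through Python's sqlite3 module; SQLite spells the difference except rather than minus, and the helper compound is a hypothetical name that links the two simple selects with a given reserved word.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table GRADES(stno text, cno text)")
cs210 = ["1011", "2661", "3566", "5571", "4022"]
cs240 = ["3566", "5571", "2415", "5544", "1011", "4022"]
con.executemany("insert into GRADES values (?, 'cs210')", [(s,) for s in cs210])
con.executemany("insert into GRADES values (?, 'cs240')", [(s,) for s in cs240])

def compound(op, first="cs210", second="cs240"):
    """Link two simple selects with op and return the sorted result."""
    sql = ("select stno from GRADES where cno = ? " + op +
           " select stno from GRADES where cno = ?")
    return sorted(row[0] for row in con.execute(sql, (first, second)))

both = compound("intersect")
either = compound("union")            # duplicates removed
with_dups = compound("union all")     # duplicates kept
only_210 = compound("except")         # minus in SQL Plus
only_240 = compound("except", "cs240", "cs210")

print(both, either, len(with_dups), only_210, only_240)
```

The results are sorted in Python so that the comparison with the listings does not depend on the order in which a particular engine returns the rows.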

    5.8 Table Product in SQL

    A select phrase that lists several distinct table names after the reserved word from computes the product of these tables.

    Example 5.8.1 To examine all possible pairs of students/instructors we could write the following select:

    select STUDENTS.name, INSTRUCTORS.name

    from STUDENTS, INSTRUCTORS;

    Since our database is in a state that contains nine students and five instructors, this will result in 45 rows retrieved:

    NAME NAME

    ---------------------------------

    Edwards P. David Evans Robert

    Grogan A. Mary Evans Robert

    Mixon Leatha Evans Robert

    .

    .

    .

    Pierce Richard Will Samuel

    Prior Lorraine Will Samuel

    Rawlings Jerry Will Samuel

    Lewis Jerry Will Samuel

    Observe that the tables are not linked by any where condition; as expected in the definition of the product, all combinations of rows are considered. After computing the product, a projection eliminates all attributes except STUDENTS.name and INSTRUCTORS.name.

    Also, note that we use qualified attributes as required by the definition of table product (see Definition 4.1.7).

    The result produced by the query shown in Example 5.8.1 does not differentiate between the attributes STUDENTS.name and INSTRUCTORS.name and this may confuse the user. Therefore, it is preferable to rename the columns of the result using the option as:

    select STUDENTS.name as stname, INSTRUCTORS.name as instname

    from STUDENTS, INSTRUCTORS;

    This will generate:


    STNAME INSTNAME

    ---------------------------------

    Edwards P. David Evans Robert

    Grogan A. Mary Evans Robert

    Mixon Leatha Evans Robert

    .

    .

    .

    Pierce Richard Will Samuel

    Prior Lorraine Will Samuel

    Rawlings Jerry Will Samuel

    Lewis Jerry Will Samuel

    SQL allows for computations of products of several copies of the same table through the creation of aliases; the solution proceeds using the logic discussed in Example 4.1.18. To create an alias S of a table named T we write the name of the alias after the name of the table in the list of tables, making sure that at least one space (and no comma) exists between the name of the table and its alias. For example, in the select phrase of Example 5.8.2 we create the alias I by writing

    INSTRUCTORS I

    Table aliases are also known as correlation names of tables.

    Example 5.8.2 Let us solve the query shown in Example 4.1.18: finding all pairs of instructors' names for instructors who share the same office. This can be done by writing:

    select I.name as firstname, INSTRUCTORS.name as secname

    from INSTRUCTORS I, INSTRUCTORS

    where I.roomno = INSTRUCTORS.roomno and

    I.empno < INSTRUCTORS.empno;

    The result of this query is:

    FIRSTNAME SECNAME

    ------------------------------

    Exxon George Will Samuel

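    Conceptually, we create an alias I of the table INSTRUCTORS, compute the product between this alias and INSTRUCTORS, and retain those pairs that share the same room and consist of distinct individuals. The condition I.empno < INSTRUCTORS.empno guarantees that the two individuals are distinct and that each pair is listed only once.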
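The role of the inequality on empno becomes visible when the query is run with and without it. This is a sketch in SQLite through Python's sqlite3 module, using the five instructors of the text with their room numbers; unlike Example 5.8.2, both copies of the table are given aliases here (I1 and I2, names chosen for this illustration).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table INSTRUCTORS(empno text, name text, roomno integer)")
con.executemany("insert into INSTRUCTORS values (?, ?, ?)", [
    ("019", "Evans Robert", 82),
    ("023", "Exxon George", 90),
    ("056", "Sawyer Kathy", 91),
    ("126", "Davis William", 72),
    ("234", "Will Samuel", 90),
])

# Same room only: every instructor is paired with himself or herself,
# and each genuine pair shows up in both orders.
same_room = con.execute("""
    select I1.name, I2.name
    from INSTRUCTORS I1, INSTRUCTORS I2
    where I1.roomno = I2.roomno
""").fetchall()

# Same room and I1.empno < I2.empno: distinct individuals, each pair once.
pairs = con.execute("""
    select I1.name as firstname, I2.name as secname
    from INSTRUCTORS I1, INSTRUCTORS I2
    where I1.roomno = I2.roomno and
          I1.empno < I2.empno
""").fetchall()

print(len(same_room), pairs)
```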

    Example 5.8.3 Suppose that we need to find all triples of student names for students who live in the same city and state. Now we need to operate with three distinct copies of the table STUDENTS. This is accomplished by:

    select S1.name as name1, S2.name as name2,
           S3.name as name3
    from STUDENTS S1, STUDENTS S2,
         STUDENTS S3
    where S1.state = S2.state and
          S2.state = S3.state and
          S1.city = S2.city and
          S2.city = S3.city and
          S1.stno < S2.stno and
          S2.stno < S3.stno;

    which gives the result:

    NAME1 NAME2 NAME3

    ----------------------------------------------------

    McLane Sandy Prior Lorraine Rawlings Jerry

    5.9 Join in SQL

    Earlier versions of SQL (at the level of SQL 1) dealt with the join operation indirectly, using operations like product, selection and projection, which are already available in SQL. The blueprint of this treatment of the join operation was outlined in Section 4.2.

    Example 5.9.1 The query considered in Example 4.2.2, in which we seek the names of instructors who have taught any four-credit course, is solved in SQL by writing:

    select distinct INSTRUCTORS.name

    from COURSES, GRADES, INSTRUCTORS

    where COURSES.cr = 4

    and COURSES.cno = GRADES.cno

    and GRADES.empno = INSTRUCTORS.empno;

    The steps that we applied in relational algebra can be easily reconstituted in SQL. The first step, which consists of computing the product

    T1 = COURSES × GRADES × INSTRUCTORS

    corresponds to the list of tables that follows the word from. Then, the selection specified by

    T2 = (T1 where COURSES.cr = 4 and
                   COURSES.cno = GRADES.cno and
                   GRADES.empno = INSTRUCTORS.empno)

    is executed using the condition of the where clause. Finally, the projection

    T3(name) = T2[INSTRUCTORS.name]

    corresponds to the list that follows select. In this case, this list consists of one attribute, INSTRUCTORS.name.

    We give one more example that shows a typical query that uses a join.

    Example 5.9.2 To list all pairs of student names and course names such that the student takes the course, the relational algebra solution would require that we join the tables STUDENTS, GRADES, and COURSES. In SQL we write:

    select distinct STUDENTS.name, COURSES.cname
    from STUDENTS, GRADES, COURSES
    where STUDENTS.stno = GRADES.stno and
          GRADES.cno = COURSES.cno;

    This query will return:

    NAME CNAME

    --------------------------------------------------

    Edwards P. David Computer Architecture

    Edwards P. David Computer Programming

    Edwards P. David Introduction to Computing

    Grogan A. Mary Computer Architecture

    .

    .

    .

    Prior Lorraine Data Structures

    Prior Lorraine Introduction to Computing

    Rawlings Jerry Computer Architecture

    Rawlings Jerry Introduction to Computing

    SQL dialects that conform to the SQL-2 standard (e.g., SQL Plus of Oracle 9i and 10g, and Microsoft SQL Server) allow the use of the constructions inner join and on. For example, the query discussed in Example 5.9.1 has the alternate solution:

    select distinct INSTRUCTORS.name

    from INSTRUCTORS, COURSES INNER JOIN GRADES

    on COURSES.cno = GRADES.cno

    where INSTRUCTORS.empno = GRADES.empno

    and COURSES.cr = 4;

    This query should be viewed as computing the natural join of COURSES and GRADES based on the equality of the attributes they share (as specified by the on clause). Then, the join of INSTRUCTORS with the result of the previous join is computed using the simulation by product and selection method.

    In SQL Plus, queries involving natural joins among tables whose attributes are identically named can be further simplified by applying the using clause, which lists the attributes involved in the joining.

    Example 5.9.3 To retrieve the names of instructors who taught cs110 we can execute in SQL Plus the query:

    select distinct INSTRUCTORS.name
    from INSTRUCTORS inner join GRADES
         using(empno)
    where GRADES.cno = 'cs110';

    The inner join can be used for joins that involve more than two tables.

    Example 5.9.4 An alternative solution to the query of Example 5.9.1 that makes use of the inner join operation is:

    select distinct INSTRUCTORS.name

    from

    INSTRUCTORS inner join GRADES

    using(empno)

    inner join COURSES

    using(cno)

    where COURSES.cr = 4

    It is possible to involve several attributes in an inner join either explicitly, using the clause on, or implicitly, employing the clause using.

    Example 5.9.5 To find the pairs of names of students and instructors such that the student takes a course with the instructor who is also his or her advisor, we can write either:

    select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname

    from GRADES inner join ADVISING

    on GRADES.stno = ADVISING.stno and

    GRADES.empno = ADVISING.empno

    inner join STUDENTS

    on ADVISING.stno = STUDENTS.stno

    inner join INSTRUCTORS

    on ADVISING.empno = INSTRUCTORS.empno

    or, equivalently,

    select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname

    from GRADES inner join ADVISING

    using(stno,empno)

    inner join STUDENTS

    using(stno)

    inner join INSTRUCTORS

    using(empno)
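That the on form and the using form return the same pairs can be checked on a small instance. The sketch below runs both queries in SQLite through Python's sqlite3 module. The ADVISING rows follow the advising data used later in this section, while the GRADES rows are made up for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table STUDENTS(stno text, name text);
create table INSTRUCTORS(empno text, name text);
create table ADVISING(stno text, empno text);
create table GRADES(stno text, empno text, cno text);
insert into STUDENTS values ('1011', 'Edwards P. David'),
    ('2661', 'Mixon Leatha'), ('4022', 'Prior Lorraine'),
    ('5571', 'Lewis Jerry');
insert into INSTRUCTORS values ('019', 'Evans Robert'),
    ('023', 'Exxon George'), ('234', 'Will Samuel');
insert into ADVISING values ('1011', '019'), ('2661', '023'),
    ('4022', '234'), ('5571', '234');
-- hypothetical grade records
insert into GRADES values ('1011', '019', 'cs110'), ('1011', '019', 'cs210'),
    ('2661', '019', 'cs110'), ('4022', '023', 'cs110'),
    ('5571', '234', 'cs210');
""")

with_on = sorted(con.execute("""
    select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname
    from GRADES inner join ADVISING
         on GRADES.stno = ADVISING.stno and
            GRADES.empno = ADVISING.empno
         inner join STUDENTS
         on ADVISING.stno = STUDENTS.stno
         inner join INSTRUCTORS
         on ADVISING.empno = INSTRUCTORS.empno
""").fetchall())

with_using = sorted(con.execute("""
    select distinct STUDENTS.name as sname, INSTRUCTORS.name as iname
    from GRADES inner join ADVISING
         using(stno, empno)
         inner join STUDENTS
         using(stno)
         inner join INSTRUCTORS
         using(empno)
""").fetchall())

print(with_on == with_using, with_on)
```

Only two students in this instance take a course with their own advisor, and the option distinct collapses the two grade records of the first of them into one row.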

    The Cartesian product of two tables can be computed, alternatively, using the cross join operation.

    Example 5.9.6 The query that we wrote in Example 5.8.1 that generates all possible pairs of students/instructors can also be written as:

    select STUDENTS.name, INSTRUCTORS.name

    from STUDENTS cross join INSTRUCTORS;

    which is equivalent to

    select STUDENTS.name, INSTRUCTORS.name

    from STUDENTS, INSTRUCTORS;

    We saw that when joining two tables not all tuples are joinable; tuples that belong to one table and are not joinable with any tuple of the other table leave no trace in the join, a situation that is often inconvenient. As we saw in Section 4.3, the outer join operation and its variants, the left outer join and the right outer join, can rectify this situation.

    Let us assume that the tabular variables STUDENTS and INSTRUCTORS contain the tuples shown in Figure 5.1.

    The tabular variable ADVISING has the same content as the one shown in Figure 3.1.

    Example 5.9.7 Oracle's own syntax for left outer join is to designate the component that may be null by (+), as in

    select STUDENTS.name, ADVISING.empno from STUDENTS, ADVISING
    where STUDENTS.stno = ADVISING.stno(+);

    This is equivalent to using the operator left outer join as specified by SQL2:

    select STUDENTS.name, ADVISING.empno

    from STUDENTS left outer join ADVISING

    on STUDENTS.stno = ADVISING.stno

    Either phrase will return:

    name empno

    -----------------------------------------

    Edwards P. David 019

    Grogan A. Mary 019

    Mixon Leatha 023

    McLane Sandy 023

    Novak Roland 056

    Pierce Richard 126

    Prior Lorraine 234

    Rawlings Jerry 023

    Lewis Jerry 234

    Davis Richard

    Chu Martin
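The difference between the inner join and the left outer join can be seen on three of the students above. This sketch uses the standard left outer join syntax in SQLite through Python's sqlite3 module (the (+) notation is specific to ORACLE and is not accepted by SQLite); a null component is returned to Python as None.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table STUDENTS(stno text, name text);
create table ADVISING(stno text, empno text);
insert into STUDENTS values ('1011', 'Edwards P. David'),
                            ('6410', 'Davis Richard'),
                            ('7209', 'Chu Martin');
insert into ADVISING values ('1011', '019');
""")

# Inner join: students without an advisor leave no trace.
inner = con.execute("""
    select STUDENTS.name, ADVISING.empno
    from STUDENTS inner join ADVISING
         on STUDENTS.stno = ADVISING.stno
    order by STUDENTS.stno
""").fetchall()

# Left outer join: every student is kept; missing advisors become null.
left_outer = con.execute("""
    select STUDENTS.name, ADVISING.empno
    from STUDENTS left outer join ADVISING
         on STUDENTS.stno = ADVISING.stno
    order by STUDENTS.stno
""").fetchall()

print(inner)
print(left_outer)
```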

    The computation of the right outer join is similar. We can use either Oracle's syntax as in

    select ADVISING.stno, INSTRUCTORS.name from ADVISING, INSTRUCTORS

    where ADVISING.empno(+) = INSTRUCTORS.empno;

    or the standard syntax:

    select ADVISING.stno, INSTRUCTORS.name

    from ADVISING right outer join INSTRUCTORS

    on ADVISING.empno = INSTRUCTORS.empno;

    STUDENTS
    stno  name              addr              city        state  zip
    1011  Edwards P. David  10 Red Rd.        Newton      MA     02159
    2415  Grogan A. Mary    8 Walnut St.      Malden      MA     02148
    2661  Mixon Leatha      100 School St.    Brookline   MA     02146
    2890  McLane Sandy      30 Cass Rd.       Boston      MA     02122
    3442  Novak Roland      42 Beacon St.     Nashua      NH     03060
    3566  Pierce Richard    70 Park St.       Brookline   MA     02146
    4022  Prior Lorraine    8 Beacon St.      Boston      MA     02125
    5544  Rawlings Jerry    15 Pleasant Dr.   Boston      MA     02115
    5571  Lewis Jerry       1 Main Rd         Providence  RI     02904
    6410  Davis Richard     45 Algonquin Rd.  Natick      MA     01760
    7209  Chu Martin        90 Rye Dr.        Ayer        MA     01290

    INSTRUCTORS
    empno  name              rank          roomno  telno
    019    Evans Robert      Professor     82      7122
    023    Exxon George      Professor     90      9101
    056    Sawyer Kathy      Assoc. Prof.  91      5110
    126    Davis William     Assoc. Prof.  72      5411
    234    Will Samuel       Assist.Prof.  90      7024
    323    Campbell Kenneth  Professor     102     7077

    Figure 5.1: Tables with tuples with null components

    In either case we shall obtain:

    stno name

    ---------------------------

    1011 Evans Robert

    2415 Evans Robert

    2661 Exxon George

    2890 Exxon George

    5544 Exxon George

    3442 Sawyer Kathy

    3566 Davis William

    4022 Will Samuel

    5571 Will Samuel

    Campbell Kenneth

    Finally, the outer join itself can be computed using the operator full outer join:

    select STUDENTS.name as sname, INSTRUCTORS.name as iname
    from STUDENTS full outer join ADVISING
         using(stno)
         full outer join INSTRUCTORS
         using(empno);

    This will result in

    sname iname

    -----------------------------------------------------

    Grogan A. Mary Evans Robert

    Edwards P. David Evans Robert

    Rawlings Jerry Exxon George

    McLane Sandy Exxon George

    Mixon Leatha Exxon George

    Novak Roland Sawyer Kathy

    Pierce Richard Davis William

    Lewis Jerry Will Samuel

    Prior Lorraine Will Samuel

    Chu Martin

    Davis Richard

    Campbell Kenneth

    5.10 Sets and subqueries

    Subqueries are select phrases that return sets rather than tables. Their main use is in conditions that involve sets. As we shall see, they are useful in implementing difference and division in SQL. Syntactically, a subquery is written by placing a select phrase between a pair of parentheses. For example,

    (select empno from INSTRUCTORS where rank = 'Professor')

    is a subquery that computes the employee numbers of full professors. To find the student numbers of students who take a course with a full professor, we need to select those GRADES tuples whose empno belongs to this set. This can be accomplished by writing:

    select distinct stno from GRADES where
      empno in (select empno from INSTRUCTORS
                where rank = 'Professor');

    This will return the result:

    STNO

    ----

    1011

    2415

    2661

    3566

    4022

    5544

    5571

    We refer to the first select as the calling select, the main select, or the outer select; the select of the subquery is the inner select.

    As we saw in the introductory example, membership can be tested using in. Here is another example.

    Example 5.10.1 Let us find the names of students who took cs310. We determine the student numbers of those students using a subquery. Then, in the main select, we retrieve those students whose student number is in this set. This can be accomplished using the query:

    select name from STUDENTS where
      stno in (select stno from GRADES
               where cno = 'cs310');

    which returns the table:

    NAME

    --------------

    Mixon Leatha

    Prior Lorraine

    It is possible to test membership of a tuple in a set of tuples computed by a subquery using a condition of the form

    (x1, . . . , xn) in (select A1, . . . , An from ...)

    This type of test is included in SQL99, but it is not implemented in many SQL dialects. However, it is available in ORACLE and DB2.

    Example 5.10.2 Let us find the pairs of names of students and instructors such that the student took some course with the instructor but no four-credit course. This is computed by the following query:

    select STUDENTS.name as sname,

    INSTRUCTORS.name as iname

    from STUDENTS, INSTRUCTORS where

    (STUDENTS.stno, INSTRUCTORS.empno) in

    (select stno, empno from grades

    minus

    select stno, empno from grades

    where cno in (select cno

    from courses

    where cr=4));

    This will return the following table:

    SNAME INAME

    ------------------ -------------

    Edwards P. David Sawyer Kathy

    Grogan A. Mary Evans Robert

    Mixon Leatha Will Samuel

    Novak Roland Will Samuel

    Prior Lorraine Sawyer Kathy

    Prior Lorraine Will Samuel

    Rawlings Jerry Sawyer Kathy

    Lewis Jerry Will Samuel

    If oper is one of the operators =, !=, <, >, <=, >=, then we can use conditions of the form

    v oper any (select ...)

    or

    v oper all (select ...)

    in comparisons that involve some elements of the set computed by the subquery (select ...) or all elements of the same set, respectively. Here != stands for inequality.

    Example 5.10.3 To find the names of the courses taken by the student whose student number is 1011, we can use the following query:

    select cname from COURSES where

    cno = any (select cno from GRADES where stno= 1011);

    The construct = any is synonymous with in, and the same query could bewritten as:

    select cname from COURSES

    where cno in (select cno from GRADES where stno= 1011);

    Also, instead of = any we could use = some, and so, we have a third way of writing the same query:

    select cname from COURSES where
      cno = some (select cno from GRADES where stno= 1011);

    All three queries result in the table:

    CNAME

    -------------------------

    Introduction to Computing

    Computer Programming

    Computer Architecture

    Example 5.10.4 Let us find the students who obtained the highest grade in cs110. Although there are methods that we explain later that yield much simpler solutions for this type of query, for the moment we want to illustrate the oper all condition. We operate on two copies of GRADES. The copy used in the inner select is intended for computing the grades obtained in cs110:

    select stno from GRADES where cno = 'cs110'
      and grade >= all(select grade from GRADES
                       where cno = 'cs110');

    We obtain the table:

    STNO

    ----

    5544

    Example 5.10.5 Let us find the students who obtained a grade at least as high as any grade given by a certain instructor, say Prof. Will. Using the all subquery we can write:

    select stno from GRADES
    where grade >= all(select grade from GRADES
                       where empno in (select empno from INSTRUCTORS
                                       where name like 'Will%'));

    If we alter this query and replace the instructor with Prof. Davis, who teaches no courses, then the set computed by the subquery of

    select stno from GRADES
    where grade >= all(select grade from GRADES
                       where empno in (select empno from INSTRUCTORS
                                       where name like 'Davis%'));

    is empty. Therefore, every grade satisfies the inequality, and we obtain the student numbers of all students who took any course!
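The behavior on an empty set follows from the logic of "for all": a universally quantified condition over no elements is satisfied. The following is a plain Python model of the condition v >= all (select ...), not SQL; the function ge_all and the grade values are hypothetical, and null values are ignored in this sketch.

```python
def ge_all(v, values):
    """Model of `v >= all (subquery)`: true when v >= every value in the set."""
    return all(v >= x for x in values)

grades = [60, 75, 90]

# Against a non-empty set, only grades that dominate every element pass.
against_some = [g for g in grades if ge_all(g, [70, 80])]

# Against an empty set there is nothing to violate, so every grade passes.
against_empty = [g for g in grades if ge_all(g, [])]

print(against_some, against_empty)
```

A query built on oper all should therefore be read with the empty case in mind: when the subquery can return no rows, the outer select is not restricted at all.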

    5.11 Parametrized subqueries

    Often the retrieval performed in a subquery depends on a value provided by the calling select. A typical situation is described in the following example.

    Example 5.11.1 Suppose that we need to retrieve the course numbers of courses taken by the student whose student number is STUDENTS.stno. Ignore (for the moment) the origin of this piece of data. Then, the retrieval is done by the select construct:

    select cno from GRADES

    where stno = STUDENTS.stno;

    Next, we transform this select into a subquery. The student number STU-DENTS.stno is provided by the outer select of the following construct:

    select name from STUDENTS where cs310 in

    (select cno from GRADES

    where stno = STUDENTS.stno);

    Observe that this provides an alternate solution to the query discussed in Ex-ample 5.10.1. Namely, we use a subquery to compute the courses taken by eachstudent. Then, we test if cs310 is one of these courses. We use the qualified at-tribute STUDENTS.stno inside the subquery to differentiate between this inputparameter and the attribute stno of the table GRADES.

    Sets of tuples produced by subqueries can be tested for emptiness using the exists condition. Namely, the condition

    exists (select ... from ...)

    is true if the set returned by the subquery is not empty; similarly,

    not exists (select ... from ...)

    is true if the set returned by the subquery is empty.

    Example 5.11.2 Let us give yet another solution to the query we solved in Example 5.10.1. This time, to find the names of students who took cs310 we determine the student numbers of those students for whom their set of grades in cs310 is not empty. This can be done as follows:

    select name from STUDENTS where

    exists (select * from GRADES where

    stno = STUDENTS.stno and

    cno = 'cs310');

    As a result, we have the table:

    NAME

    --------------

    Mixon Leatha

    Prior Lorraine
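The parametrized in form and the exists form can be checked against each other on a toy instance. The following is a minimal sketch under stated assumptions: it uses Python's sqlite3 module instead of Oracle, and the three students and their grades are hypothetical rows chosen only so that the result is easy to verify by hand.

```python
import sqlite3

# Hypothetical miniature instance; rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table STUDENTS(stno integer primary key, name text);
    create table GRADES(stno integer, cno text, grade integer);
    insert into STUDENTS values (1011, 'Edwards P. David'),
                                (2661, 'Mixon Leatha'),
                                (4022, 'Prior Lorraine');
    insert into GRADES values (1011, 'cs110', 40),
                              (2661, 'cs310', 100),
                              (4022, 'cs310', 75);
""")

def names(sql):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]

# The subquery receives STUDENTS.stno from the outer select.
with_in = names("""
    select name from STUDENTS where 'cs310' in
        (select cno from GRADES
         where stno = STUDENTS.stno)
    order by name
""")

# Same question, phrased as a test for a non-empty set of cs310 grades.
with_exists = names("""
    select name from STUDENTS where
        exists (select * from GRADES where
                stno = STUDENTS.stno and
                cno = 'cs310')
    order by name
""")

# Negated form: students with no cs310 record.
without_cs310 = names("""
    select name from STUDENTS where
        not exists (select * from GRADES where
                    stno = STUDENTS.stno and
                    cno = 'cs310')
    order by name
""")
print(with_in, with_exists, without_cs310)
```

Inside each subquery the unqualified stno resolves to GRADES.stno, the innermost table, which is why the outer value must be written as STUDENTS.stno.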

    Example 5.11.3 To find instructors who never taught cs110, we search for instructors for whom there is no GRADES record involving cs110 and these instructors. This can be done by

    select name from INSTRUCTORS where

    not exists(select * from GRADES where

    empno = INSTRUCTORS.empno and

    cno = 'cs110');

    which results in the table:

    NAME

    -------------

    Sawyer Kathy

    Davis William

    Will Samuel

    If both the main query and the subquery deal with the same table and the subquery requires input parameters from the outer query, then we use an alias of the table in the outer query.

    Example 5.11.4 Let us find the student numbers of students whose advisor is advising at least one other student. The information is contained in the ADVISING table, and the following select construct uses both ADVISING (in the subquery) and its alias A in the main query:

    select distinct stno from ADVISING A

    where exists (select * from ADVISING where

    empno = A.empno and stno != A.stno);

    This query returns the table:

    STNO

    ----

    1011

    2415

    2661

    2890

    4022

    5544

    5571

    Subqueries can be used in the list that follows from in exactly the same manner that tables are used. This is shown in the next example:

    Example 5.11.5 To find the pairs of names of students and instructors such that the student took some course with the instructor we could write:

    select STUDENTS.name as sname, INSTRUCTORS.name as iname

    from STUDENTS, INSTRUCTORS,

    (select stno, empno from GRADES) PN

    where STUDENTS.stno = PN.stno and

    INSTRUCTORS.empno = PN.empno;

    The difference of the tables T and S can be computed by looking for each tuple of T for which there is no matching tuple in S. This can be done by:

    select * from T where

    not exists (select * from S where A1 = T.A1 and ... and An = T.An)

    Example 5.11.6 Courses offered by the continuing education program but not by the regular program can be found by writing:

    select * from CED_COURSES where

    not exists (select * from COURSES where

    cno = CED_COURSES.cno)

    which takes advantage of the fact that cno is a key for both COURSES and CED_COURSES.
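To see the not exists formulation of difference at work, the sketch below compares it with a built-in set difference. It is an illustration only: it runs on SQLite through Python's sqlite3 module, where the difference operator is spelled except rather than Oracle's minus, and the course rows (including the course cs901) are hypothetical.

```python
import sqlite3

# Hypothetical miniature instance; rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table COURSES(cno text primary key, cname text);
    create table CED_COURSES(cno text primary key, cname text);
    insert into COURSES values ('cs110', 'Introduction to Computing'),
                               ('cs210', 'Computer Programming');
    insert into CED_COURSES values ('cs110', 'Introduction to Computing'),
                                   ('cs901', 'Spreadsheets for Beginners');
""")

# Difference expressed with a parametrized subquery on the key cno.
via_not_exists = conn.execute("""
    select * from CED_COURSES where
        not exists (select * from COURSES
                    where cno = CED_COURSES.cno)
    order by cno
""").fetchall()

# Difference expressed with the set operator (minus in Oracle).
via_except = conn.execute("""
    select * from CED_COURSES
    except
    select * from COURSES
    order by cno
""").fetchall()

print(via_not_exists)
print(via_except)
```

The two results coincide here because rows with equal cno also agree on cname. The set operator compares entire tuples, whereas the not exists version compares only the key, so the results could differ if the two tables described the same course number differently.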

    5.12 Subqueries and division

    SQL does not have a division operation. However, as we saw in Examples 4.1.27 and 4.2.3, we can perform division using product, projection, and difference. Of course, we could apply the prescription offered by relational algebra. This type of solution is discussed in the next example.

    Example 5.12.1 To determine the courses taught by every full professor, the solution envisioned here is

    select cno from grades

    minus

    select GI.cno from (select grades.cno,

    instructors.empno

    from grades, instructors

    where rank='Professor') GI

    where (GI.cno,GI.empno) not in (select cno,empno from grades)

    Note that the query

    select grades.cno, instructors.empno

    from grades, instructors

    where rank='Professor'

    computes all pairs of course numbers and employee numbers of full professors using the product of the tables GRADES and INSTRUCTORS. Then, the query

    select GI.cno from (select grades.cno,

    instructors.empno

    from grades, instructors

    where rank='Professor') GI

    where (GI.cno,GI.empno) not in (select cno, empno from grades)

    extracts the courses that are part of the pairs of the previous table that do not appear in the GRADES table, that is, the courses for which there exists a full professor who did not teach these courses. These are the courses that we need to exclude from the answer. Thus, the query presented at the beginning of this example yields the solution of the problem:

    CNO

    -----

    cs110

    The solution presented in Example 5.12.1 is not applicable in SQL dialects that do not have all the facilities of SQL Plus. Therefore, we need to examine an alternate way of solving this problem that is almost universally usable. To understand the technique used we examine the solution of the query formulated in the next example.

    Example 5.12.2 Again, suppose that we need to determine the courses taught by every full professor. Let us formulate the same query in a way that is easier to translate into SQL. Namely, we find the courses for which there are no full professors who have not taught these courses. The reader should realize immediately that this is simply a new formulation of the same problem. We show the solution in steps, moving gradually from plain English to SQL:

    Phase I:

    select cno from GRADES G where

    not exists (instructors who are full professors and have not taught the course G.cno)

    Phase II:

    select cno from GRADES G where

    not exists (select * from INSTRUCTORS

    where rank = 'Professor' and

    these instructors have not taught

    the course G.cno)

    Phase III:

    select cno from GRADES G where

    not exists (select * from INSTRUCTORS

    where rank = 'Professor' and

    not exists (select * from GRADES

    where empno = INSTRUCTORS.empno

    and cno = G.cno));

    In Phase I we determine in SQL the course numbers for which no full professor exists who has not taught these courses.

    In Phase II we concentrate on preventing the existence of full professors who are not teaching these courses. Note that Phase II still contains an untranslated part.

    Finally, in Phase III, we translate the part "who have not taught these courses" using not exists for the second time.
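The Phase III query can be tried on a small instance where the answer is obvious by inspection. The sketch below is a hedged illustration, not the database used in this chapter: it runs on SQLite through Python's sqlite3 module, the three instructors and four grade records are hypothetical, and distinct is added so that each qualifying course is listed once.

```python
import sqlite3

# Hypothetical miniature instance; rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table INSTRUCTORS(empno text primary key, name text, rank text);
    create table GRADES(stno integer, empno text, cno text, grade integer);
    insert into INSTRUCTORS values ('019', 'Evans Robert', 'Professor'),
                                   ('023', 'Exxon George', 'Professor'),
                                   ('056', 'Sawyer Kathy', 'Assoc. Prof.');
    insert into GRADES values (1011, '019', 'cs110', 40),
                              (1011, '023', 'cs110', 75),
                              (1011, '019', 'cs210', 90),
                              (1011, '056', 'cs240', 90);
""")

# Courses for which no full professor exists who has not taught them.
taught_by_all_professors = conn.execute("""
    select distinct cno from GRADES G where
        not exists (select * from INSTRUCTORS
                    where rank = 'Professor' and
                          not exists (select * from GRADES
                                      where empno = INSTRUCTORS.empno
                                        and cno = G.cno))
    order by cno
""").fetchall()
print(taught_by_all_professors)
```

In this toy data cs110 was taught by both full professors, cs210 by only one of them, and cs240 by neither, so only cs110 survives the double negation.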

    Example 5.12.3 Another query that requires division in relational algebra is: Find names of instructors who have taught every 100-level course, that is, every course whose first digit of the course number is 1. The formulation that is better suited to SQL implementation is: Find names of instructors for whom there is no 100-level course that they have not taught. This is solved by the following select construct:

    select name from INSTRUCTORS where

    not exists (select * from COURSES

    where cno like 'cs1__' and

    not exists (select * from GRADES where

    empno = INSTRUCTORS.empno

    and cno = COURSES.cno));

    The answer that results from our usual database instance is:

    NAME

    ------------

    Evans Robert

    Exxon George

    5.13 Relational Completeness of SQL

    Between Chapter 4 and the current chapter, we have shown that SQL is capable of performing all operations of relational algebra. This fact is known as the relational completeness of SQL. As we shall see in subsequent chapters, the capabilities of SQL go well beyond the standard definition of relational algebra.

    5.14 Scalar Functions of SQL

    We now present capabilities of SQL that go beyond relational algebra. We begin by discussing built-in functions in SQL that act on individual values (scalar functions), functions that act on sets of values (aggregate functions), and analytic functions that can be used for various statistical computations. Then, we continue with the group by option of select, and we discuss several on-line analytic processing functions of SQL.

    Scalar functions are built-in functions of SQL that work on individual values. They are highly dependent on the particular implementation of SQL, and we limit our discussion to functions implemented by ORACLE's SQL Plus. There are several types of scalar functions, depending on the types of their arguments.

    5.14.1 Numerical Functions

    Among the numerical functions, abs, sin, cos, power, sqrt, etc. have quite obvious definitions. For example, sqrt computes the square root of its argument, while power(x, y) computes $x^y$.

    Example 5.14.1 To illustrate some of the numerical functions we create a table POINTS whose rows represent labelled points in the plane:

    create table POINTS(ptid varchar2(10), x integer, y integer,

    primary key(ptid));

    and populate this table using the commands:

    insert into points(ptid, x, y) values ('a',0,0);

    insert into points(ptid, x, y) values ('b',0,1);

    insert into points(ptid, x, y) values ('c',0,2);

    insert into points(ptid, x, y) values ('d',1,0);

    insert into points(ptid, x, y) values ('e',1,1);

    insert into points(ptid, x, y) values ('f',1,2);

    insert into points(ptid, x, y) values ('g',2,0);

    insert into points(ptid, x, y) values ('h',2,1);

    insert into points(ptid, x, y) values ('i',2,2);

    insert into points(ptid, x, y) values ('j',3,0);

    insert into points(ptid, x, y) values ('k',3,1);

    insert into points(ptid, x, y) values ('l',3,2);

    To determine the distances from a to every other point we write

    select p.ptid,

    sqrt(power(a.x - p.x,2)+power(a.y - p.y,2))

    as dist

    from points a, points p

    where a.ptid = 'a';

    This returns:

    PTID DIST

    ---------- ----------

    a 0

    b 1

    c 2

    d 1

    e 1.41421356

    f 2.23606798

    g 2

    h 2.23606798

    i 2.82842712

    j 3

    k 3.16227766

    l 3.60555128

    To compute the distance between a, having the coordinates $(x_a, y_a)$, and a point p with coordinates $(x_p, y_p)$, we use the formula

    $$d(a, p) = \sqrt{(x_a - x_p)^2 + (y_a - y_p)^2}.$$

    The formula appears in the target list of the select and is written with the numerical functions sqrt and power.

    In Oracle we can perform computations unrelated to any table by using a fictitious tabular variable that is named DUAL.

    Example 5.14.2 To compute sin(30°), sin(45°), and sin(60°) in Oracle, we write:

    select sin(30*3.14159265359/180) as sin30,

    sin(45*3.14159265359/180) as sin45,

    sin(60*3.14159265359/180) as sin60

    from dual;

    We need to convert the angles to radians before sin is applied. This will return:

    SIN30 SIN45 SIN60

    ---------- ---------- ----------

    .5 .707106781 .866025404

    Microsoft SQL Server has a simpler way of performing this type of computation in that it does not require the fictitious table.

    Example 5.14.3 In SQL Server we can simply write:

    select sin(30*3.14159265359/180) as sin30,

    sin(45*3.14159265359/180) as sin45,

    sin(60*3.14159265359/180) as sin60;

    to obtain the same result as the one obtained in ORACLE.

    5.14.2 String Functions

    String functions can be used to transform strings, extract parts of strings, etc.

    The functions upper and lower convert strings to uppercase and lowercase characters, respectively.

    Example 5.14.4 To print names of students in capital letters and course titles in small letters we can write:

    select distinct upper(STUDENTS.name) as STNAME,

    lower(COURSES.cname) as course

    from STUDENTS, GRADES, COURSES

    where STUDENTS.stno = GRADES.stno and

    GRADES.cno = COURSES.cno;

    This generates the following return:

    STNAME COURSE

    -----------------------------------------------

    EDWARDS P. DAVID computer architecture

    EDWARDS P. DAVID computer programming

    EDWARDS P. DAVID introduction to computing

    GROGAN A. MARY computer architecture

    .

    .

    .

    PRIOR LORRAINE data structures

    PRIOR LORRAINE introduction to computing

    RAWLINGS JERRY computer architecture

    RAWLINGS JERRY introduction to computing

    These functions are particularly useful for performing string comparisons that ignore case. Thus,

    upper('stephany') like 'STE%'

    is true.

    Example 5.14.5 The string function replace substitutes every occurrence of its second argument in the value(s) specified by its first argument, by its third argument. In the select written below the string 'Computer' is replaced by the string 'Comp.':

    select replace(cname,'Computer','Comp.') from COURSES;

    This yields the following result:

    REPLACE(CNAME,COMPUTER,COMP.)

    ----------------------------------

    Introduction to Computing

    Comp. Programming

    Comp. Architecture

    Data Structures

    Higher Level Languages

    Software Engineering

    Graphics

    Example 5.14.6 The function concat computes the concatenation of the two strings that form its arguments. Its effect is identical to the concatenation operator || that we discussed in Example 5.6.11. The phrase below prints the state and zip code of each student as a single string:

    select name, addr, concat(state,zip) as state_zip from STUDENTS;

    This returns:

    NAME ADDR STATE_ZIP

    ----------------------------------------------

    Edwards P. David 10 Red Rd. MA02159

    Grogan A. Mary Walnut St. MA02148

    Mixon Leatha 100 School St. MA02146

    McLane Sandy 30 Cass Rd. MA02122

    Novak Roland 42 Beacon St. NH03060

    Pierce Richard 70 Park St. MA02146

    Prior Lorraine 8 Beacon St. MA02125

    Rawlings Jerry 15 Pleasant Dr. MA02115

    Lewis Jerry 1 Main Rd RI02904

    Example 5.14.7 To extract substrings of strings we can use the function substr. To call this function we need to use the following syntax:

    substr(string, integer [, integer])

    A typical call such as substr(s, n, m) returns the substring of length m of the string s that starts with the nth character of s. If m is omitted, as in substr(s, n), then the function returns all characters of s from the nth character to the end of s. If n is negative, then the starting position is counted backwards from the end of s.

    The select phrase

    select substr('Oracle',2,3) from dual;

    will return:

    SUB

    ---

    rac

    The next select, which omits the third argument of substr:

    select substr('Oracle',2) from dual;

    yields:

    SUBST

    -----

    racle

    which is the string that begins with the second character of 'Oracle' and ends with the last character of this string.

    Since the second argument of the function call in

    select substr('Oracle',-4,3) from dual;

    is negative, the starting position of the substring is the 4th character counted from the end (that is, the character a) and thus, the query returns:

    SUB

    ---

    acl
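The three calls of this example can be replayed outside Oracle as well. The following is a small sketch that assumes SQLite (through Python's sqlite3 module) rather than SQL Plus; SQLite's substr accepts the same three argument patterns used above and, like SQL Server, needs no DUAL table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row with the three variants of the call discussed in Example 5.14.7.
row = conn.execute("""
    select substr('Oracle', 2, 3),   -- three characters starting at position 2
           substr('Oracle', 2),      -- from position 2 to the end
           substr('Oracle', -4, 3)   -- start 4 characters from the end
""").fetchone()
print(row)
```

The tuple printed by this sketch contains the same three strings as the Oracle outputs shown above.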

    The functions lpad and rpad can be used to enhance the presentation of the results of queries. The syntax of lpad is:

    lpad(s, integer [, string])

    The effect is to pad s on the left with spaces to bring the total length of the string to the length specified by the second argument of the function. If the third argument is present, then this string is repeated on the left, instead of spaces, to fill up the padded string.

    The function rpad has a similar syntax; however, the padding is done at the right of s.
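The padding rule can be made concrete with a small model written in Python. This is only a sketch of the behaviour described above, not Oracle's implementation: the Python functions lpad and rpad are hypothetical stand-ins named after the SQL functions, a non-empty pad string is assumed, and the truncation of inputs longer than the requested length follows Oracle's documented behaviour, which the text above does not spell out.

```python
def lpad(s, n, pad=" "):
    """Model of SQL lpad: left-pad str(s) with repetitions of pad to length n."""
    s = str(s)
    if len(s) >= n:
        return s[:n]              # longer inputs are cut to n characters
    fill = n - len(s)
    return (pad * fill)[:fill] + s

def rpad(s, n, pad=" "):
    """Model of SQL rpad: same rule, with the padding placed on the right."""
    s = str(s)
    if len(s) >= n:
        return s[:n]
    fill = n - len(s)
    return s + (pad * fill)[:fill]

print(lpad(70000, 7, "$"))    # a five-digit salary needs two pad characters
print(lpad(150000, 7, "$"))   # a six-digit salary needs one
print(rpad("abc", 7, "xy"))   # the pad string is repeated and cut to fit
```

Because the pad string is repeated until the target length is reached, a five-digit salary padded to width 7 with '$' starts with two dollar signs, while a six-digit salary starts with one.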

    Example 5.14.8 To print a list of all employees and their salaries (using the tabular variables EMPHIST and PERSINFO) we can use the query:

    select name, lpad(salary,7,'$') as ann_salary from

    persinfo, emphist

    where persinfo.empno = emphist.empno

    This will return the result:

    NAME ANN_SAL

    ----------------------------------- -------

    Natalia Martins $150000

    Laura Schwartz $120000

    John Soriano $120000

    Kendall MacRae $100000

    Rachel Anderson $$70000

    Richard Laughlin $$70000

    Danielle Craig $$90000

    Abby Walsh $$75000

    Bailey Burns $$70000

    5.14.3 Date functions

    SQL Plus contains a class of functions that apply to the DATE type: extract, months_between, etc.

    Example 5.14.9 The function extract computes a part of a date value. Its first argument gives the desired date part; the second argument is the date value. For instance, to obtain the year part of the appt_date attribute of the table EMPHIST we write:

    select empno, extract(year from appt_date) as start_y

    from emphist;

    This returns:

    EMPNO START_Y

    ---------- ----------

    1000 1999

    1005 1999

    1010 2000

    1015 1999

    1020 1999

    1025 2000

    1030 2000

    1035 2000

    1040 2000

    Similarly, we can obtain the month part of a date by writing

    select empno, extract(month from appt_date)

    as start_m

    from emphist

    This will return the result:

    EMPNO START_M

    ---------- ----------

    1000 10

    1005 10

    1010 1

    1015 10

    1020 11

    1025 3

    1030 1

    1035 2

    1040 3

    Example 5.14.10 To compute the number of months an employee has worked we can use the function months_between. This will compute the number of months between the current date (designated by the system-provided constant SYSDATE) and the date of hire:

    select empno, months_between(SYSDATE,appt_date)

    as month_served

    from emphist

    The table returned by this query is:

    EMPNO MONTH_SERVED

    ---------- ------------

    1000 35.8877397

    1005 35.532901

    1010 32.8877397

    1015 35.1135461

    1020 34.8877397

    1025 30.5974171

    1030 32.5974171

    1035 31.2748365

    1040 30.8877397

    Arithmetic computations can be performed in the target list of any select.

    Example 5.14.11 Suppose that a bonus is to be paid to the employees. The bonus is computed as 10% of the current weekly salary (salary/52), multiplied by the number of months employed; the current salary is the one recorded with a null termination date. This is computed by

    select empno, 0.1 * months_between(SYSDATE,appt_date) * salary/52 as bonus

    from emphist

    where term_date is null;

    This query returns:

    EMPNO BONUS

    ------------------

    1000 10430.7253

    1005 8262.69438

    1010 7652.27254

    1015 6804.93348

    1020 4733.05642

    1025 4155.51299

    1030 5688.95627

    1035 4550.04006

    1040 4194.59488

    5.15 Aggregate Functions in SQL

    Aggregate functions are those functions that operate on sets of values. Typical examples include: sum, avg, max, min, and count.

    The first four functions operate on columns of tables and ignore null values. The function count returns the number of elements of the set that is its argument.

    Example 5.15.1 The following select construct determines the largest grade obtained by the student whose student number is 1011. The function max is applied to the set of grades of the student whose number is 1011 and returns the largest value in this set:

    select max(grade) as highgr from GRADES

    where stno = 1011;

    This returns the table:

    HIGHGR

    ------

    90

    For instance, sum(A) returns the sum of all values of the selected nonnull A-components of the tuples. Similarly, avg(A) returns the average value of the same sequence. The expressions max(A) and min(A) yield the largest and the smallest values in the set of A-components of the tuples selected by a query, respectively.

    The functions sum and avg apply to attributes whose domains are numerical (such as integer or float); max and min apply to every kind of attribute.

    If we wish to discard duplicate values from the sequences of values before applying these functions, we need to use the word distinct. For instance, sum(distinct A) considers only the distinct nonnull values that occur in the sequence of components.
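The treatment of nulls and of distinct can be observed side by side on a few rows. The sketch below is an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module, the four non-null grades are those listed for student 1011 elsewhere in this chapter, and the fifth row with a null grade (course cs310) is hypothetical, added only to show that it is ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table GRADES(stno integer, cno text, grade integer);
    insert into GRADES values (1011, 'cs110', 40), (1011, 'cs110', 75),
                              (1011, 'cs210', 90), (1011, 'cs240', 90),
                              (1011, 'cs310', null);  -- hypothetical null row
""")

# The null grade takes no part in any of these computations;
# distinct collapses the two grades of 90 into one.
row = conn.execute("""
    select sum(grade), avg(grade),
           sum(distinct grade), avg(distinct grade),
           min(grade), max(grade)
    from GRADES where stno = 1011
""").fetchone()

(total, average, total_distinct,
 average_distinct, lowest, highest) = row
print(total, average, total_distinct, round(average_distinct, 2),
      lowest, highest)
```

The average over all four non-null grades is 295/4, while the average over the three distinct grades is 205/3, which is lower because the repeated value is a high one.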

    Example 5.15.2 We mentioned that the built-in functions max and min apply to string domains as well as to numerical domains. We use this feature of these functions to determine the first and the last student in alphabetical order:

    select min(name) as first, max(name) as last

    from STUDENTS;

    This query yields the table:

    FIRST LAST

    ---------------- --------------

    Edwards P. David Rawlings Jerry

    Next, we show a select construct where the same functions are applied to a numerical domain:

    select min(grade) as lowgr,

    max(grade) as highgr from GRADES

    where stno = 1011;

    This generates the answer:

    LOWGR HIGHGR

    -----------------

    40 90

    The query

    select avg(grade) as avggr from GRADES

    where stno = 1011

    returns the table

    AVGGR

    -----

    73.75

    If we discard duplicate values as in

    select avg(distinct grade) as avggr from GRADES

    where stno = 1011

    then the average grade is lower, indicating a preponderance of the higher grades for this student:

    AVGGR

    -----

    68.33

    Built-in functions can be used in subqueries. This is illustrated by the next example.

    Example 5.15.3 To retrieve the students who obtained a grade higher than the average grade in cs110 we write:

    select stno from grades where cno = 'cs110'

    and grade > all(select avg(grade) from grades

    where cno='cs110');

    This returns the table:

    STNO

    ----

    2661

    3566

    5544

    The count function can be used in several ways:

    count(A) determines the number of non-null entries under the attribute A;

    count(distinct A) computes the number of distinct non-null values that occur under A;

    count(*) determines how many rows exist in a table.

    Note that count(distinct *) cannot be used in SQL.
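The three forms of count give different answers as soon as a column contains nulls or repeated values. The following minimal sketch, which assumes SQLite through Python's sqlite3 module and uses made-up rows, places them next to each other.

```python
import sqlite3

# Hypothetical rows: one null grade and one repeated grade (80).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table GRADES(stno integer, cno text, grade integer);
    insert into GRADES values (1011, 'cs110', 40), (1011, 'cs210', 90),
                              (2661, 'cs110', 80), (2661, 'cs210', null),
                              (3566, 'cs110', 80);
""")

counts = conn.execute("""
    select count(grade),           -- non-null grades
           count(distinct grade),  -- distinct non-null grades
           count(*),               -- all rows, nulls included
           count(distinct stno)    -- students with at least one record
    from GRADES
""").fetchone()
print(counts)
```

Here count(*) sees all five rows, count(grade) skips the null, and count(distinct grade) additionally merges the two grades of 80.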

    Example 5.15.4 Here are several examples of the use of the count function. To find how many students took cs110 in the fall semester of 2003, we write:

    select count(cno) from GRADES

    where cno = 'cs110' and

    sem = 'Fall' and

    year = 2003;

    Since no records exist for any grades given during that semester in cs110, we obtain the answer:

    COUNT(CNO)

    ----------

    0

    Observe that this table has a system-supplied column name COUNT(cno). This happens because we did not provide a name using as.

    Let us determine how many students have ever registered for any course. We have to retrieve this result from GRADES, and we must use distinct to avoid counting the same student several times (if the student took several courses):

    select count(distinct stno) as nost

    from GRADES;

    This query returns the one-entry table:

    NOST

    ----

    8

    Finally, let us determine the names of instructors who are teaching more than one subject. For every instructor, we determine in a subquery the number of distinct courses taught. Then, we retain those instructors who taught more than one course:

    select name from INSTRUCTORS where

    1 < any (select count(distinct cno) from GRADES

    where empno = INSTRUCTORS.empno);

    We obtain the table:

    NAME

    ------------

    Evans Robert

    Will Samuel

    5.16 Sorting Results

    Data obtained from a select construct may be sorted on one or several columns using the order by clause. This clause also gives the user the possibility of opting for an ascending or descending sorting order on each of the columns. By default, the ascending order is chosen.

    Example 5.16.1 Suppose that we need to sort the GRADES tuples on the student number. For each student, we sort the grades in descending order. This can be done with the query:

    select * from GRADES

    order by stno, grade desc;

    This results in the output shown next:

    STNO EMPNO CNO SEM YEAR GRADE

    ---------- ----------- ----- ------ ---------- ----------

    1011 019 cs210 FALL 2003 90

    1011 056 cs240 SPRING 2004 90

    1011 023 cs110 SPRING 2003 75

    1011 019 cs110 FALL 2002 40

    2415 019 cs240 SPRING 2003 100

    2661 234 cs310 SPRING 2004 100

    2661 019 cs110 FALL 2002 80

    2661 019 cs210 FALL 2003 70

    3442 234 cs410 SPRING 2003 60

    3566 019 cs240 SPRING 2003 100

    3566 019 cs110 FALL 2002 95

    3566 019 cs210 FALL 2003 90

    4022 056 cs240 SPRING 2004 80

    4022 234 cs310 SPRING 2004 75

    4022 019 cs210 SPRING 2004 70

    4022 023 cs110 SPRING 2003 60

    5544 019 cs110 FALL 2002 100

    5544 056 cs240 SPRING 2004 70

    5571 019 cs210 SPRING 2004 85

    5571 234 cs410 SPRING 2003 80

    5571 019 cs240 SPRING 2003 50

    Instead of using the names of the columns one could use their ordinal positions in the select phrase.

    Example 5.16.2 An equivalent form of the query from Example 5.16.1 is

    select stno, empno, cno, sem, year, grade

    from GRADES

    order by 1, 6 desc;
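That sorting by column name and sorting by ordinal position agree can be checked mechanically. The sketch below is an illustration that assumes SQLite through Python's sqlite3 module; it loads four of the GRADES rows shown in Example 5.16.1, chosen so that no two rows tie on both sort keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table GRADES(stno integer, empno text, cno text,
                        sem text, year integer, grade integer);
    insert into GRADES values (1011, '019', 'cs110', 'FALL',   2002, 40),
                              (2415, '019', 'cs240', 'SPRING', 2003, 100),
                              (1011, '023', 'cs110', 'SPRING', 2003, 75),
                              (1011, '019', 'cs210', 'FALL',   2003, 90);
""")

# Sort keys given by column name.
by_name = conn.execute("""
    select * from GRADES
    order by stno, grade desc
""").fetchall()

# Sort keys given by position in the target list (1 = stno, 6 = grade).
by_position = conn.execute("""
    select stno, empno, cno, sem, year, grade
    from GRADES
    order by 1, 6 desc
""").fetchall()

sorted_keys = [(r[0], r[5]) for r in by_name]
print(sorted_keys)
```

Note that desc applies only to the key it follows: the student numbers are still sorted in ascending order, and only the grades within each student run from highest to lowest.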

    Ordering of the results can also be achieved by using expressions.

    Example 5.16.3 To sort the grades based on the second digit of the course number, and then on the first digit of the course number (which are the fourth and the third characters of course numbers) we write:

    select * from grades

    order by substr(cno,4,1), substr(cno,3,1)

    This will return the following result:

    STNO EMPNO CNO SEM YEAR GRADE

    ---------- ----------- ----- ------ ---------- ----------

    1011 019 cs110 FALL 2002 40

    2661 019 cs110 FALL 2002 80

    3566 019 cs110 FALL 2002 95

    5544 019 cs110 FALL 2002 100

    1011 023 cs110 SPRING 2003 75

    4022 023 cs110 SPRING 2003 60

    1011 019 cs210 FALL 2003 90

    3566 019 cs210 FALL 2003 90

    4022 019 cs210 SPRING 2004 70

    5571 019 cs210 SPRING 2004 85

    2661 019 cs210 FALL 2003 70