MGS4020_09.ppt/Mar 12, 2013/Page 1 Georgia State University - Confidential MGS 4020 Business Intelligence Relational Algebra and Structured Query Language Mar 12, 2013


MGS4020_09.ppt/Mar 12, 2013/Page 1 Georgia State University - Confidential

MGS 4020

Business Intelligence

Relational Algebra and Structured Query Language

Mar 12, 2013


Agenda

• SQL Queries

• Join


Set Operations

Unary: Restriction, Projection

Binary: Union, Intersection, Difference
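To make the unary/binary split concrete, here is a minimal Python sketch that models relations as sets of tuples. The helper names (`restrict`, `project`, `union`, `intersection`, `difference`) and the sample Student tuples are hypothetical, chosen only for illustration.

```python
# Relations modeled as Python sets of tuples: a set has no order and no duplicates.
# Hypothetical schema: Student(ID, Name, GPA); the rows below are illustrative only.
STUDENT_ATTRS = ("ID", "Name", "GPA")
students = {("s1", "Jose", 3.5), ("s2", "Alice", 2.8), ("s3", "Tom", 3.9)}


def restrict(relation, predicate):
    """Unary: keep the tuples (rows) that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation, attrs, keep):
    """Unary: keep only the named attributes (columns); duplicate results collapse."""
    positions = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in positions) for t in relation}


# Binary operations assume union-compatible operands (same arity, same domains).
def union(r, s):
    return r | s


def intersection(r, s):
    return r & s


def difference(r, s):
    return r - s


honors = restrict(students, lambda t: t[2] > 3.0)   # GPA > 3.0
names = project(students, STUDENT_ATTRS, ["Name"])  # Name column only
others = difference(students, honors)               # students not in honors
```

Because sets discard duplicates automatically, this model has pure set semantics; SQL tables do not, as a later slide asks.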


SQL Building Blocks

• CREATE, ALTER, DROP

• INSERT, DELETE, UPDATE

• SELECT

• UNION, INTERSECT, MINUS

• JOIN

• INDEX

• VIEWS

• Utilities (introduced throughout the examples).

• Transaction Management Features

• Additional Features


Horizontal Slices

• Restriction: Specifying Conditions

Unconditional

List all students

select *

from STUDENT;

$\sigma(\text{Student})$

Conditional

List all students with GPA > 3.0

select *

from STUDENT

where GPA > 3.0;

$\sigma_{GPA > 3.0}(\text{Student})$

Algebra: selection or restriction, $\sigma_{condition}(R)$
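A minimal sketch of both queries using Python's built-in sqlite3 module; the STUDENT columns and rows are hypothetical. The ORDER BY clauses are added only so the output order is predictable.

```python
import sqlite3

# In-memory database with a hypothetical STUDENT table and illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ID TEXT PRIMARY KEY, Name TEXT, GPA REAL)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [("s1", "Jose", 3.5), ("s2", "Alice", 2.8), ("s3", "Tom", 3.9)],
)

# Unconditional: every tuple is returned.
all_students = conn.execute("SELECT * FROM STUDENT ORDER BY ID").fetchall()

# Conditional restriction: only tuples for which the WHERE predicate is true.
high_gpa = conn.execute(
    "SELECT * FROM STUDENT WHERE GPA > 3.0 ORDER BY ID"
).fetchall()
```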


Pattern Matching

• ‘%’ matches any string of n characters, n >= 0

• ‘_’ matches any single character

• x matches the exact character sequence x

List all CIS 3200-level courses.
select * from COURSE where course# like ?;

List all CIS courses.
select * from COURSE where course# like ‘CIS%’;
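A minimal sqlite3 sketch of the two wildcards. The course numbers are hypothetical, and the column is named `cno` here (as on later slides) to sidestep quoting `course#`. Note that SQLite's LIKE is case-insensitive for ASCII letters by default; other systems may differ.

```python
import sqlite3

# Hypothetical COURSE table; course numbers are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (cno TEXT)")
conn.executemany(
    "INSERT INTO COURSE VALUES (?)",
    [("CIS 3200",), ("CIS 3210",), ("CIS 4020",), ("BA 201",), ("BA 2010",)],
)

# '%' matches any string of zero or more characters.
cis_courses = conn.execute(
    "SELECT cno FROM COURSE WHERE cno LIKE 'CIS%' ORDER BY cno"
).fetchall()

# Each '_' matches exactly one character, so 'BA ___' needs exactly three after 'BA '.
three_digit_ba = conn.execute(
    "SELECT cno FROM COURSE WHERE cno LIKE 'BA ___'"
).fetchall()
```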


Specifying Conditions

List all students in ...
select * from STUDENT where city in (‘Boston’, ‘Atlanta’);

List all students in ...
select * from STUDENT where zip not between 60115 and 60123;
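A minimal sqlite3 sketch of IN and NOT BETWEEN, with hypothetical cities and ZIP codes. The boundary rows show that BETWEEN is inclusive at both ends.

```python
import sqlite3

# Hypothetical STUDENT rows; cities and ZIP codes are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ID TEXT, city TEXT, zip INTEGER)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [("s1", "Boston", 2115), ("s2", "Atlanta", 30303), ("s3", "DeKalb", 60115),
     ("s4", "Chicago", 60123), ("s5", "Chicago", 60124)],
)

# IN: true when city equals any value in the list.
in_list = conn.execute(
    "SELECT ID FROM STUDENT WHERE city IN ('Boston', 'Atlanta') ORDER BY ID"
).fetchall()

# BETWEEN includes both ends, so NOT BETWEEN also excludes 60115 and 60123.
outside_range = conn.execute(
    "SELECT ID FROM STUDENT WHERE zip NOT BETWEEN 60115 AND 60123 ORDER BY ID"
).fetchall()
```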


Vertical Slices

• Projection: Specifying Elements

No Specification

List all information about Students

select *

from STUDENT;

$\pi(\text{Student})$

With Specification

List IDs, names, and addresses of all students

select ID, name, address

from STUDENT;

$\pi_{ID, name, address}(\text{Student})$

Algebra: projection, $\pi_{\langle A_1, A_2, \ldots, A_m \rangle}(R)$
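A minimal sqlite3 sketch of the projection query; the table and rows are hypothetical. The cursor's `description` attribute lists the result columns, showing that only the requested attributes survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical STUDENT table; column names mirror the slide's query.
conn.execute("CREATE TABLE STUDENT (ID TEXT, name TEXT, address TEXT, GPA REAL)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [("s1", "Jose", "Boston", 3.5), ("s2", "Alice", "Atlanta", 2.8)],
)

# Projection: keep only the listed attributes (columns) of every tuple.
cur = conn.execute("SELECT ID, name, address FROM STUDENT ORDER BY ID")
columns = [d[0] for d in cur.description]  # column names of the result
rows = cur.fetchall()
```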


Does SQL treat Relations as ‘Sets’?

What are the different salaries we pay to our employees?

select salary from EMPLOYEE;

Or is the following better?

select DISTINCT salary from EMPLOYEE;

Is DISTINCT necessary?
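A minimal sqlite3 sketch, with hypothetical salaries, of why the question matters: a plain SELECT returns a bag (duplicates kept), while DISTINCT gives set semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EmpNo TEXT, salary INTEGER)")
# Illustrative salaries; two employees share the same salary.
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("e1", 50000), ("e2", 60000), ("e3", 50000), ("e4", 70000)],
)

# Plain SELECT keeps duplicates: the result is a bag (multiset), not a set.
salaries = [r[0] for r in conn.execute("SELECT salary FROM EMPLOYEE")]

# DISTINCT removes duplicates, giving set semantics.
distinct_salaries = sorted(
    r[0] for r in conn.execute("SELECT DISTINCT salary FROM EMPLOYEE")
)
```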


Horizontal and Vertical

Query:

List the IDs, names, and addresses of all students who have GPA > 3.0 and age > 20.

select ID, Name, Address

from STUDENT

where GPA > 3.0 and DOB < ‘1-Jan-74’

order by Name DESC;

Algebra:

$\pi_{ID, Name, Address}(\sigma_{GPA > 3.0 \land DOB < \text{'1-Jan-74'}}(\text{STUDENT}))$

ORDER BY Name DESC sorts the result in descending order.

Note: The default order is ascending (ASC), as in:

order by Name;
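A minimal sqlite3 sketch combining restriction, projection, and ordering. It assumes DOB is stored as ISO-8601 text ('YYYY-MM-DD'), so string comparison follows date order in SQLite; the slide's ‘1-Jan-74’ literal is therefore written as '1974-01-01'. All rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DOB as ISO-8601 text so that '<' compares dates correctly in SQLite.
conn.execute("CREATE TABLE STUDENT (ID TEXT, Name TEXT, Address TEXT, GPA REAL, DOB TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [("s1", "Alice", "Atlanta", 3.5, "1972-05-01"),
     ("s2", "Bob", "Boston", 3.8, "1975-02-10"),
     ("s3", "Carla", "Athens", 3.2, "1970-11-30"),
     ("s4", "Dan", "Macon", 2.9, "1971-01-01")],
)

base = ("SELECT ID, Name, Address FROM STUDENT "
        "WHERE GPA > 3.0 AND DOB < '1974-01-01' ")
descending = conn.execute(base + "ORDER BY Name DESC").fetchall()
ascending = conn.execute(base + "ORDER BY Name").fetchall()  # ASC is the default
```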


Agenda

• SQL Queries

• Join


Relational Database

A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The relational database was invented by E. F. Codd at IBM in 1970.

The standard user and application program interface to a relational database is the structured query language (SQL). SQL statements are used both for interactive queries for information from a relational database and for gathering data for reports.

A relational database is a set of tables containing data fitted into predefined categories. Each table (which is sometimes called a relation) contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns. For example, a typical business order entry database would include a table that described a customer with columns for name, address, phone number, and so forth. Another table would describe an order: product, customer, date, sales price, and so forth. A user of the database could obtain a view of the database that fitted the user's needs. For example, a branch office manager might like a view or report on all customers that had bought products after a certain date. A financial services manager in the same company could, from the same tables, obtain a report on accounts that needed to be paid.


Relational Database

When creating a relational database, you can define the domain of possible values in a data column and further constraints that may apply to that data value. For example, a domain of possible customers could allow up to ten possible customer names but be constrained in one table to allowing only three of these customer names to be specifiable.

The definition of a relational database results in a table of metadata or formal descriptions of the tables, columns, domains, and constraints. Meta is a prefix that in most information technology usages means "an underlying definition or description." Thus, metadata is a definition or description of data and metalanguage is a definition or description of language.

A database is a collection of data that is organized so that its contents can easily be accessed, managed, and updated. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.

SQL (Structured Query Language) is a standard interactive and programming language for getting information from and updating a database. Although SQL is both an ANSI and an ISO standard, many database products support SQL with proprietary extensions to the standard language. Queries take the form of a command language that lets you select, insert, update, find out the location of data, and so forth.
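A minimal sqlite3 sketch of two ideas from these paragraphs: a constraint that narrows a column's domain, and the metadata the DBMS keeps about the tables it defines. The 0.0 to 4.0 GPA domain is an assumption for illustration; `sqlite_master` is SQLite's own catalog table, and other products expose metadata differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A CHECK constraint narrows the GPA column to an assumed 0.0-4.0 domain.
conn.execute("""
    CREATE TABLE STUDENT (
        ID  TEXT PRIMARY KEY,
        GPA REAL CHECK (GPA BETWEEN 0.0 AND 4.0)
    )
""")
conn.execute("INSERT INTO STUDENT VALUES ('s1', 3.2)")

try:
    conn.execute("INSERT INTO STUDENT VALUES ('s2', 5.0)")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# SQLite records its metadata (table definitions) in the sqlite_master catalog.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'STUDENT'"
).fetchall()
table_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'STUDENT'"
).fetchone()[0]
```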


Nesting Queries

SELECT attribute(s)
FROM relation(s)
WHERE attr [not] { in | comparison operator | exists }
( query statement(s) );

List names of students who are taking “BA201”:
select Name from Student where ID in
( select ID from REGISTRATION where course# = ‘BA201’ );
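A minimal sqlite3 sketch of the nested query; the tables and rows are hypothetical, and `course#` is renamed `cno`. Conceptually, the inner query produces a set of IDs and the outer query keeps the students whose ID is IN that set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (ID TEXT, Name TEXT);
    CREATE TABLE REGISTRATION (ID TEXT, cno TEXT);
    INSERT INTO STUDENT VALUES ('s1', 'Jose'), ('s2', 'Alice'), ('s3', 'Tom');
    INSERT INTO REGISTRATION VALUES ('s1', 'BA201'), ('s2', 'CIS300'), ('s3', 'BA201');
""")

# Inner query: IDs registered in BA201. Outer query: names whose ID is in that set.
ba201_names = conn.execute("""
    SELECT Name FROM STUDENT
    WHERE ID IN (SELECT ID FROM REGISTRATION WHERE cno = 'BA201')
    ORDER BY Name
""").fetchall()
```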


Sub Queries

List all students enrolled in CIS courses:
select name from STUDENT where studentnum in
(select StudentId from REGISTRATION where cno like ‘CIS%’);


List all courses taken by Student (Id 1011):
select cname from COURSE where cnum = any
(select cno from REGISTRATION where StudentId = 1011);
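SQLite, used for these sketches, does not accept the ANY/ALL quantifiers. Since `x = ANY (subquery)` means the same as `x IN (subquery)`, this minimal sketch runs the IN form and cross-checks it against a correlated EXISTS. Course names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE COURSE (cnum TEXT, cname TEXT);
    CREATE TABLE REGISTRATION (StudentId INTEGER, cno TEXT);
    INSERT INTO COURSE VALUES ('CIS300', 'Course A'), ('CIS304', 'Course B'),
                              ('BA201', 'Course C');
    INSERT INTO REGISTRATION VALUES (1011, 'CIS300'), (1011, 'BA201'), (1012, 'CIS304');
""")

# "cnum = ANY (subquery)" written with IN, which has the same meaning.
with_in = conn.execute("""
    SELECT cname FROM COURSE
    WHERE cnum IN (SELECT cno FROM REGISTRATION WHERE StudentId = 1011)
    ORDER BY cname
""").fetchall()

# The same question as a correlated EXISTS subquery.
with_exists = conn.execute("""
    SELECT cname FROM COURSE c
    WHERE EXISTS (SELECT 1 FROM REGISTRATION r
                  WHERE r.StudentId = 1011 AND r.cno = c.cnum)
    ORDER BY cname
""").fetchall()
```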


Sub Queries

Who received the highest grade in CIS 814?
select StudentId from GRADE where cno = ‘CIS 814’ and
grade >= all (select grade from GRADE where cno = ‘CIS 814’);


List all students enrolled in CIS courses.
select name from STUDENT S where exists
(select * from REGISTRATION where StudentId = S.Studentnum
and cno like ‘CIS%’);
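SQLite has no `>= ALL`, so this minimal sketch compares each grade against the subquery's MAX and then checks the result against the literal ALL semantics in Python. One caveat of the rewrite: over an empty subquery, `>= ALL` is true, while MAX yields NULL and the comparison is unknown. Grades and IDs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GRADE (StudentId TEXT, cno TEXT, grade INTEGER)")
rows = [("1011", "CIS 814", 88), ("1012", "CIS 814", 95),
        ("1013", "CIS 814", 95), ("1011", "CIS 300", 99)]
conn.executemany("INSERT INTO GRADE VALUES (?, ?, ?)", rows)

# ">= ALL (subquery)" rewritten as ">= (SELECT MAX(...))" for SQLite.
top = conn.execute("""
    SELECT StudentId FROM GRADE
    WHERE cno = 'CIS 814'
      AND grade >= (SELECT MAX(grade) FROM GRADE WHERE cno = 'CIS 814')
    ORDER BY StudentId
""").fetchall()

# Literal ">= ALL" semantics in Python, as a cross-check.
cis814 = [g for _, c, g in rows if c == "CIS 814"]
expected = sorted(s for s, c, g in rows
                  if c == "CIS 814" and all(g >= x for x in cis814))
```

Ties are kept: both students with 95 satisfy the condition.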


Recursive Queries

List all employees who earn more than their immediate supervisor.

select E.Emp#, E.title, E.salary from EMPLOYEE E, EMPLOYEE M where E.salary > M.salary and

E.ManagerEmp# = M.Emp#;
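A minimal sqlite3 sketch of this self-join: the same EMPLOYEE table appears twice, once as the employee (E) and once as that employee's manager (M). `Emp#` is renamed `EmpNo`; titles and salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EmpNo TEXT, title TEXT, salary INTEGER, ManagerEmpNo TEXT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [("e1", "Director", 120000, None),
     ("e2", "Manager", 90000, "e1"),
     ("e3", "Engineer", 95000, "e2"),
     ("e4", "Analyst", 60000, "e2")],
)

# E is the employee, M is E's immediate manager (same table, two aliases).
overpaid = conn.execute("""
    SELECT E.EmpNo, E.title, E.salary
    FROM EMPLOYEE E, EMPLOYEE M
    WHERE E.salary > M.salary AND E.ManagerEmpNo = M.EmpNo
""").fetchall()
```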


Summaries and Aggregates

Calculate the average GPA: select avg (GPA) from STUDENT;

Find the lowest GPA: select min (GPA) as minGPA from STUDENT;

How many CIS majors? select count (StudentId) from STUDENT where major = ‘CIS’;

Discarding duplicates: select avg (distinct GPA) from STUDENT where major = ‘CIS’;
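A minimal sqlite3 sketch running the four aggregate queries on hypothetical rows, including the effect of DISTINCT inside avg.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (StudentId INTEGER, major TEXT, GPA REAL)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "CIS", 3.0), (2, "CIS", 3.0), (3, "CIS", 4.0), (4, "ACCT", 2.0)],
)

avg_gpa = conn.execute("SELECT avg(GPA) FROM STUDENT").fetchone()[0]
min_gpa = conn.execute("SELECT min(GPA) AS minGPA FROM STUDENT").fetchone()[0]
cis_majors = conn.execute(
    "SELECT count(StudentId) FROM STUDENT WHERE major = 'CIS'").fetchone()[0]

# avg over all CIS GPAs (3.0, 3.0, 4.0) versus over distinct values (3.0, 4.0).
avg_cis = conn.execute(
    "SELECT avg(GPA) FROM STUDENT WHERE major = 'CIS'").fetchone()[0]
avg_distinct_cis = conn.execute(
    "SELECT avg(DISTINCT GPA) FROM STUDENT WHERE major = 'CIS'").fetchone()[0]
```

With these illustrative rows, discarding duplicates moves the CIS average from about 3.33 to 3.5.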


Aggregate Functions

• COUNT (attr): a simple count of values in attr

• SUM (attr): sum of values in attr

• AVG (attr): average of values in attr

• MAX (attr): maximum value in attr

• MIN (attr): minimum value in attr

Aggregates take effect after data are retrieved from the database, are applied either to the entire resulting relation or to groups, and can’t be used in query qualifications (the where clause).

Would the following query be permitted?

select StudentId

from STUDENT

where GPA = max (GPA);
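A minimal sqlite3 sketch answering the question above: SQLite rejects an aggregate used directly in WHERE, and computing the aggregate in a subquery first makes the comparison legal. Rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (StudentId INTEGER, GPA REAL)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [(1, 3.1), (2, 3.9), (3, 2.5)])

# An aggregate cannot appear directly in WHERE; SQLite rejects the statement.
try:
    conn.execute("SELECT StudentId FROM STUDENT WHERE GPA = max(GPA)")
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Compute the aggregate in a subquery, then compare against its single value.
best = conn.execute(
    "SELECT StudentId FROM STUDENT WHERE GPA = (SELECT max(GPA) FROM STUDENT)"
).fetchall()
```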


Missing or Incomplete Information

• List all students whose address or GPA is missing:

select *

from STUDENT

where Address is null or GPA is null;

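Truth table for SQL's three-valued logic (T = true, F = false, U = unknown):

| a | b | ~ a | a & b | a or b |
|---|---|-----|-------|--------|
| T | T | F | T | T |
| T | F | F | F | T |
| T | U | F | U | T |
| F | T | T | F | T |
| F | F | T | F | F |
| F | U | T | F | U |
| U | T | U | U | T |
| U | F | U | F | U |
| U | U | U | U | U |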
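A minimal Python sketch of the three-valued rules, modeling UNKNOWN as `None`; the helper names `not3`, `and3`, and `or3` are hypothetical. It rebuilds the truth table row by row and cross-checks a few cases in SQLite, which returns 1/0 for true/false and `None` for NULL. The last SQLite check, `NULL = NULL` being unknown, is why the query above tests `is null` rather than `= null`.

```python
import sqlite3
from itertools import product


# SQL's third truth value UNKNOWN is modeled as None.
def not3(a):
    return None if a is None else not a


def and3(a, b):
    if a is False or b is False:
        return False          # FALSE dominates AND
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    if a is True or b is True:
        return True           # TRUE dominates OR
    if a is None or b is None:
        return None
    return False


# One row per (a, b) pair, in the order T, F, U.
table = [(a, b, not3(a), and3(a, b), or3(a, b))
         for a, b in product([True, False, None], repeat=2)]

conn = sqlite3.connect(":memory:")
sql_check = conn.execute(
    "SELECT NULL AND 0, NULL OR 1, NOT (NULL), NULL = NULL"
).fetchone()
```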


Grouping Results Obtained

Show all students enrolled in each course.

select cno, stno

from REGISTRATION

group by cno;

Is this grouping OK?

Calculate the average GPA of students by county.

select county, avg (GPA) as CountyGPA

from STUDENT

group by county;

Calculate the enrollment of each class.

select cno, term, year, count (stno) as enrol

from REGISTRATION

group by cno, year, term;
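A minimal sqlite3 sketch of the last query, with hypothetical registrations: GROUP BY produces one output row per (cno, year, term) group, and count() is evaluated within each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REGISTRATION (cno TEXT, stno TEXT, term TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO REGISTRATION VALUES (?, ?, ?, ?)",
    [("CIS300", "s1", "Fall", 2012), ("CIS300", "s2", "Fall", 2012),
     ("CIS300", "s3", "Spring", 2013), ("BA201", "s1", "Fall", 2012)],
)

# One row per (cno, year, term) group; count() runs inside each group.
enrollment = conn.execute("""
    SELECT cno, term, year, count(stno) AS enrol
    FROM REGISTRATION
    GROUP BY cno, year, term
    ORDER BY cno, year, term
""").fetchall()
```

On the "Is this grouping OK?" question: standard SQL rejects `select cno, stno ... group by cno` because stno is neither grouped nor aggregated. SQLite happens to accept such "bare" columns and returns one arbitrary stno per group, so its permissiveness should not be read as a yes.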


Selections on Groups

Show all CIS courses that are full.

select cno, count (stno)

from REGISTRATION

where cno like ‘CIS%’

group by cno

having count (stno) > 29;
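A minimal sqlite3 sketch contrasting WHERE (filters rows before grouping) with HAVING (filters whole groups afterwards). It assumes, as `> 29` suggests, that a full class has 30 students; enrollments are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REGISTRATION (cno TEXT, stno TEXT)")
# Illustrative enrollments: CIS300 and BA201 have 30 students, CIS304 has 5.
rows = ([("CIS300", f"s{i}") for i in range(30)]
        + [("CIS304", f"s{i}") for i in range(5)]
        + [("BA201", f"s{i}") for i in range(30)])
conn.executemany("INSERT INTO REGISTRATION VALUES (?, ?)", rows)

# WHERE keeps only CIS rows; HAVING then keeps only groups with more than 29.
full_cis = conn.execute("""
    SELECT cno, count(stno) FROM REGISTRATION
    WHERE cno LIKE 'CIS%'
    GROUP BY cno
    HAVING count(stno) > 29
""").fetchall()
```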


Union

List students who live in Atlanta or have GPA > 3.0

select ID, Name, DOB, Address

from STUDENT

where Address = ‘Atlanta’

union

select ID, Name, DOB, Address

from STUDENT

where GPA > 3.0;

Can we perform a Union on any two Relations?
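A minimal sqlite3 sketch of the union query on hypothetical rows (projected to ID and Name for brevity). UNION removes duplicate tuples, so a student who satisfies both conditions appears once; UNION ALL keeps the duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ID TEXT, Name TEXT, Address TEXT, GPA REAL)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [("s1", "Jose", "Atlanta", 3.5), ("s2", "Alice", "Boston", 3.8),
     ("s3", "Tom", "Atlanta", 2.5)],
)

left = "SELECT ID, Name FROM STUDENT WHERE Address = 'Atlanta'"
right = "SELECT ID, Name FROM STUDENT WHERE GPA > 3.0"

# UNION removes duplicates: s1 satisfies both conditions but appears once.
union_rows = conn.execute(f"{left} UNION {right} ORDER BY ID").fetchall()
# UNION ALL keeps them.
union_all_rows = conn.execute(f"{left} UNION ALL {right} ORDER BY ID").fetchall()
```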


Union Compatibility

Two relations, A and B, are union-compatible

if

1) A and B contain the same number of attributes, and

2) The corresponding attributes of the two have the same domains

Examples

CIS-Student (ID: Did; Name: Dname; Address: Daddr; Grade: Dgrade);

Senior-Student (SName: Dname; S#: Did; Home: Daddr; Grade: Dgrade);

Course (C#: Dnumber; Title: Dstr; Credits: Dnumber)

Are CIS-Student and Senior-Student union compatible?

Are CIS-Student and Course union compatible?

What happens if we have duplicate tuples?

What will be the column names in the resulting Relation?
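A minimal Python sketch of the two-part definition, using the example schemas written as ordered (attribute, domain) pairs. The check is positional: attribute names are ignored, and only arity and the domains of corresponding positions matter. The function name `union_compatible` is hypothetical.

```python
# Example schemas from the slide, as ordered (attribute, domain) pairs.
CIS_STUDENT = [("ID", "Did"), ("Name", "Dname"), ("Address", "Daddr"), ("Grade", "Dgrade")]
SENIOR_STUDENT = [("SName", "Dname"), ("S#", "Did"), ("Home", "Daddr"), ("Grade", "Dgrade")]
COURSE = [("C#", "Dnumber"), ("Title", "Dstr"), ("Credits", "Dnumber")]


def union_compatible(a, b):
    """Same number of attributes, and the i-th attributes share a domain."""
    return len(a) == len(b) and all(
        dom_a == dom_b for (_, dom_a), (_, dom_b) in zip(a, b)
    )


as_declared = union_compatible(CIS_STUDENT, SENIOR_STUDENT)
# Listing Senior-Student's attributes as (S#, SName, Home, Grade) lines up the domains.
reordered = union_compatible(
    CIS_STUDENT, [SENIOR_STUDENT[1], SENIOR_STUDENT[0]] + SENIOR_STUDENT[2:]
)
with_course = union_compatible(CIS_STUDENT, COURSE)
```

On the column-name question: in SQL systems such as SQLite, a UNION's result columns typically take their names from the first SELECT; check your DBMS's documentation for its exact rule.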


Union, Intersect, Minus

select CUSTNAME, ZIP from CUSTOMER where STATE = ‘MA’
UNION
select SUPNAME, ZIP from SUPPLIER where STATE = ‘MA’
ORDER BY 2;


select CUSTNAME, ZIP from CUSTOMER where STATE = ‘MA’
INTERSECT
select SUPNAME, ZIP from SUPPLIER where STATE = ‘MA’
ORDER BY 2;


select CUSTNAME, ZIP from CUSTOMER where STATE = ‘MA’
MINUS
select SUPNAME, ZIP from SUPPLIER where STATE = ‘MA’
ORDER BY 2;


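A minimal sqlite3 sketch running all three operators on hypothetical customers and suppliers. SQLite spells MINUS as EXCEPT (MINUS is Oracle's keyword). ORDER BY 2 sorts the combined result on its second column, ZIP, which is stored as text here to keep leading zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CUSTNAME TEXT, ZIP TEXT, STATE TEXT);
    CREATE TABLE SUPPLIER (SUPNAME TEXT, ZIP TEXT, STATE TEXT);
    INSERT INTO CUSTOMER VALUES ('Acme', '02101', 'MA'), ('Beta', '01002', 'MA'),
                                ('Gamma', '10001', 'NY');
    INSERT INTO SUPPLIER VALUES ('Acme', '02101', 'MA'), ('Delta', '02139', 'MA');
""")


def combine(op):
    # op is one of the fixed keywords used below; ORDER BY 2 sorts on ZIP.
    return conn.execute(f"""
        SELECT CUSTNAME, ZIP FROM CUSTOMER WHERE STATE = 'MA'
        {op}
        SELECT SUPNAME, ZIP FROM SUPPLIER WHERE STATE = 'MA'
        ORDER BY 2
    """).fetchall()


union_rows = combine("UNION")          # in A or in B
intersect_rows = combine("INTERSECT")  # in both A and B
except_rows = combine("EXCEPT")        # in A but not in B
```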


Union, Intersect, Minus


Agenda

• SQL Queries

• Join


Connecting/Linking Relations

List information about all students and the classes they are taking

What can we use to connect/link Relations?

Join: Connecting relations so that relevant tuples can be retrieved.

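Student

| ID | Name | *** |
|----|------|-----|
| s1 | Jose | *** |
| s2 | Alice | *** |
| s3 | Tome | *** |
| *** | *** | *** |

Class

| Emp# | ID | C# | *** |
|------|----|----|-----|
| e1 | s1 | BA 201 | *** |
| e3 | s2 | CIS 300 | *** |
| e2 | s3 | CIS 304 | *** |
| *** | *** | *** | *** |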


Join

Cartesian Product

Student: 30 tuples; Class: 4 tuples

Total number of tuples in the Cartesian product: ? (match each tuple of Student to every tuple of Class)

Select tuples having identical Student IDs. Expected number of such tuples: join selectivity

$R_1 \times R_2$
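A minimal sqlite3 sketch of the slide's arithmetic. The sizes (30 Student tuples, 4 Class tuples) come from the slide; the rows themselves, including a fourth Class tuple for s4, are hypothetical. The Cartesian product pairs every tuple with every other, giving 30 × 4 = 120 tuples; the join keeps only the pairs with matching IDs, and the selectivity is the fraction that survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ID TEXT, Name TEXT)")
conn.execute("CREATE TABLE CLASS (EmpNo TEXT, ID TEXT, cno TEXT)")
# 30 illustrative students and 4 class tuples, each naming one of those students.
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)",
                 [(f"s{i}", f"Student {i}") for i in range(1, 31)])
conn.executemany("INSERT INTO CLASS VALUES (?, ?, ?)",
                 [("e1", "s1", "BA 201"), ("e3", "s2", "CIS 300"),
                  ("e2", "s3", "CIS 304"), ("e1", "s4", "BA 201")])

# Cartesian product: every STUDENT tuple paired with every CLASS tuple.
product_size = conn.execute(
    "SELECT count(*) FROM STUDENT CROSS JOIN CLASS").fetchone()[0]

# Join: keep only the pairs whose Student IDs match.
join_size = conn.execute(
    "SELECT count(*) FROM STUDENT s, CLASS c WHERE s.ID = c.ID").fetchone()[0]

# Join selectivity: fraction of the product that survives the join condition.
selectivity = join_size / product_size
```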


Join Forms

• General join forms

– Equijoin

– Operator dependent (=, <>, x > y, ...)

• Natural join

• Outer join

– Left

– Right

– Full

Left outer join (Oracle (+) notation):
select s.*, c.* from STUDENT s, CLASS c where s.ID = c.ID (+);

Equijoin:
select s.*, c.* from STUDENT s, CLASS c where s.ID = c.ID;
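A minimal sqlite3 sketch contrasting the equijoin with a left outer join, written in ANSI `LEFT OUTER JOIN` syntax because SQLite does not support Oracle's `(+)` marker. Rows are hypothetical. The student without a class survives the outer join with NULL (`None`) in the Class columns. RIGHT and FULL outer joins need SQLite 3.39 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (ID TEXT, Name TEXT);
    CREATE TABLE CLASS (ID TEXT, cno TEXT);
    INSERT INTO STUDENT VALUES ('s1', 'Jose'), ('s2', 'Alice'), ('s3', 'Tom');
    INSERT INTO CLASS VALUES ('s1', 'BA 201'), ('s2', 'CIS 300');
""")

# Equijoin: only students with a matching CLASS tuple.
inner = conn.execute("""
    SELECT s.ID, c.cno FROM STUDENT s JOIN CLASS c ON s.ID = c.ID ORDER BY s.ID
""").fetchall()

# Left outer join: every student, with None where no CLASS tuple matches.
left_outer = conn.execute("""
    SELECT s.ID, c.cno FROM STUDENT s LEFT OUTER JOIN CLASS c ON s.ID = c.ID
    ORDER BY s.ID
""").fetchall()
```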


Grouping Results after Join

Calculate the average GPA of each class

select course#, avg (GPA) from STUDENT S, CLASS C where S.ID = C.ID group by course#;
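A minimal sqlite3 sketch of grouping after a join, with hypothetical rows and `course#` renamed `cno`. The join runs first, then the joined tuples are grouped by course and GPAs averaged within each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (ID TEXT, GPA REAL);
    CREATE TABLE CLASS (ID TEXT, cno TEXT);
    INSERT INTO STUDENT VALUES ('s1', 4.0), ('s2', 3.0), ('s3', 2.0);
    INSERT INTO CLASS VALUES ('s1', 'BA201'), ('s2', 'BA201'),
                             ('s2', 'CIS300'), ('s3', 'CIS300');
""")

# Join first, then group the joined tuples by course and average within each group.
class_gpa = conn.execute("""
    SELECT C.cno, avg(S.GPA)
    FROM STUDENT S, CLASS C
    WHERE S.ID = C.ID
    GROUP BY C.cno
    ORDER BY C.cno
""").fetchall()
```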


Index

• There is no order among tuples.

• Indexes speed up data retrieval.

– Find all students who live in Atlanta.

How many tuples would you have to search for this query?

What if the table was ‘indexed’?

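[Figure: an index table on Address (entries Atlanta, Boston), each entry holding a pointer (Ptr) to the matching tuple in the Student table]

| ID | Name | Address | GPA |
|----|------|---------|-----|
| s1 | Joseph | Boston | |
| | Alice | Atlanta | |
| *** | *** | *** | *** |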


Creating and Deleting Indices

CREATE [UNIQUE] INDEX index-name ON base-relation-name
( attr-name [order], attr-name [order] ... ) [CLUSTER];

create unique index student-id on STUDENT ( ID ASC );

create index Address-index on Student (Address);

create unique index Name-Age-Index on STUDENT ( Name DESC, Age );

• What are the advantages & disadvantages of Indexing?
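A minimal sqlite3 sketch of one advantage of each index kind. It uses SQLite's `EXPLAIN QUERY PLAN` to show the Atlanta query switching from a full table scan to an index search once an index on Address exists, and shows a UNIQUE index rejecting a duplicate ID. The index names use underscores (hypothetical) because hyphens are not valid in unquoted SQL identifiers. The exact plan wording varies between SQLite versions, which is why only the index name and the word SCAN are checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ID TEXT, Name TEXT, Address TEXT, GPA REAL)")
# 1,000 illustrative rows; every tenth student lives in Atlanta.
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [(f"s{i}", f"Name {i}", "Atlanta" if i % 10 == 0 else "Boston", 3.0)
     for i in range(1000)],
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM STUDENT WHERE Address = 'Atlanta'"
before = plan(query)   # a full SCAN of STUDENT
conn.execute("CREATE INDEX address_index ON STUDENT (Address)")
after = plan(query)    # a SEARCH of STUDENT using address_index

# A UNIQUE index also enforces uniqueness (a benefit beyond speed).
conn.execute("CREATE UNIQUE INDEX student_id ON STUDENT (ID ASC)")
try:
    conn.execute("INSERT INTO STUDENT VALUES ('s1', 'Dup', 'Macon', 2.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The trade-off runs the other way on writes: every INSERT, UPDATE, or DELETE must now maintain two indexes as well as the table, and the indexes take extra storage.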


Relational Views

• Relations derived from other relations.

• Views have no stored tuples.

• Views are useful for providing multiple user views.

[Figure: View 1, View 2, ..., View N, each defined over Base Relation 1 and Base Relation 2]


View Creation

Create View view-name [ ( attr [ , attr ] ...) ]

AS subquery

[ with check option ] ;

DROP VIEW view-name;

– Create a view containing the student ID, Name, Age, and GPA of students who are qualified to take 300-level courses, i.e., GPA >= 2.0.
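A minimal sqlite3 sketch of this exercise, with hypothetical rows and the view named `QualifiedStudent` (no hyphen, so no quoting is needed). Because a view stores only its defining query, a tuple later inserted into the base table appears in the view immediately. SQLite does not implement WITH CHECK OPTION, so that clause is omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (ID TEXT, Name TEXT, Age INTEGER, GPA REAL);
    INSERT INTO STUDENT VALUES ('s1', 'Ann', 20, 3.1), ('s2', 'Ben', 22, 1.7),
                               ('s3', 'Cai', 21, 2.0);
    -- The view stores its defining query, not tuples.
    CREATE VIEW QualifiedStudent AS
        SELECT ID, Name, Age, GPA FROM STUDENT WHERE GPA >= 2.0;
""")

before = conn.execute("SELECT ID FROM QualifiedStudent ORDER BY ID").fetchall()

# A new base tuple shows up in the view at once, because the view is re-derived.
conn.execute("INSERT INTO STUDENT VALUES ('s4', 'Dee', 19, 2.5)")
after = conn.execute("SELECT ID FROM QualifiedStudent ORDER BY ID").fetchall()
```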


View Options

• WITH CHECK OPTION enforces the view's query condition on insertions and updates.

– Example: enforce the GPA >= 2.0 condition on all new student tuples inserted into the view.

• A view may be derived from multiple base relations.

– Example: create a view that includes student IDs, student names, and their instructors’ names for all CIS 300 students.
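A minimal sqlite3 sketch of the second exercise: one view joining three base relations. The schema (an INSTRUCTOR table and an InstrID column in REGISTRATION), the view name `CIS300Student`, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (ID TEXT, Name TEXT);
    CREATE TABLE INSTRUCTOR (InstrID TEXT, Name TEXT);
    CREATE TABLE REGISTRATION (ID TEXT, cno TEXT, InstrID TEXT);
    INSERT INTO STUDENT VALUES ('s1', 'Jose'), ('s2', 'Alice'), ('s3', 'Tom');
    INSERT INTO INSTRUCTOR VALUES ('i1', 'Smith'), ('i2', 'Jones');
    INSERT INTO REGISTRATION VALUES ('s1', 'CIS 300', 'i1'), ('s2', 'CIS 300', 'i2'),
                                    ('s3', 'BA 201', 'i1');

    -- One view derived from three base relations.
    CREATE VIEW CIS300Student AS
        SELECT S.ID AS StudentID, S.Name AS StudentName, I.Name AS InstructorName
        FROM STUDENT S
        JOIN REGISTRATION R ON R.ID = S.ID
        JOIN INSTRUCTOR I ON I.InstrID = R.InstrID
        WHERE R.cno = 'CIS 300';
""")

cis300 = conn.execute("SELECT * FROM CIS300Student ORDER BY StudentID").fetchall()
```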


View Retrieval

Queries on views are written the same way as queries on base relations.

Queries on views are expanded into queries on their base relations.

select Name, Instructor-Name

from CIS300-Student

where Name = Instructor-Name;

?
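A minimal sqlite3 sketch of the expansion idea: the query written against a view returns the same rows as the same query rewritten by hand over the base relations. The schema, rows, and names (`CIS300Student`, `InstructorName`, with underscores-free identifiers instead of the slide's hyphens) are hypothetical; one student deliberately shares a name with an instructor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (ID TEXT, Name TEXT);
    CREATE TABLE REGISTRATION (ID TEXT, cno TEXT, InstructorName TEXT);
    INSERT INTO STUDENT VALUES ('s1', 'Lee'), ('s2', 'Alice');
    INSERT INTO REGISTRATION VALUES ('s1', 'CIS 300', 'Lee'), ('s2', 'CIS 300', 'Smith');
    CREATE VIEW CIS300Student AS
        SELECT S.ID AS ID, S.Name AS Name, R.InstructorName AS InstructorName
        FROM STUDENT S JOIN REGISTRATION R ON R.ID = S.ID
        WHERE R.cno = 'CIS 300';
""")

# Query written against the view...
on_view = conn.execute(
    "SELECT Name, InstructorName FROM CIS300Student WHERE Name = InstructorName"
).fetchall()

# ...and the same query expanded by hand onto the base relations.
expanded = conn.execute("""
    SELECT S.Name, R.InstructorName
    FROM STUDENT S JOIN REGISTRATION R ON R.ID = S.ID
    WHERE R.cno = 'CIS 300' AND S.Name = R.InstructorName
""").fetchall()
```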


View: Update

Update on a view actually changes its base relation(s)!

update Qualified-Student

set GPA = GPA-0.1

where SID = ‘s3’;

insert into Qualified-Student

values ( ‘s9’, ‘Lisa’, 4.0 );

insert into Qualified-Student

values ( ‘s10’, ‘Peter’, 1.7 );

Why are some views not updateable?

What type of views are updateable?
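A minimal sqlite3 sketch of one concrete answer, specific to SQLite: every SQLite view is read-only until an INSTEAD OF trigger says how to map the change onto the base relation. The sketch shows the update failing, then succeeding through a trigger, and the base table changing. The trigger name and rows are hypothetical. In systems that support WITH CHECK OPTION, the ‘Peter’ insert (GPA 1.7) would be rejected because the new tuple would not satisfy the view's condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (SID TEXT, Name TEXT, GPA REAL);
    INSERT INTO STUDENT VALUES ('s1', 'Ann', 3.1), ('s3', 'Cai', 3.0);
    CREATE VIEW QualifiedStudent AS
        SELECT SID, Name, GPA FROM STUDENT WHERE GPA >= 2.0;
""")

update_sql = "UPDATE QualifiedStudent SET GPA = GPA - 0.1 WHERE SID = 's3'"

# Out of the box, SQLite treats every view as read-only.
try:
    conn.execute(update_sql)
    read_only = False
except sqlite3.OperationalError:
    read_only = True

# An INSTEAD OF trigger maps an update on the view onto the base relation.
conn.executescript("""
    CREATE TRIGGER qualified_student_update
    INSTEAD OF UPDATE ON QualifiedStudent
    BEGIN
        UPDATE STUDENT SET GPA = NEW.GPA WHERE SID = OLD.SID;
    END;
""")
conn.execute(update_sql)
base_gpa = conn.execute("SELECT GPA FROM STUDENT WHERE SID = 's3'").fetchone()[0]
```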