
RDBMS (Relational Database Management Systems)

Introduction to DBMS

Data and Information

Data are raw facts or figures. Information is processed data on which decisions and actions are based.

Information can also be defined as data that has been organized and classified so that it is meaningful to the receiver.

[Figure: Data (input) → Process → Information (output)]

Data processing: the step-by-step refinement of data to obtain the desired information.

Database Management System Definition

A database management system (DBMS) consists of a collection of interrelated data and a set of programs to access that data. The collection of data, usually referred to as the database, contains information about one particular enterprise.

The primary goal of a DBMS is to provide an environment that is both convenient and efficient to use in retrieving and storing database information.

Database systems are designed to manage large bodies of information. The management of data involves both the definition of structures for the storage of information and the provision of mechanisms for the manipulation of information.

In addition, the database system must provide for the safety of the information stored, despite system crashes or attempts at unauthorized access. If data is to be shared among several users, the system must avoid possible inconsistent results.

The importance of information in most organizations, and the value of the database, has led to the development of a large body of concepts and techniques for the efficient management of data.

A database management system is also a collection of software programs that stores data, organizes the data into records, and allows access to the data in a uniform and consistent way.

In a database management system (DBMS), application programs do not obtain the data they need directly from the storage media. They must first request the data from the DBMS. The DBMS then retrieves the data from the storage media and provides them to the application programs.

Thus a database management system operates between application programs and the data.

[Figure: Relationship of application programs, a database management system, and a database. Several application programs reach the database only through the database management system.]

Purpose of Database Systems

The purpose of a database system is to replace the conventional file-processing system, which has major disadvantages, with a robust system that stores data while eliminating redundancy, inconsistency, security problems and integrity problems.

The typical file-processing system has a number of disadvantages:

• Data redundancy and inconsistency.
• Difficulty in accessing data.
• Data isolation.
• Concurrent access anomalies.
• Security problems.
• Integrity problems.

Data Abstraction

Data abstraction is the reduction of a particular body of data to a simplified representation of the whole.

Abstraction, in general, is the process of taking away or removing characteristics from something in order to reduce it to a set of essential characteristics. As in abstract art, the representation is likely to be one potential abstraction of a number of possibilities.

A database abstraction layer, for example, is one of a number of such possibilities.

Data abstraction is usually the first step in database design. A complete database is much too complex a system to be developed without first creating a simplified framework. Data abstraction makes it possible for the developer to start from essential elements (data abstractions) and incrementally add data detail to create the final system.

A database management system is a collection of interrelated files and a set of programs that allow users to access and modify these files.

A major purpose of a database system is to provide users with an abstract view of the data. That is, the system hides certain details of how the data is stored and maintained.

However, in order for the system to be usable, data must be retrieved efficiently. This concern has led to the design of complex data structures for the representation of data in the database.

Since many database system users are not computer trained, the complexity is hidden from them through several levels of abstraction in order to simplify their interaction with the system.

Three levels of abstraction

[Figure: View level 1, view level 2, ..., view level n sit on top of the conceptual level, which sits on top of the physical level.]

Physical level: The lowest level of abstraction describes how the data are actually stored. At the physical level, complex low-level data structures are described in detail.

Conceptual level: The next higher level of abstraction describes what data are actually stored in the database, and the relationships that exist among the data. Here the entire database is described in terms of a small number of relatively simple structures. Although implementation of simple structures at the conceptual level may involve complex physical-level structures, the user of the conceptual level need not be aware of this. The conceptual level of abstraction is used by database administrators, who must decide what information is to be kept in the database.

View level: The highest level of abstraction describes only part of the entire database. Despite the use of simpler structures at the conceptual level, some complexity remains because of the large size of the database. Many users of the database system will not be concerned with all of this information. Instead, such users need only a part of the database. To simplify their interaction with the system, the view level of abstraction is defined. The system may provide many views for the same database.
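The conceptual and view levels can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module and a hypothetical staff table; the table, column and view names are illustrative only. The table plays the role of the conceptual level, the view exposes only part of it, and the physical storage is left entirely to the DBMS.

```python
import sqlite3

# In-memory database: how rows are physically stored is hidden by SQLite.
conn = sqlite3.connect(":memory:")

# Conceptual level: the full description of what data is kept.
conn.execute(
    "CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, "
    "position TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [
        ("SL21", "John White", "Manager", 30000),
        ("SG37", "Ann Beech", "Assistant", 12000),
    ],
)

# View level: one of possibly many views, hiding the salary column.
conn.execute(
    "CREATE VIEW staff_directory AS SELECT staff_no, name, position FROM staff"
)

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY staff_no")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

A user who is given only staff_directory never sees salaries, although the data is still stored once in the underlying table.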

Overall System Structure

[Figure: Overall system structure. Users: naïve users (daily data users) working through application interfaces, application programmers writing application programs, sophisticated users issuing queries, and the database administrator defining the database scheme. Database management system: DML pre-compiler, query processor, DDL compiler, application programs object code, database manager and file manager. Disk storage: data files and data dictionary.]

A database system is partitioned into modules that deal with each of the responsibilities of the overall system.

In most cases, the computer's operating system provides only the most basic services and the database system must build on that base.

Thus, the design of a database system must include consideration of the interface between the database system and the operating system.

The functional components of a database system include:

• File manager
• Database manager
• Query processor
• DML pre-compiler
• DDL compiler

In addition, several data structures are required as part of the physical system implementation, including:

• Data files
• Data dictionary
• Indices

What is a database?

[Figure: Anatomy of a database, built up over three slides. Core: data, indices and data definition. Around it: data manipulation and data control. Outer layer: 3GL manipulation, 4GL query processor, GUI, maintenance, security, etc.]

Common Uses of Databases

• Try to think why each of these needs to use a database:

– Supermarkets
– Insurance
– Credit Cards/Banking
– Libraries
– Travel Agents
– Universities
– Engineering

File Based Systems

DEFINITION

• A collection of application programs that perform services for end users.
• Each program defines and manages its own data.

File Based Processing

[Figure: Two separate applications. Each has its own data entry and reports, file handling routines and file definition; one works on the Sales files and the other on the Lease files.]

Limitations of File Based Systems

• Separation & Isolation of Data
• Data Dependence
• Duplication of Data
• Incompatible file formats
• How do we resolve these problems?

The Database Approach

DEFINITION

• A shared collection of logically related data designed to meet the information requirements of an organisation

Database Processing

[Figure: Application programs (the Leases application programs are labelled), each with its own data entry and reports, access a single shared database through the DBMS.]

Database Management System (DBMS)

DEFINITION

• A software system that enables users to define, create and maintain the database and which provides controlled access to the database

Facilities of a DBMS

• Allows users to define the database (DDL)
• Allows users to insert, update, delete & retrieve data (DML)
• Provides controlled access

– a security system
– an integrity system
– a concurrency control system
– a recovery system
– a user accessible catalogue
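A minimal sketch of the DDL and DML facilities, assuming Python's built-in sqlite3 module as the DBMS and a hypothetical branch table (the names are illustrative). The last step shows the integrity system rejecting a duplicate primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the database.
conn.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY, city TEXT NOT NULL)")

# DML: insert, update, delete and retrieve data.
conn.execute("INSERT INTO branch VALUES (?, ?)", ("B005", "London"))
conn.execute("INSERT INTO branch VALUES (?, ?)", ("B003", "Glasgow"))
conn.execute("UPDATE branch SET city = ? WHERE branch_no = ?", ("Aberdeen", "B003"))
conn.execute("DELETE FROM branch WHERE branch_no = ?", ("B005",))
rows = conn.execute("SELECT branch_no, city FROM branch").fetchall()

# Integrity system: a second row with the same primary key is refused.
try:
    conn.execute("INSERT INTO branch VALUES (?, ?)", ("B003", "Glasgow"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```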

Components of a DBMS

• Hardware
• Software
• Data
• Procedures
• People

Advantages

• Minimal data redundancy
• Consistency of data
• Integration of data
• Improved integrity
• Consistent security
• Standards
• Increased productivity

Disadvantages

• Complexity
• Additional Hardware Costs
• Size
• Performance
• Experts - Specialised Personnel
• Potential organisational Conflict
• Higher impact of failure

Types of DBMS

Early Types of DBMS

– Hierarchical
– Network

Current Generation

– Relational

Advanced Systems

– Object Based

Relational Database

• Relational database model devised by Codd in 1970
• An attempt to devise a standard model with a sound mathematical basis

– why does this differ from the previous systems?

• Most successful database model
• Most relational systems use the query language SQL
• Examples include:

– Oracle, Microsoft Access, FoxPro, MySQL, SQL Server etc.

Relational Database - Example

• BRANCH relation

branchNo  street        city      postcode
B005      22 Deer Rd    London    SW1 4EH
B007      16 Argyll St  Aberdeen  AB2 3SU
B003      163 Main St   Glasgow   G11 9QX

• STAFF relation

StaffNo  Name        Position    Salary  branchNo
SL21     John White  Manager     30000   B005
SG37     Ann Beech   Assistant   12000   B003
SG14     David Ford  Supervisor  18000   B003
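The two relations are linked only by the shared branchNo values. As a sketch, assuming Python's built-in sqlite3 module (the SQL column types are assumptions, since the slides do not give them), the tables can be loaded and joined on that column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE branch (
        branchNo TEXT PRIMARY KEY, street TEXT, city TEXT, postcode TEXT
    );
    CREATE TABLE staff (
        staffNo TEXT PRIMARY KEY, name TEXT, position TEXT, salary INTEGER,
        branchNo TEXT REFERENCES branch(branchNo)
    );
    INSERT INTO branch VALUES
        ('B005', '22 Deer Rd', 'London', 'SW1 4EH'),
        ('B007', '16 Argyll St', 'Aberdeen', 'AB2 3SU'),
        ('B003', '163 Main St', 'Glasgow', 'G11 9QX');
    INSERT INTO staff VALUES
        ('SL21', 'John White', 'Manager', 30000, 'B005'),
        ('SG37', 'Ann Beech', 'Assistant', 12000, 'B003'),
        ('SG14', 'David Ford', 'Supervisor', 18000, 'B003');
    """
)

# Match each staff tuple to its branch tuple through the common branchNo value.
staff_cities = conn.execute(
    "SELECT s.name, b.city FROM staff AS s "
    "JOIN branch AS b ON s.branchNo = b.branchNo "
    "ORDER BY s.staffNo"
).fetchall()
```

Branch B007 has no staff in this instance, so it does not appear in the join result.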

What is a database, and how is it used?


Operational Databases

• Ongoing business activities

• Dynamic data

• Examples

– Inventory control

– Accounting records

– Order processing

– Scheduling

• Transaction Processing Systems (TPS)


Analytical Databases

• Historical / Time-dependent data

• Static data

• Examples

– Census / demographic records

– Sales forecasting

• Data Mining

Client/Server Technology

Client/server architecture: a network architecture in which each computer or process on the network is either a client or a server.

Servers are powerful computers or processes dedicated to managing disk drives (file servers), printers (print servers), or network traffic (network servers).

Clients are PCs or workstations on which users run applications. Clients rely on servers for resources, such as files, devices, and even processing power.

Two-tier Client/Server Architectures

This type of architecture usually consists of a Windows-based client program and a server database such as Oracle.

The graphical user interface (GUI) communicates with the database server across the network via Structured Query Language (SQL), and may be developed quickly with IDE-based tools such as Delphi.

Limitations of a two-tier application

The limitations of a two-tier application become evident after 100 to 150 users log in. Because business logic is processed on the client machine, large data sets are downloaded across the network and calculated or summarized by the client application.

This type of architecture is very taxing on network infrastructure, and to some degree on the database server. Still, two-tier applications are well suited for small to midsize user groups, and are still developed widely today.

Advantages of client/server computing

• Client/server computing facilitates the use of the GUI that is available on workstations.
• The visual presentation in turn increases the productivity of the end user, because it is very easy to use.
• Investment in training and education can be leveraged better and application development will be faster.
• A client/server environment exploits the power of the workstations. Because clients and servers might run on different software and hardware platforms, it also encourages the acceptance of open systems.

1-tier architecture

A 1-tier architecture is the most basic setup because it involves a single tier on a single machine. Think of an application that runs on your PC: everything you need to run the application (data storage, business logic, user interface, and so forth) is wrapped up together.

An example of a 1-tiered application is a basic word processor or a desktop file utility program.

Although the 1-tier approach is a simple design that's easy to distribute, it does not scale well. In addition, because you are limited to running the entire application (including the user interface) on a single machine, a 1-tier architecture does not adequately address the needs of a web-based application.

2-tier architecture

A 2-tier architecture is the basic terminal-to-server or browser-to-server relationship. You could have a "smart" client that performs most of the work talking to a "dumb" server; or, more commonly, a "dumb" client talking to a "smart" server. Sometimes you have both. In essence, the client handles the display, the server handles the database, and the business logic is contained on one or both of the two tiers.

An example of a 2-tier approach is the basic web model where a web server serves pages to a web browser. Another example of a 2-tier approach is a specialized terminal-to-server application.

Although the 2-tier approach increases scalability and separates the display and database layers, it does not truly separate the application into highly specialized, functional layers. Because of this lack of specialization, most applications quickly outgrow this model.

3-Tier Architecture

A 3-tier architecture is the most common approach used for web applications today. In the typical example of this model, the web browser acts as the client, an application server (such as Macromedia ColdFusion) handles the business logic, and a separate tier (such as Oracle or MySQL database servers) handles database functions.

3-tier client/server environment

In a 3-tier client/server environment there are 3 tiers, as the name indicates. The first tier consists of the user interface on the client, the second or middle tier consists of business or application logic, and the final tier handles the data (database). The first tier never directly interacts with the third tier.

3-tier architecture has several advantages:

• Clear separation of user interface control and data presentation from application logic
• Centralized data storage, easy to manage
• Scalable, load balancing
• Change management: simpler and faster to exchange a component on the server side

Database Models

A data model is a collection of concepts that can be used to describe the structure of a database, and it provides the necessary means to achieve this abstraction. Here the structure of a database means the data types, relationships and constraints that should hold on the data.

It is a collection of conceptual tools for describing data, data relationships, data semantics and consistency constraints. The various data models that have been proposed fall into three different groups: object-based logical models, record-based logical models and physical models.

Object-Based Logical Models

They are used in describing data at the logical and view levels. They are characterized by the fact that they provide fairly flexible structuring capabilities and allow data constraints to be specified explicitly. There are many different models and more are likely to come. Several of the more widely known ones are:

• The entity-relationship model.
• The object-oriented model.
• The semantic data model.
• The infological data model.
• The functional data model.

The Entity-Relationship Model

The entity-relationship (E-R) data model is based on a perception of a real world that consists of a collection of basic objects, called entities, and of relationships among these objects.

The overall logical structure of a database can be expressed graphically by an E-R diagram. This is built up from the following components:

• Rectangles, which represent entity sets
• Ellipses, which represent attributes
• Diamonds, which represent relationships among entity sets
• Lines, which link attributes to entity sets and entity sets to relationships.

[Figure: A sample E-R diagram showing Customer, Deposit and Account.]

The Object-Oriented Model

Like the E-R model, the object-oriented model is based on a collection of objects. An object contains values stored in instance variables within the object. An object also contains bodies of code that operate on the object. These bodies of code are called methods.

A class is the collection of objects which consist of the same types of values and the same methods.

E.g. account number and balance are instance variables; pay-interest is a method that uses the above two variables and adds interest to the balance.

A data model is a logical organization of the real world objects (entities), constraints on them, and the relationships among objects. A DB language is a concrete syntax for a data model. A DB system implements a data model.

A core object-oriented data model consists of the following basic object-oriented concepts:

(1) Object and object identifier: any real world entity is uniformly modeled as an object, associated with a unique id that is used to pinpoint the object to retrieve.

(2) Attributes and methods: every object has a state (the set of values for the attributes of the object) and a behavior (the set of methods, that is, program code, which operate on the state of the object). The state and behavior encapsulated in an object are accessed or invoked from outside the object only through explicit message passing.

[ An attribute is an instance variable, whose domain may be any class: user-defined or primitive. A class composition hierarchy (aggregation relationship) is orthogonal to the concept of a class hierarchy. The link in a class composition hierarchy may form cycles. ]

(3) Class: a means of grouping all the objects which share the same set of attributes and methods. An object must belong to only one class as an instance of that class (instance-of relationship). A class is similar to an abstract data type. A class may also be primitive (no attributes), e.g., integer, string, Boolean.

(4) Class hierarchy and inheritance: derive a new class (subclass) from an existing class (superclass). The subclass inherits all the attributes and methods of the existing class and may have additional attributes and methods. Single inheritance gives a class hierarchy; multiple inheritance gives a class lattice.

Record-Based Logical Models

Record-based logical models are also used in describing data at the logical and view levels. In contrast to object-based data models, they are used both to specify the overall logical structure of the database and to provide a higher-level description of the implementation.

Record-based models are so named because the database is structured in fixed-format records of several types. Each record type defines a fixed number of fields, or attributes, and each field is usually of a fixed length.

The three most widely accepted record-based data models are the relational, network, and hierarchical models.

Relational Database Model

The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name.

[Figure: Relational model example with the tables College, Department, Instructor, Section, Student, Registration, Staff, Course, TA, Prereqs, Course Fees and Scholarships.]

Network Database Model

Data in the network model is represented by a collection of records, and relationships among data are represented by links, which can be viewed as pointers. The records in the database are organized as collections of arbitrary graphs.

• Multiple inverted trees (with shared branches)

[Figure: Network database model linking College, Department, Instructor, Staff, Section, Course, Student and Registration.]

Hierarchical Database Model

The hierarchical model is similar to the network model in the sense that data and relationships among data are represented by records and links, respectively. It differs from the network model in that records are organized as a collection of trees rather than arbitrary graphs.

• Inverted tree (with single root)
• Parent - child

[Figure: Hierarchical database model, built up over several slides from College (the root), Department, Instructor, Staff, Course/Section and Student/Reg.]

Intro to Distributed Databases

In a distributed database system, the database is stored on several computers. The computers in a distributed system (DS) communicate with one another through various communication media, such as high-speed buses or telephone lines. They do not share main memory, nor do they share a clock.

[Figure: Four databases (DB1, DB2, DB3, DB4) and a monitoring server.]

The processors in a DS may vary in size and function. They may include small microcomputers, workstations, minicomputers, and large general-purpose computer systems.

Structure of DDB

A DDB system consists of a collection of sites (processors), each of which may participate in the execution of transactions which access data at one site or at several sites.

Each site is able to process local transactions, those transactions that access data only in that single site. In addition, a site may participate in the execution of global transactions, those transactions that access data in several sites. The execution of global transactions requires communication among the sites.

The sites in the system can be connected physically in a variety of ways. The various topologies are represented as graphs whose nodes correspond to sites.

The major differences among these configurations involve:

• Installation cost.
• Communication cost.
• Reliability.
• Availability.

Advantages of distributed databases

• Data sharing and distributed control
• Reliability and availability
• Speedup of query processing

Disadvantages of DDB

• Software development cost.
• Greater potential for bugs.
• Increased processing overhead.

Relational Data Model

A Brief History of Data Models

1950s  file systems, punched cards
1960s  hierarchical: IMS (Information Management System)
1970s  network: CODASYL (Conference/Committee on Data Systems Languages), IDMS (Integrated Database Management System)
1980s  relational: INGRES, ORACLE, DB2, Sybase, Paradox, dBase
1990s  object oriented and object relational: O2, GemStone, Ontos

Relational Model

Sets: collections of items of the same type, with no order and no duplicates.

Mappings (domain to range): 1:many, many:1, 1:1, many:many.

COURSE

Courseno  Subject       Lecturer   Machine
CS250     Programming   Linden     Sun
CS260     Graphics      Hrutik     Sun
CS270     Micros        Woods      PC
CS290     Verification  Barringer  Sun

Relational Model Notes

• No duplicate tuples in a relation: a relation is a set of tuples.
• No ordering of tuples in a relation: a relation is a set.
• Attributes of a relation have an implied ordering, but they are used as functions and referenced by name, not position.
• Every tuple must have attribute values drawn from all of the domains of the relation, or the special value NULL.
• All of a domain’s values, and hence all attribute values, are atomic.
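These notes map closely onto a set of named tuples. The sketch below uses plain Python, with the COURSE relation from the previous slide as data; the class and variable names are illustrative. It shows that a duplicate tuple is absorbed, that order carries no meaning, and that attributes are read by name.

```python
from typing import NamedTuple


class Course(NamedTuple):
    courseno: str
    subject: str
    lecturer: str
    machine: str


# A relation as a set of tuples.
course = {
    Course("CS250", "Programming", "Linden", "Sun"),
    Course("CS260", "Graphics", "Hrutik", "Sun"),
    Course("CS270", "Micros", "Woods", "PC"),
    Course("CS290", "Verification", "Barringer", "Sun"),
}

# No duplicates: adding an existing tuple leaves the relation unchanged.
course.add(Course("CS250", "Programming", "Linden", "Sun"))
cardinality = len(course)

# No ordering: the same tuples listed in another order form the same relation.
same_relation = set(sorted(course, reverse=True)) == course

# Attributes are referenced by name, not by position.
machines = {t.machine for t in course}
```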

Comparative Terms

Formal           Oracle
Relation schema  Table description
Relation         Table
Tuple            Row
Attribute        Column
Domain           Value set

Notation

Course(courseno, subject, equipment)
Student(studno, name, hons)
Enrol(studno, courseno, labmark)

Keys

Superkey: a set of attributes whose values together uniquely identify a tuple in a relation.

Candidate key: a superkey for which no proper subset is a superkey, that is, a key that is minimal. There can be more than one for a relation.

Primary key: a candidate key chosen to be the main key for the relation. There is one for each relation.

Keys can be composite.

Foreign Key

A set of attributes in a relation that exactly matches a (primary) key in another relation. The names of the attributes don’t have to be the same, but they must be of the same domain.

A foreign key in a relation A matching a primary key in a relation B represents a many:one relationship between A and B.
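The superkey and candidate key definitions can be checked mechanically against one instance of a relation. The sketch below is plain Python; the function names are illustrative, and the rows are the STUDENT instance used in these slides. A key is a property of the schema, so a test on a single instance can only show that a set of attributes is not a key, never prove that it is one.

```python
from itertools import combinations

ATTRIBUTES = ("studno", "name", "hons", "tutor", "year")
STUDENT = [
    dict(zip(ATTRIBUTES, row))
    for row in [
        ("s1", "jones", "ca", "bush", 2),
        ("s2", "brown", "cis", "kahn", 2),
        ("s3", "smith", "cs", "goble", 2),
        ("s4", "bloggs", "ca", "goble", 1),
        ("s5", "jones", "cs", "zobel", 1),
        ("s6", "peters", "ca", "kahn", 3),
    ]
]


def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Minimal superkeys of this instance, found smallest first."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip any set that contains an already found (smaller) key.
            if any(set(key) <= set(attrs) for key in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found


keys = candidate_keys(STUDENT, ATTRIBUTES)
```

On this small instance the search also reports accidental keys such as (name, hons), which is why the choice of keys has to come from the meaning of the data rather than from a sample of it.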

Student(studno,name,tutor,year)

Staff(lecturer,roomno,appraiser)

STUDENT

studno  name    hons  tutor  year
s1      jones   ca    bush   2
s2      brown   cis   kahn   2
s3      smith   cs    goble  2
s4      bloggs  ca    goble  1
s5      jones   cs    zobel  1
s6      peters  ca    kahn   3

STAFF

lecturer   roomno  appraiser
kahn       IT206   watson
bush       2.26    capon
goble      2.82    capon
zobel      2.34    watson
watson     IT212   barringer
woods      IT204   barringer
capon      A14     watson
lindsey    2.10    woods
barringer  2.125   null

Referential Integrity

Student(studno,name,tutor,year)

Staff(lecturer,roomno,appraiser)

CASCADE: delete all matching foreign key tuples, e.g. in STUDENT.

RESTRICT: can’t delete a primary key tuple in STAFF whilst a foreign key tuple in STUDENT matches.

NULLIFY: the foreign key STUDENT.tutor is set to null, if the foreign key is allowed to take on null.
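The three delete rules can be tried out with a small sketch. It assumes Python's built-in sqlite3 module, where the rules are spelled ON DELETE CASCADE, ON DELETE RESTRICT and ON DELETE SET NULL (the SQL name for NULLIFY), and where foreign key enforcement must be switched on per connection. The helper name delete_tutor and the reduced column set are illustrative.

```python
import sqlite3


def delete_tutor(on_delete):
    """Delete staff member 'goble' under the given rule and report the effect."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE staff (lecturer TEXT PRIMARY KEY, roomno TEXT)")
    conn.execute(
        "CREATE TABLE student (studno TEXT PRIMARY KEY, name TEXT, "
        f"tutor TEXT REFERENCES staff(lecturer) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO staff VALUES ('goble', '2.82'), ('kahn', 'IT206')")
    conn.execute(
        "INSERT INTO student VALUES "
        "('s2', 'brown', 'kahn'), ('s3', 'smith', 'goble'), ('s4', 'bloggs', 'goble')"
    )
    try:
        conn.execute("DELETE FROM staff WHERE lecturer = 'goble'")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT studno, tutor FROM student ORDER BY studno").fetchall()


cascade_result = delete_tutor("CASCADE")    # goble's students are deleted too
restrict_result = delete_tutor("RESTRICT")  # the delete itself is refused
nullify_result = delete_tutor("SET NULL")   # goble's students keep their rows
```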

Entity Integrity and Nulls

No part of a key can be null.

Attribute values:

• Atomic
• Known domain
• Sometimes can be null

THREE categories of null values:

1. Not applicable
2. Not known
3. Absent (not recorded)

STUDENT

studno  name    hons  tutor  year  thesis title
s1      jones   ca    bush   2     null
s2      brown   cis   kahn   2     null
s3      smith   null  goble  2     null
s4      bloggs  ca    goble  1     null
s5      jones   cs    zobel  1     null
s6      peters  ca    kahn   3     null

Relational Model General

Simple

Flexible

Easy to query declaratively without programming

But.....

Good design essential

Integrity essential

Poor semantics

Relationships based on ‘value-matching’

Relational Design

studno  name    tutor  roomno  courseno  labmark  subject
s1      jones   bush   2.26    cs250     65       programming
s1      jones   bush   2.26    cs260     80       graphics
s1      jones   bush   2.26    cs270     47       electronics
s2      brown   kahn   IT206   cs250     67       programming
s2      brown   kahn   IT206   cs270     65       electronics
s3      smith   goble  2.82    cs270     49       electronics
s4      bloggs  goble  2.82    cs280     50       design
s5      jones   zobel  2.34    cs250     0        programming
s6      peters  kahn   IT206   cs250     2        programming
null    null    capon  A14     null      null     null
null    null    null   null    cs290     null     specification
s7      patel   null   null    null      null     null

Informal guidelines

• Semantics of the attributes: the relation is easy to explain and doesn’t mix concepts
• Reducing the redundant values in tuples
• Choosing attribute domains that are atomic
• Reducing the null values in tuples
• Disallowing spurious tuples

Definitions

Cartesian Product: the cartesian product ($\times$) between n sets is the set of all possible combinations of the elements of those sets.

Domain: the set of all possible values for an attribute; for attribute $A$, the domain is represented as $dom(A)$. A domain has a format and a base data type.

Relation Schema: denoted by $R(A_1, A_2, \ldots, A_n)$, is made up of the relation name $R$ and the list of attributes $A_1, A_2, \ldots, A_n$.

Relation: a subset of the cartesian product of its domains. Given a relation schema $R$, a relation $r$ on that schema, and a set of attributes $A_1, \ldots, A_n$ for that relation, then

$r(R) \subseteq dom(A_1) \times dom(A_2) \times \cdots \times dom(A_n)$

Attribute: a function on a domain for each instance of the mapping or tuple.

Attribute Value: the result of the attribute function. Each instance of the mapping is represented by one attribute value drawn from each domain or a special NULL value. Given a tuple $t$ and an attribute $A$ for a relation $r$, $t[A] \rightarrow a$, where $a$ is the attribute's value for that tuple.

(N)-tuple: a set of (n) attribute-value pairs representing a single instance of a relation's mapping between its domains.

Degree: the number of attributes a relation has.

Cardinality: the number of tuples a relation has.

Roles: several attributes can have the same domain; the attributes indicate different roles in the relation.

Key (SuperKey): a set of attributes whose values together uniquely identify every tuple in a relation. Let $t_1$ and $t_2$ be two distinct tuples on relation $r$ of relation schema $R$, and $sk$ be a set of attributes whose values are the key for the relation schema $R$; then $t_1[sk] \neq t_2[sk]$.

(Candidate) Key: a (super)key that is minimal, i.e. has no proper subsets that still uniquely identify every tuple in a relation. There can be several for one relation.

Primary Key: a candidate key chosen to be the main key for the relation. There is only one for each relation.

Foreign Key: a candidate key of relation A situated in relation B.

Database: a set of relations.
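As a small worked instance of these definitions, the sketch below uses plain Python and the standard itertools.product function. The two domains are taken from the Courseno and Machine columns of the COURSE relation; restricting the relation to two attributes is a simplification made here for brevity.

```python
from itertools import product

# Two domains, taken from the Courseno and Machine columns of COURSE.
dom_courseno = {"CS250", "CS260", "CS270", "CS290"}
dom_machine = {"Sun", "PC"}

# Cartesian product: every possible (courseno, machine) combination.
cartesian = set(product(dom_courseno, dom_machine))

# A relation is a subset of that product: only the combinations that hold.
relation = {
    ("CS250", "Sun"),
    ("CS260", "Sun"),
    ("CS270", "PC"),
    ("CS290", "Sun"),
}

is_subset = relation <= cartesian
degree = len(next(iter(relation)))  # number of attributes
cardinality = len(relation)         # number of tuples
```

Here the product holds 4 x 2 = 8 combinations, of which the relation keeps 4.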

Query Languages

• A query language is a language in which a user requests information from the database.
• These languages are typically of a higher level than standard programming languages.
• Query languages can be categorized as being either procedural or non-procedural.

Procedural: the user instructs the system to perform a sequence of operations to retrieve the desired information from the database.

Non-procedural: the user describes the information desired without giving a specific procedure for obtaining that information.
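The contrast can be sketched by answering one question in both styles: which staff earn more than 15000? The example assumes Python's built-in sqlite3 module, the STAFF rows from the earlier BRANCH/STAFF example, and a salary threshold chosen only for illustration.

```python
import sqlite3

staff = [
    ("SL21", "John White", "Manager", 30000, "B005"),
    ("SG37", "Ann Beech", "Assistant", 12000, "B003"),
    ("SG14", "David Ford", "Supervisor", 18000, "B003"),
]

# Procedural: spell out each step (scan, test, collect, sort).
procedural = []
for staff_no, name, position, salary, branch_no in staff:
    if salary > 15000:
        procedural.append(name)
procedural.sort()

# Non-procedural: state what is wanted and let the DBMS decide how to get it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (staffNo TEXT, name TEXT, position TEXT, "
    "salary INTEGER, branchNo TEXT)"
)
conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?, ?)", staff)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM staff WHERE salary > 15000 ORDER BY name"
    )
]
```

Both give the same answer; only the second leaves the choice of access path to the system.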

Relational Algebra

Relational Query Languages

Query languages: allow manipulation and retrieval of data from a database.

The relational model supports simple, powerful QLs:

• Strong formal foundation based on logic.
• Allows for much optimization.

Query languages != programming languages!

• QLs not expected to be “Turing complete”.
• QLs not intended to be used for complex calculations.
• QLs support easy, efficient access to large data sets.

Formal Relational Query Languages

Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation:

Relational Algebra: More operational, very useful for representing execution plans.

Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-procedural, declarative.)

* Understanding Algebra & Calculus is key to understanding SQL, query processing!

Preliminaries

A query is applied to relation instances, and the result of a query is also a relation instance.

• Schemas of input relations for a query are fixed (but the query will run over any legal instance).
• The schema for the result of a given query is also fixed. It is determined by the definitions of the query language constructs.

Positional vs. named-field notation:

• Positional notation is easier for formal definitions, named-field notation is more readable.
• Both are used in SQL.

Relational Algebra: 5 Basic Operations

• Selection ($\sigma$): selects a subset of rows from a relation (horizontal).
• Projection ($\pi$): retains only wanted columns from a relation (vertical).
• Cross-product ($\times$): allows us to combine two relations.
• Set-difference ($-$): tuples in r1, but not in r2.
• Union ($\cup$): tuples in r1 and/or in r2.

Since each operation returns a relation, operations can be composed! (Algebra is “closed”.)
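Closure is what makes composition possible: the output of one operator is a valid input to the next. A minimal sketch in plain Python, modelling a relation as a set of tuples with a separate header; the helper names select and project and the sample condition rating > 8 are illustrative choices, and the rows are the sailors instance S2 from the example instances.

```python
SAILOR_HEADER = ("sid", "sname", "rating", "age")

S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}


def select(header, rows, predicate):
    """Selection: keep the rows for which the predicate holds."""
    return {row for row in rows if predicate(dict(zip(header, row)))}


def project(header, rows, attrs):
    """Projection: keep only the named columns; the set drops duplicates."""
    positions = [header.index(a) for a in attrs]
    return {tuple(row[i] for i in positions) for row in rows}


# Composition: the relation returned by select is fed straight into project.
high_rated = select(SAILOR_HEADER, S2, lambda t: t["rating"] > 8)
names_and_ratings = project(SAILOR_HEADER, high_rated, ["sname", "rating"])
```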

Example Instances

Boats (bid = boat id)

bid  bname      color
101  Interlake  blue
102  Interlake  red
103  Clipper    green
104  Marine     red

R1

sid  bid  day
22   101  10/10/96
58   103  11/12/96

Sailors (sid = sailor id)

S1

sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2

sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

Projection

page S( )2 Examples: ;

Retains only attributes that are in the “projection list”.

Schema of result:

exactly the fields in the projection list, with the same names that they had in the input relation.

Projection operator has to eliminate duplicates (How do they arise? Why remove them?)

Note: real systems typically don't do duplicate elimination unless the user explicitly asks for it. (Why not?)
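The duplicate question can be checked on S2 directly. The sketch below is only an illustration in Python: the list stands in for what a real system returns without duplicate elimination, the set for the formal projection.

```python
# S2 from the example instances: (sid, sname, rating, age)
S2 = [(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)]

# Projecting on age without duplicate elimination (a bag of values).
ages_bag = [row[3] for row in S2]

# The formal projection is a set, so repeated ages collapse.
ages_set = sorted(set(ages_bag))

print(ages_bag)  # [35.0, 55.5, 35.0, 35.0]
print(ages_set)  # [35.0, 55.5]
```

Duplicates arise because the dropped columns were what told the rows apart. Removing them costs extra work (for example sorting or hashing), which is one plausible reason systems skip it unless asked.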

Projection

S2:

sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$\pi_{sname,\,rating}(S2)$:

sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10

$\pi_{age}(S2)$:

age
35.0
55.5

Selection ($\sigma$)

Examples: $\sigma_{rating>8}(S2)$; $\pi_{sname,\,rating}(\sigma_{rating>8}(S2))$

Selects rows that satisfy selection condition.

Result is a relation.

Schema of result is same as that of the input relation.

Do we need to do duplicate elimination?


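S2:

sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$\sigma_{rating>8}(S2)$ keeps the yuppy and rusty rows.

$\pi_{sname,\,rating}(\sigma_{rating>8}(S2))$:

sname  rating
yuppy  9
rusty  10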

Union and Set-Difference

All of these operations take two input relations, which must be union-compatible:

Same number of fields.

'Corresponding' fields have the same type.

For which, if any, is duplicate elimination required?
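A small Python sketch of these two operations on S1 and S2 follows. The `union_compatible` helper is a hypothetical, deliberately crude check (it compares the Python types of the fields), included only to illustrate the two conditions above.

```python
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
      (58, "rusty", 10, 35.0)}
S2 = {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}

def union_compatible(r, s):
    """True if all rows share one shape: same arity, same field types."""
    shapes = {tuple(type(v) for v in row) for row in r | s}
    return len(shapes) <= 1

assert union_compatible(S1, S2)

union = S1 | S2          # tuples in S1 and/or S2
s1_minus_s2 = S1 - S2    # tuples in S1 but not in S2
s2_minus_s1 = S2 - S1    # tuples in S2 but not in S1

print(len(union))        # 5
print(s1_minus_s2)       # {(22, 'dustin', 7, 45.0)}
print(sorted(s2_minus_s1))
```

With sets, union has to drop the rows that occur in both inputs (lubber and rusty here), while set-difference can never introduce a duplicate that was not already present in its first input.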

Union

S1:

sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2:

sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$S1 \cup S2$:

sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0

Set Difference

S1:

sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2:

sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$S1 - S2$:

sid  sname   rating  age
22   dustin  7       45.0

$S2 - S1$:

sid  sname   rating  age
28   yuppy   9       35.0
44   guppy   5       35.0

Cross-Product

S1 x R1: Each row of S1 paired with each row of R1.

Q: How many rows in the result?

Result schema has one field per field of S1 and R1, with field names 'inherited' if possible.

May have a naming conflict: Both S1 and R1 have a field with the same name.

In this case, can use the renaming operator:

$\rho(C(1 \to sid1,\ 5 \to sid2),\ S1 \times R1)$
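As a rough illustration of the row count and of the renaming step, the cross-product of S1 and R1 can be sketched in Python. The names `C`, `sid1`, and `sid2` follow the renaming expression; the tuple-based representation is an assumption.

```python
from itertools import product

S1 = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
      (58, "rusty", 10, 35.0)]
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]

# Fields 1 and 5 would both be called sid, so they are renamed.
schema = ("sid1", "sname", "rating", "age", "sid2", "bid", "day")

# Pair every row of S1 with every row of R1.
C = [s + r for s, r in product(S1, R1)]

print(len(C))  # 6 rows: 3 rows of S1 times 2 rows of R1
for row in C:
    print(dict(zip(schema, row)))
```

In general the result has |S1| * |R1| rows, which is why cross-products are rarely materialized in practice.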

Cross Product Example

S1:

sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

R1:

sid  bid  day
22   101  10/10/96
58   103  11/12/96

$S1 \times R1$ =

(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96

Compound Operator: Intersection

In addition to the 5 basic operators, there are several additional “Compound Operators”

These add no computational power to the language, but are useful shorthands.

Can be expressed solely with the basic ops.

Intersection takes two input relations, which must be union-compatible.

Q: How to express it using basic operators?

$R \cap S = R - (R - S)$
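The identity can be checked on the example instances with Python sets (a quick sanity check on one pair of relations, not a proof):

```python
R = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
     (58, "rusty", 10, 35.0)}                      # S1
S = {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
     (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}  # S2

# Intersection built from set-difference only.
common = R - (R - S)

# Compare with Python's built-in intersection.
assert common == R & S
print(sorted(common))
```

`R - S` holds the tuples of R that are missing from S; removing those from R leaves exactly the tuples that are in both.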

Intersection

S1:

sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2:

sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$S1 \cap S2$:

sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0

Compound Operator: Join

Joins are compound operators involving cross product, selection, and (sometimes) projection.

Most common type of join is a “natural join” (often just called “join”). $R \bowtie S$ conceptually is:

Compute $R \times S$.

Select rows where attributes that appear in both relations have equal values.

Project all unique attributes and one copy of each of the common ones.

Note: Usually done much more efficiently than this.

Useful for putting “normalized” relations back together.
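The three conceptual steps can be sketched as a naive nested loop in Python. The function name `natural_join` and the dict-per-row representation are assumptions for illustration; real systems use far more efficient join algorithms.

```python
S1 = [
    {"sid": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"sid": 31, "sname": "lubber", "rating": 8, "age": 55.5},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]
R1 = [
    {"sid": 22, "bid": 101, "day": "10/10/96"},
    {"sid": 58, "bid": 103, "day": "11/12/96"},
]

def natural_join(r, s):
    """Join two lists of dict rows on every attribute name they share."""
    out = []
    for a in r:                      # step 1: consider every pair (cross-product)
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):   # step 2: equal common values
                out.append({**a, **b})              # step 3: one copy of each field
    return out

joined = natural_join(S1, R1)
for row in joined:
    print(row)
```

Merging the two dicts keeps a single `sid` entry, which plays the role of the final projection. If the inputs share no attribute names, the condition is vacuously true and the function degenerates to a cross-product.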

Natural Join Example

S1:

sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

R1:

sid  bid  day
22   101  10/10/96
58   103  11/12/96

$R1 \bowtie S1$ =

sid  sname   rating  age   bid  day
22   dustin  7       45.0  101  10/10/96
58   rusty   10      35.0  103  11/12/96

Other Types of Joins

Condition Join (or “theta-join”): $R \bowtie_c S = \sigma_c(R \times S)$

Result schema same as that of cross-product.

May have fewer tuples than cross-product.

Equi-Join: Special case: condition c contains only conjunction of equalities.
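Read literally, the definition is a selection applied to a cross-product. A minimal Python sketch, with `theta_join` as a hypothetical helper name and rows as plain tuples:

```python
from itertools import product

S1 = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
      (58, "rusty", 10, 35.0)]
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]

def theta_join(r, s, cond):
    """sigma_cond(r x s): keep the concatenated pairs that satisfy cond."""
    return [a + b for a, b in product(r, s) if cond(a, b)]

# Condition c: S1.sid < R1.sid (sid is field 0 in both relations).
result = theta_join(S1, R1, lambda s, r: s[0] < r[0])
for row in result:
    print(row)
```

The result keeps both sid columns, as in the cross-product schema, and has 2 of the 6 possible rows.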

“Theta” Join Example

S1:

sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

R1:

sid  bid  day
22   101  10/10/96
58   103  11/12/96

$S1 \bowtie_{S1.sid < R1.sid} R1$ =

(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  58     103  11/12/96

Compound Operator: Division

Useful for expressing “for all” queries like: Find sids of sailors who have reserved all boats.

For A/B, the attributes of B must be a subset of the attributes of A.

May need to “project” to make this happen.

E.g., let A have 2 fields, x and y; B have only field y:

$A/B = \{\langle x \rangle \mid \forall \langle y \rangle \in B\ (\langle x, y \rangle \in A)\}$

A/B contains all x tuples such that for every y tuple in B, there is an xy tuple in A.

Examples of Division A/B

A:

sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1:

pno
p2

B2:

pno
p2
p4

B3:

pno
p1
p2
p4

A/B1:

sno
s1
s2
s3
s4

A/B2:

sno
s1
s4

A/B3:

sno
s1

Expressing A/B Using Basic Operators

Division is not an essential op; just a useful shorthand. (Also true of joins, but joins are so common that systems implement joins specially.)

Idea: For A/B, compute all x values that are not 'disqualified' by some y value in B.

x value is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A.

Disqualified x values: $\pi_x((\pi_x(A) \times B) - A)$

A/B: $\pi_x(A) -$ Disqualified x values
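The two expressions translate almost line for line into Python sets. In this sketch `divide` is a hypothetical helper, A is a set of (x, y) pairs, and B is a set of plain y values.

```python
from itertools import product

A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"), ("s3", "p2"),
     ("s4", "p2"), ("s4", "p4")}

def divide(A, B):
    """A/B using only projection, cross-product and set-difference."""
    xs = {x for x, _ in A}                          # pi_x(A)
    missing = set(product(xs, B)) - A               # (pi_x(A) x B) - A
    disqualified = {x for x, _ in missing}          # pi_x of the above
    return xs - disqualified

print(sorted(divide(A, {"p2"})))              # ['s1', 's2', 's3', 's4']
print(sorted(divide(A, {"p2", "p4"})))        # ['s1', 's4']
print(sorted(divide(A, {"p1", "p2", "p4"})))  # ['s1']
```

The three calls reproduce A/B1, A/B2 and A/B3 from the division examples.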

Examples

Reserves:

sid  bid  day
22   101  10/10/96
58   103  11/12/96

Sailors:

sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

Boats:

bid  bname      color
101  Interlake  Blue
102  Interlake  Red
103  Clipper    Green
104  Marine     Red

Find names of sailors who've reserved boat #103

Solution 1: $\pi_{sname}((\sigma_{bid=103}\, Reserves) \bowtie Sailors)$

Solution 2: $\pi_{sname}(\sigma_{bid=103}(Reserves \bowtie Sailors))$
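Both solutions can be run against the example instances to see that they agree. The sketch below is an illustration in Python; `join_on_sid` is a hypothetical helper that hard-codes the natural join on sid.

```python
Reserves = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]
Sailors = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
           (58, "rusty", 10, 35.0)]

def join_on_sid(reserves, sailors):
    """Natural join on sid; result rows are (sid, bid, day, sname, rating, age)."""
    return [r + s[1:] for r in reserves for s in sailors if r[0] == s[0]]

# Solution 1: select bid = 103 first, then join, then project sname.
sol1 = {row[3] for row in
        join_on_sid([r for r in Reserves if r[1] == 103], Sailors)}

# Solution 2: join first, then select bid = 103, then project sname.
sol2 = {row[3] for row in join_on_sid(Reserves, Sailors) if row[1] == 103}

print(sol1, sol2)  # {'rusty'} {'rusty'}
```

The answers are identical; the difference is how many rows flow through the join, which is what a query optimizer cares about.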

Find names of sailors who've reserved a red boat

Information about boat color only available in Boats; so need an extra join:

$\pi_{sname}((\sigma_{color=\text{'red'}}\, Boats) \bowtie Reserves \bowtie Sailors)$

A more efficient (???) solution:

$\pi_{sname}(\pi_{sid}((\pi_{bid}(\sigma_{color=\text{'red'}}\, Boats)) \bowtie Reserves) \bowtie Sailors)$

* A query optimizer can find this given the first solution!

Find sailors who've reserved a red or a green boat

Can identify all red or green boats, then find sailors who've reserved one of these boats:

$\rho(Tempboats,\ (\sigma_{color=\text{'red'} \lor color=\text{'green'}}\, Boats))$

$\pi_{sname}(Tempboats \bowtie Reserves \bowtie Sailors)$

Find sailors who've reserved a red and a green boat

Previous approach won't work! Must identify sailors who've reserved red boats, sailors who've reserved green boats, then find the intersection (note that sid is a key for Sailors):

$\rho(Tempred,\ \pi_{sid}((\sigma_{color=\text{'red'}}\, Boats) \bowtie Reserves))$

$\rho(Tempgreen,\ \pi_{sid}((\sigma_{color=\text{'green'}}\, Boats) \bowtie Reserves))$

$\pi_{sname}((Tempred \cap Tempgreen) \bowtie Sailors)$

Find the names of sailors who've reserved all boats

Uses division; schemas of the input relations to / must be carefully chosen:

$\rho(Tempsids,\ (\pi_{sid,\,bid}\, Reserves) / (\pi_{bid}\, Boats))$

$\pi_{sname}(Tempsids \bowtie Sailors)$

To find sailors who've reserved all 'Interlake' boats:

..... $/\ \pi_{bid}(\sigma_{bname=\text{'Interlake'}}\, Boats)$

Relational Calculus

Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus (DRC).

Calculus has variables, constants, comparison ops, logical connectives and quantifiers.

TRC: Variables range over (i.e., get bound to) tuples.

DRC: Variables range over domain elements (= field values).

Both TRC and DRC are simple subsets of first-order logic.

Expressions in the calculus are called formulas. An answer tuple is essentially an assignment of constants to variables that make the formula evaluate to true.

Domain Relational Calculus (DRC)

Query has the form:

$\{\langle x_1, x_2, \ldots, x_n \rangle \mid p(\langle x_1, x_2, \ldots, x_n \rangle)\}$

Answer includes all tuples $\langle x_1, x_2, \ldots, x_n \rangle$ that make the formula $p(\langle x_1, x_2, \ldots, x_n \rangle)$ be true.

Formula is recursively defined, starting with simple atomic formulas (getting tuples from relations or making comparisons of values), and building bigger and better formulas using the logical connectives.

DRC Formulas

Atomic formula:

$\langle x_1, x_2, \ldots, x_n \rangle \in Rname$, or X op Y, or X op constant

op is one of $<, >, =, \le, \ge, \ne$

Formula:

an atomic formula, or

$\neg p$, $p \land q$, $p \lor q$, where p and q are formulas, or

$\exists X\,(p(X))$, where variable X is free in p(X), or

$\forall X\,(p(X))$, where variable X is free in p(X)

The use of quantifiers $\exists X$ and $\forall X$ is said to bind X.

A variable that is not bound is free.

Free and Bound Variables

The use of quantifiers $\exists X$ and $\forall X$ in a formula is said to bind X.

A variable that is not bound is free.

Let us revisit the definition of a query:

$\{\langle x_1, x_2, \ldots, x_n \rangle \mid p(\langle x_1, x_2, \ldots, x_n \rangle)\}$

There is an important restriction: the variables x1, ..., xn that appear to the left of '|' must be the only free variables in the formula p(...).

Find all sailors with a rating above 7

$\{\langle I, N, T, A \rangle \mid \langle I, N, T, A \rangle \in Sailors \land T > 7\}$

The condition $\langle I, N, T, A \rangle \in Sailors$ ensures that the domain variables I, N, T and A are bound to fields of the same Sailors tuple.

The term $\langle I, N, T, A \rangle$ to the left of '|' (which should be read as such that) says that every tuple $\langle I, N, T, A \rangle$ that satisfies T>7 is in the answer.

Modify this query to answer: Find sailors who are older than 18 or have a rating under 9, and are called 'Joe'.
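A DRC query of this shape reads much like a set comprehension. As a loose analogy only (Python is not a calculus, and the Sailors instance below is the one from the algebra examples), the rating query can be written as:

```python
# Sailors instance from the earlier examples: (sid, sname, rating, age)
Sailors = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
           (58, "rusty", 10, 35.0)}

# { <I, N, T, A> | <I, N, T, A> in Sailors and T > 7 }
answer = {(I, N, T, A) for (I, N, T, A) in Sailors if T > 7}

print(sorted(answer))
```

Iterating over `Sailors` plays the role of the membership condition: it binds I, N, T and A to the fields of one and the same tuple, and the `if` clause is the remaining part of the formula.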

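Find sailors rated > 7 who've reserved boat #103

$\{\langle I, N, T, A \rangle \mid \langle I, N, T, A \rangle \in Sailors \land T > 7 \land \exists Ir, Br, D\,(\langle Ir, Br, D \rangle \in Reserves \land Ir = I \land Br = 103)\}$

We have used $\exists Ir, Br, D\,(\ldots)$ as a shorthand for $\exists Ir\,(\exists Br\,(\exists D\,(\ldots)))$.

Note the use of $\exists$ to find a tuple in Reserves that 'joins with' the Sailors tuple under consideration.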

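Find sailors rated > 7 who've reserved a red boat

$\{\langle I, N, T, A \rangle \mid \langle I, N, T, A \rangle \in Sailors \land T > 7 \land \exists Ir, Br, D\,(\langle Ir, Br, D \rangle \in Reserves \land Ir = I \land \exists B, BN, C\,(\langle B, BN, C \rangle \in Boats \land B = Br \land C = \text{'red'}))\}$

Observe how the parentheses control the scope of each quantifier's binding.

This may look cumbersome, but with a good user interface, it is very intuitive. (MS Access, QBE)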

Find sailors who've reserved all boats

$\{\langle I, N, T, A \rangle \mid \langle I, N, T, A \rangle \in Sailors \land \forall B, BN, C\,(\neg(\langle B, BN, C \rangle \in Boats) \lor (\exists Ir, Br, D\,(\langle Ir, Br, D \rangle \in Reserves \land I = Ir \land Br = B)))\}$

Find all sailors I such that for each 3-tuple $\langle B, BN, C \rangle$ either it is not a tuple in Boats or there is a tuple in Reserves showing that sailor I has reserved it.
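The quantifiers map naturally onto Python's `all` and `any`. The sketch below is an analogy under stated assumptions: the Reserves instance is hypothetical (extended so that one sailor really has reserved every boat), and the universal quantifier ranges only over tuples in Boats, which makes the "not a tuple in Boats" case vacuous.

```python
Sailors = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
           (58, "rusty", 10, 35.0)}
Boats = {(101, "Interlake", "blue"), (102, "Interlake", "red"),
         (103, "Clipper", "green"), (104, "Marine", "red")}
# Hypothetical Reserves instance, not the one from the slides.
Reserves = {(22, 101, "10/10/96"), (22, 102, "10/10/96"),
            (22, 103, "10/08/96"), (22, 104, "10/07/96"),
            (58, 103, "11/12/96")}

answer = {
    (I, N, T, A)
    for (I, N, T, A) in Sailors
    # for every boat B there exists a reservation of B by sailor I
    if all(any(Ir == I and Br == B for (Ir, Br, D) in Reserves)
           for (B, BN, C) in Boats)
}

print(answer)  # {(22, 'dustin', 7, 45.0)}
```

Sailor 58 has reserved only boat 103, so the `all` fails for that tuple and it stays out of the answer.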

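Find sailors who've reserved all boats (again!)

$\{\langle I, N, T, A \rangle \mid \langle I, N, T, A \rangle \in Sailors \land \forall \langle B, BN, C \rangle \in Boats\,(\exists \langle Ir, Br, D \rangle \in Reserves\,(I = Ir \land Br = B))\}$

Simpler notation, same query. (Much clearer!)

To find sailors who've reserved all red boats:

..... $(C \ne \text{'red'} \lor \exists \langle Ir, Br, D \rangle \in Reserves\,(I = Ir \land Br = B))$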

Unsafe Queries, Expressive Power

It is possible to write syntactically correct calculus queries that have an infinite number of answers! Such queries are called unsafe.

e.g., $\{S \mid \neg(S \in Sailors)\}$

It is known that every query that can be expressed in relational algebra can be expressed as a safe query in DRC / TRC; the converse is also true.

Relational Completeness: Query language (e.g., SQL) can express every query that is expressible in relational algebra/calculus.

Summary

Relational calculus is non-operational, and users define queries in terms of what they want, not in terms of how to compute it. (Declarativeness.)

Algebra and safe calculus have same expressive power, leading to the notion of relational completeness.

Entity-Relationship

Model

Chap-2

ER Model Basics

Entity: Real-world object distinguishable from other objects. An entity is described (in DB) using a set of attributes.

Entity Set: A collection of similar entities. E.g., all employees.

All entities in entity set have same set of attributes.

Each entity set has a key.

Each attribute has a domain.

[Figure: entity set Employees with attributes ssn, name, lot.]

ER Model Basics (Contd.)

Relationship: Association among two or more entities.

E.g., Ashoo works in the Pharmacy department.

Relationship Set: Collection of similar relationships.

An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1 from E1, ..., en from En.

Same entity set could participate in different relationship sets, or in different “roles” in same set.

ER Model Basics (Contd.)

Relationship: Association among two or more entities.

[Figure: relationship set Works_In (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget).]

ER Model Basics (Contd.)

Relationship: Same entity set could participate in different relationship sets, or in different “roles” in the same set.

[Figure: relationship set Reports_To on Employees (ssn, name, lot), with the roles supervisor and subordinate.]

Key Constraints

[Figure: the four cases 1-to-1, 1-to-many, many-to-1, and many-to-many.]

Which key constraint?

[Figure: Works_In (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget).]

Consider Works_In: An employee can work in many departments, and a dept can have many employees.

Which Key Constraint Case?

Consider the Manages relationship.

[Figure: Manages (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget).]

Each dept has at most one manager.

Key Constraint: 1-to-many

Each dept has at most one manager, according to the key constraint on Manages.

[Figure: Manages between Employees and Departments, annotated with the cardinalities [0:1] and [0:n].]

Key Constraints

Consider Works_In: An employee can work in many departments; a dept can have many employees.

In contrast, each dept has at most one manager, according to the key constraint on Manages.

[Figure: Manages (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget), next to the four cases 1-to-1, 1-to-many, many-to-1, and many-to-many.]
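One way to see what the key constraint on Manages means in practice is to store the relationship set as a table keyed on did. The sketch below uses Python's built-in sqlite3 module; the table layout and all sample values are assumptions for illustration and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed mapping: one row per Manages relationship. Making did the
# primary key allows at most one manager row per department.
conn.execute(
    "CREATE TABLE Manages ("
    "  did INTEGER PRIMARY KEY,"
    "  ssn TEXT NOT NULL,"
    "  since TEXT)"
)

# Hypothetical sample data.
conn.execute("INSERT INTO Manages VALUES (51, '123-22-3666', '2018-01-01')")

try:
    # A second manager for the same department violates the key constraint.
    conn.execute("INSERT INTO Manages VALUES (51, '231-31-5368', '2018-02-01')")
    second_manager_accepted = True
except sqlite3.IntegrityError:
    second_manager_accepted = False

print(second_manager_accepted)  # False
```

The same employee may still appear in several rows with different did values, which matches the one-to-many reading: one employee can manage many departments, but each department has at most one manager.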


Participation Constraints

Must every department have a manager?

[Figure: Manages (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget); the participation of Departments is marked with a question mark.]

If every department has a manager, then this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial).

Or, put differently, every did value in the Departments table must appear in a row of the Manages table (with a non-null ssn value!).

Every department must have a manager!

Participation Constraints?

[Figure: Works_In (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget).]

Weak Entities

A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.

Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).

Weak entity set must have total participation in this identifying relationship set.

[Figure: Employees (ssn, name, lot) connected through Policy (attribute cost) to the weak entity set Dependents (pname, age).]

ISA ('is a') Hierarchies

[Figure: Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid) ISA Employees (ssn, name, lot).]

1. As in C++, or other PLs, attributes are inherited.

2. If we declare A ISA B, every A entity is also considered to be a B entity.

Reasons for using ISA:

To add descriptive attributes specific to a subclass.

To identify entities that participate in a relationship.

Implicit relationship between super- and subentity? 1-1?

Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)

Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)

Aggregation

Used when we have to model a relationship involving entity sets and a relationship set.

Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.

[Figure: Sponsors (attribute since) relates Projects (pid, started_on, pbudget) and Departments (did, dname, budget); Monitors (attribute until) relates Employees (ssn, name, lot) to the Sponsors relationship set.]

Aggregation vs. ternary relationship:

Monitors is a distinct relationship, with a descriptive attribute.

Also, can say that each sponsorship is monitored by at most one employee.

Summary of Conceptual/ ER Design

Conceptual design follows requirements analysis,

Yields a high-level description of data to be stored

ER model popular for conceptual design

Constructs are expressive, close to the way people think about their applications.

Basic constructs: entities, relationships, and attributes (of entities and relationships).

Some additional constructs: weak entities, ISA hierarchies, and aggregation.

Note: There are many variations on ER model.

Summary of ER (Contd.)

Several kinds of integrity constraints can be expressed in ER model:

key constraints,

participation constraints, and

overlap/covering constraints for ISA hierarchies.

Some foreign key constraints also implicit in definition of a relationship set.

Some constraints (notably, functional dependencies) cannot be expressed in the ER model.

Constraints play an important role in determining the best database design for an enterprise.

[Figure: “The Real World” vs. “The Model”: real customers are represented by customer surrogates.]

Entity Type and entity surrogates

[Figure: entity type airport.]

• entity type names must be unique

Single-valued Properties

[Figure: entity type airport with property airportname; example values: Hartsfield, Kastrup, Logan.]

• property values are lexical, visible, audible, ... they are things that name other things

Identifying Properties

[Figure: entity type airport with identifying property airportcode; example values: atl, cph, bos, lax.]

• for each identifying property value there is at most one instance of the identified entity

• every entity must be uniquely referenceable

Multi-valued Properties

[Figure: entity type flt-schedule with multi-valued property weekdays; example values: mo, we, th, su.]

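Composite Properties

[Figure: entity type airport with composite property airportaddress, made up of airportstreet, airportzip, and airportcity; example streets: 1 Flughafen St, 400 Flight Av, 12 Logan Rd; example cities: Hamburg, Boston, Denver; example zips: 90123, 56789, 12345.]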

1-1 relationship types

• the names of multiple relationship types between the same two entity types must be unique

[Figure: 1-1 relationship type current marriage between female-customer and male-customer; partial functions.]

0-N and 1-N relationship types

[Figure: relationship type from between airport (1) and flt-schedule (N); partial function.]

Mandatory 0-N and 1-N relationship types

[Figure: mandatory variant; total function.]

N-M relationship types

[Figure: relationship type reservation between customer (N) and flt-instance (M).]

N-ary relationship types

• many ternary relationship types cannot be reduced to a conjunction of binary relationship types

[Figure: ternary relationship type supply (attribute quantity) among plane-part-supplier (supplier#), plane-part (part#), and flt-repair-order (repair-order#), with cardinalities L, N, M.]

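Identifying relationship types & weak entity types

[Figure: flt-schedule (flt#) and the weak entity type flt-instance (date), connected by an identifying relationship type; the diagram also carries the label from.]

• flt-instance cannot exist without flt-schedule

• flt-instance cannot be identified without flt-schedule

• (flt#, date) identifies flt-instance

recursive relationship types

[Figure: recursive relationship type connection on flt-schedule (flt#, departuretime, arrivaltime), with the roles in and out.]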

Are relationships entities? Or, are they just “glue”?

[Figure: relationship type with attribute departuretime.]

• relationships may have attributes

• for 1-N (and 1-1) relationships, attributes may be moved to the entity on the “many-side”

Or, are they just “glue”? (cont....)

[Figure: N-M relationship type with attribute paymentmethod.]

• in N-M relationships, the attributes cannot be moved to the entities

• how can non-entities have attributes?

[Figure: customer (N) and flt-instance (M) with attribute paymentmethod, and reservation (1, 1) carrying paymentmethod: an objectified relationship type.]

supertypes and subtypes?

[Figure: supertype passenger with subtypes first class passenger, business class passenger, and economy class passenger; markers o and x.]

UNION entity types

[Figure: payer as the union of company and person.]

property or entity type?

[Figure: customer, family, name.]

relationship type or entity type?

[Figure: flt-instance, customer, reservation.]

Relational Algebra - 2

Few Examples

Selection

Returns all tuples which satisfy a condition

Notation: $\sigma_c(R)$

Examples

$\sigma_{Salary > 40000}(Employee)$

$\sigma_{name = \text{“Smith”}}(Employee)$

The condition c can use =, <, ≤, >, ≥, <>

[In SQL: SELECT * FROM Employee WHERE Salary > 40000]

Selection Example

Find all employees with salary more than $40,000.

Employee:

SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000

$\sigma_{Salary > 40000}(Employee)$:

SSN        Name   DepartmentID  Salary
888888888  Alice  2             45,000

4. Projection

Eliminates columns, then removes duplicates

Notation: $\Pi_{A_1, \ldots, A_n}(R)$

Example: project to social-security number and names:

$\Pi_{SSN,\,Name}(Employee)$

Output schema: Answer(SSN, Name)

[In SQL: SELECT DISTINCT SSN, Name FROM Employee]

Projection Example

Employee:

SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000

$\Pi_{SSN,\,Name}(Employee)$:

SSN        Name
999999999  John
777777777  Tony
888888888  Alice
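The two SQL statements quoted on these slides can be tried out with Python's built-in sqlite3 module. The column types and the in-memory database are assumptions made for this sketch; the rows are the Employee instance shown in the examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "  SSN TEXT, Name TEXT, DepartmentID INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [("999999999", "John", 1, 30000),
     ("777777777", "Tony", 1, 32000),
     ("888888888", "Alice", 2, 45000)],
)

# Selection: sigma_{Salary > 40000}(Employee)
selected = conn.execute(
    "SELECT * FROM Employee WHERE Salary > 40000").fetchall()

# Projection: Pi_{SSN, Name}(Employee); DISTINCT removes duplicates.
projected = conn.execute(
    "SELECT DISTINCT SSN, Name FROM Employee").fetchall()

print(selected)   # [('888888888', 'Alice', 2, 45000)]
print(sorted(projected))
```

Without DISTINCT, SQL would keep duplicate rows, which is why the projection slide includes it to match the algebra's set semantics.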

5. Cartesian Product

Combine each tuple in R1 with each tuple in R2

Notation: $R1 \times R2$

Example:

$Employee \times Dependents$

Very rare in practice; mainly used to express joins

[In SQL: SELECT * FROM R1, R2]

Cartesian Product Example

Employee:

Name  SSN
John  999999999
Tony  777777777

Dependents:

EmployeeSSN  Dname
999999999    Emily
777777777    Joe

$Employee \times Dependents$:

Name  SSN        EmployeeSSN  Dname
John  999999999  999999999    Emily
John  999999999  777777777    Joe
Tony  777777777  999999999    Emily
Tony  777777777  777777777    Joe

Schemes

Branch-scheme=(branch-name, assets, branch-city)

Customer-scheme=(customer-name, street, customer-city)

Deposit-scheme=(branch-name, account-number, customer-name, balance)

Borrow-scheme=(branch-name, loan-number, customer-name, amount)

Client-scheme=(customer-name, banker-name)

The selection operation

Query: select those tuples of the borrow relation where the branch name is “Patny”

$\sigma_{branch\text{-}name = \text{“Patny”}}(borrow)$

Query: find all tuples in which the amount borrowed is more than Rs.1200

$\sigma_{amount > 1200}(borrow)$

Query: find those tuples pertaining to loans of more than Rs.1200 made by the Patny branch

$\sigma_{branch\text{-}name = \text{“Patny”} \,\land\, amount > 1200}(borrow)$

The selection operation

Query: Find all those customers who have the same name as their personal banker

$\sigma_{customer\text{-}name = banker\text{-}name}(client)$

The Projection Operation

Query: Show customers and the branches from which they borrow

$\pi_{branch\text{-}name,\ customer\text{-}name}(borrow)$

Query: Find those customers who have the same name as their personal banker

$\pi_{customer\text{-}name}(\sigma_{customer\text{-}name = banker\text{-}name}(client))$

The Cartesian Product Operation

client x customer

The relation scheme would be

(client.customer-name, client.banker-name,

customer.customer-name, customer.street,

customer.customer-city)

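The Cartesian Product Operation

Query: Find all clients of banker Johnson and the city in which they live

$\sigma_{banker\text{-}name = \text{“Johnson”}}(client \times customer)$

This selection alone still pairs each of Johnson's clients with every customer. To answer the query, also select the rows where client.customer-name = customer.customer-name, then project onto the customer name and city:

$\pi_{client.customer\text{-}name,\ customer.customer\text{-}city}(\sigma_{client.customer\text{-}name = customer.customer\text{-}name}(\sigma_{banker\text{-}name = \text{“Johnson”}}(client \times customer)))$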
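The steps of this query can be traced in Python. All sample rows below are hypothetical, since the slides give only the schemes; the point is to show how many rows survive each step.

```python
from itertools import product

# Hypothetical instances of client(customer-name, banker-name) and
# customer(customer-name, street, customer-city).
client = [("Hayes", "Johnson"), ("Turner", "Johnson"), ("Jones", "Smith")]
customer = [("Hayes", "Main", "Harrison"),
            ("Turner", "Putnam", "Stamford"),
            ("Jones", "Park", "Pittsfield")]

# client x customer: fields are (client.customer-name, banker-name,
# customer.customer-name, street, customer-city).
pairs = [c + k for c, k in product(client, customer)]

# Select banker-name = "Johnson".
johnson_only = [p for p in pairs if p[1] == "Johnson"]

# Select client.customer-name = customer.customer-name.
matched = [p for p in johnson_only if p[0] == p[2]]

# Project onto customer name and city.
answer = {(p[0], p[4]) for p in matched}

print(len(pairs), len(johnson_only), len(matched))  # 9 6 2
print(sorted(answer))
```

After the first selection there are still 6 rows, most of which pair a client with someone else's address; only the name-equality selection reduces them to the 2 meaningful rows.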

End of Unit-I
