Lecture03(2)
7/30/2019 Lecture03(2)
1/10
CXB 3104
Advanced Database Systems
Lecture 3
Muthu
1
Relational Data Manipulation
Relational tables are sets.
Rows of a table can be considered
elements of the set.
Operations that can be performed on sets
can be performed on relational tables.
2
Union Operator
The union of two relational tables
is formed by appending the rows of one table
to those of a second table to produce a third.
Duplicate rows are eliminated.
The tables must be union compatible: they must have
the same number of columns, and
corresponding columns must come from
the same domain.
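A minimal sketch of the union operation, assuming each table is modelled as a Python set of tuples. The sample rows and the helper union_compatible are hypothetical, and the helper checks only the column count, not the column domains.

```python
# Each table is a set of rows; each row is a tuple (name, city).
# The sample rows are hypothetical illustration data.
table_a = {("Smith", "Derby"), ("Jones", "Rugby")}
table_b = {("Jones", "Rugby"), ("Patel", "Leeds")}


def union_compatible(a, b):
    """True if every row in both tables has the same number of columns.

    Only the column count is checked here; domains are not verified.
    """
    widths = {len(row) for row in a | b}
    return len(widths) <= 1


if union_compatible(table_a, table_b):
    # Set union appends the rows and keeps the shared row only once.
    union = table_a | table_b
    print(sorted(union))
```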
3
Union Operator
4
Difference Operator
The difference of two relational tables is a
third table that contains those rows that occur in
the first table but not in the second.
Requires that the tables be union
compatible.
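As a rough illustration, the difference can be sketched with Python sets of tuples. The (equipment, supplier) rows are sample values chosen for the sketch.

```python
# Rows are (equipment, supplier) tuples; both tables have the same columns.
first = {("cooler", "Westwood"), ("heater", "Westwood"), ("pump", "Davison")}
second = {("heater", "Westwood")}

# Rows that occur in the first table but not in the second.
difference = first - second
print(sorted(difference))
```

Note that the operation is not symmetric: second - first is empty here.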
5
Difference Operator
6
Intersection Operator
The intersection of two relational tables is a
third table that contains the rows common to both tables.
Requires that the tables be union compatible.
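A comparable sketch for intersection, again assuming tables are Python sets of tuples with sample (equipment, supplier) rows.

```python
# Rows are (equipment, supplier) tuples; both tables have the same columns.
first = {("cooler", "Westwood"), ("heater", "Westwood")}
second = {("heater", "Westwood"), ("pump", "Davison")}

# Only the rows present in both tables survive.
common = first & second
print(common)
```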
7
Intersection Operator
8
Product Operator
The product of two relational tables is called the Cartesian product.
It is the concatenation of every row in one table with every row in the second.
The product of table A (having m rows) and table B (having n rows) is the table C (having
m x n rows).
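The row count can be checked with a short sketch, assuming tables are lists of tuples. The two single-column tables are sample data for illustration.

```python
from itertools import product

table_a = [("Ethylene",), ("Styrene",)]          # m = 2 rows
table_b = [("cooler",), ("heater",), ("pump",)]  # n = 3 rows

# Concatenate every row of A with every row of B.
table_c = [row_a + row_b for row_a, row_b in product(table_a, table_b)]

print(len(table_c))  # m x n = 6 rows
```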
9
Product Operator
10
Projection & Selection Operators
Projection
The project operator retrieves a subset of columns from
a table, removing duplicate rows from the result.
Yields a vertical subset of a table
Selection
The select operator retrieves subsets of rows from a
relational table based on the value(s) in a column or
columns.
Yields a horizontal subset of a table
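Both operators can be sketched in a few lines, assuming rows are Python dictionaries keyed by column name. The function names project and select, and the short column names, are hypothetical choices for this sketch.

```python
rows = [
    {"product": "Ethylene", "equipment": "cooler", "manager": "Smith"},
    {"product": "Ethylene", "equipment": "heater", "manager": "Smith"},
    {"product": "Styrene", "equipment": "pump", "manager": "Jones"},
    {"product": "Styrene", "equipment": "heater", "manager": "Jones"},
]


def project(table, columns):
    """Vertical subset: keep the named columns; the set drops duplicate rows."""
    return {tuple(row[c] for c in columns) for row in table}


def select(table, predicate):
    """Horizontal subset: keep the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]


managers = project(rows, ["product", "manager"])
heaters = select(rows, lambda row: row["equipment"] == "heater")
print(sorted(managers))
print(heaters)
```

Projecting four rows onto (product, manager) leaves two rows, because the duplicates are removed.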
11
Join Operator
Combines the product, selection, and projection operations.
Combines data from one row of a table with rows from another or the same table when certain criteria are met.
The criteria involve a relationship among the columns in the tables being joined.
If the join criterion is based on equality of column values, the result is called an equijoin.
A natural join removes redundant columns.
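A small sketch of how a join decomposes into product, selection, and projection, assuming tables are lists of tuples. The two sample tables are illustrative only.

```python
products = [("Ethylene", "Smith"), ("Styrene", "Jones")]  # (product, manager)
usage = [
    ("Ethylene", "cooler"),
    ("Ethylene", "heater"),
    ("Styrene", "pump"),
]  # (product, equipment)

# Equijoin: form the product of the two tables, then select the
# combinations in which the product names are equal.
equijoin = [p + u for p in products for u in usage if p[0] == u[0]]

# Natural join: additionally project away the repeated product column.
natural = [(prod, mgr, equip) for (prod, mgr, _dup, equip) in equijoin]

print(equijoin)
print(natural)
```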
12
Join Operator
13
Division Operator
Results in the column values of one table for
which there are matching column
values corresponding to every row in another
table.
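Division is easier to see on a concrete case. The sketch below assumes a two-column dividend and a one-column divisor; the function name divide and the sample rows are hypothetical.

```python
# Dividend rows are (product, equipment); the divisor lists equipment names.
usage = {
    ("Ethylene", "cooler"),
    ("Ethylene", "heater"),
    ("Styrene", "pump"),
    ("Styrene", "heater"),
}
required = {"cooler", "heater"}


def divide(dividend, divisor):
    """Return first-column values that are paired with every divisor value."""
    candidates = {a for a, _ in dividend}
    return {a for a in candidates if all((a, b) in dividend for b in divisor)}


result = divide(usage, required)
print(result)  # products that use both a cooler and a heater
```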
14
Data Dictionary
It provides details of all tables found within the database.
It contains all the attribute names and characteristics for each table in the system.
The data dictionary contains metadata - data about data
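As one concrete illustration, SQLite exposes its data dictionary through the sqlite_master table and the table_info pragma. The sketch below uses Python's sqlite3 module; the product table is a hypothetical example, and other database systems expose their metadata differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT PRIMARY KEY, manager TEXT NOT NULL)")

# sqlite_master lists the objects in the database; keep only the tables.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (row[1], row[2], row[5]) for row in conn.execute("PRAGMA table_info(product)")
]

print(tables)
print(columns)  # (column name, declared type, primary-key position)
```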
15
Data Dictionary
16
Normalization
18
Database Tables and Normalization
Normalization
Process for evaluating and correcting table
structures to minimize data redundancies
Helps eliminate data anomalies
Works through a series of stages called normal
forms:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
19
Database Tables and Normalization
2NF is better than 1NF; 3NF is better than
2NF
For most business database design purposes,
3NF is as high as we need to go in the
normalization process
Highest level of normalization is not always
most desirable
What is Normalisation?
In an RDB, normalisation is crucial for:
retaining data consistency on updates
minimizing data redundancy, and
therefore reducing the storage space required
by the database
Key concepts in normalization are
functional dependency and keys
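A functional dependency X -> Y holds when rows that agree on X always agree on Y. The sketch below tests this on the product and equipment sample rows used later in this lecture; the function name holds and the short column names are hypothetical. A check on sample data can only refute a dependency, it cannot prove one for all future data.

```python
rows = [
    {"product": "Ethylene", "equipment": "cooler", "manager": "Smith",
     "supplier": "Westwood", "address": "Derby"},
    {"product": "Ethylene", "equipment": "heater", "manager": "Smith",
     "supplier": "Westwood", "address": "Derby"},
    {"product": "Styrene", "equipment": "pump", "manager": "Jones",
     "supplier": "Davison", "address": "Rugby"},
    {"product": "Styrene", "equipment": "heater", "manager": "Jones",
     "supplier": "Westwood", "address": "Derby"},
]


def holds(table, lhs, rhs):
    """True if equal lhs values always come with equal rhs values."""
    seen = {}
    for row in table:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(holds(rows, ["supplier"], ["address"]))   # supplier -> address
print(holds(rows, ["product"], ["equipment"]))  # product -> equipment
```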
20
Update Anomalies
Tables that have redundant data may have
problems called update anomalies.
Consider the following table of data on products
and required manufacturing equipment:
Product name  Equipment name  Product manager  Equipment supplier  Supplier address
Ethylene      cooler          Smith            Westwood            Derby
Ethylene      heater          Smith            Westwood            Derby
Styrene       pump            Jones            Davison             Rugby
Styrene       heater          Jones            Westwood            Derby
21
Deletion Anomalies
If a row is deleted that represents the
last product with a particular piece of
equipment, the equipment details are
also lost - this is a deletion anomaly.
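The effect can be reproduced on the sample table with a short sketch, assuming rows are tuples of (product, equipment, manager, supplier, address).

```python
rows = [
    ("Ethylene", "cooler", "Smith", "Westwood", "Derby"),
    ("Ethylene", "heater", "Smith", "Westwood", "Derby"),
    ("Styrene", "pump", "Jones", "Davison", "Rugby"),
    ("Styrene", "heater", "Jones", "Westwood", "Derby"),
]

# Delete the only row that uses the pump.
remaining = [r for r in rows if not (r[0] == "Styrene" and r[1] == "pump")]

# Compare the supplier details known before and after the deletion.
suppliers_before = {(r[3], r[4]) for r in rows}
suppliers_after = {(r[3], r[4]) for r in remaining}
lost = suppliers_before - suppliers_after

print(lost)  # supplier details that no longer exist anywhere in the table
```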
22
Deletion Anomalies example
23
Product name  Equipment name  Product manager  Equipment supplier  Supplier address
Ethylene      cooler          Smith            Westwood            Derby
Ethylene      heater          Smith            Westwood            Derby
Styrene       pump            Jones            Davison             Rugby
Styrene       heater          Jones            Westwood            Derby
Loss of equipment supplier details: deleting the Styrene/pump row removes the only record of supplier Davison and its address, Rugby.
Insert Anomalies
New rows that are entered must always
have consistent sets of product and/or equipment data; human error may lead to
inconsistencies
Product and equipment data cannot be
entered separately without using null values,
which might violate the primary key
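A hedged sketch of the second point using Python's sqlite3 module. The table and column names and the 'mixer' equipment are hypothetical. The key columns are declared NOT NULL explicitly because SQLite does not enforce that for this kind of primary key on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product_equipment (
        product_name       TEXT NOT NULL,
        equipment_name     TEXT NOT NULL,
        product_manager    TEXT,
        equipment_supplier TEXT,
        supplier_address   TEXT,
        PRIMARY KEY (product_name, equipment_name)
    )
    """
)

# Try to record new equipment before any product uses it.
try:
    conn.execute(
        "INSERT INTO product_equipment VALUES (NULL, 'mixer', NULL, 'Davison', 'Rugby')"
    )
    rejected = False
except sqlite3.IntegrityError:
    # The null product name violates the primary key columns.
    rejected = True

print(rejected)
```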
24
Insert Anomalies example
25
Potential for inconsistent data sets
Modification Anomalies
An update on the values of product or
equipment in one row must also be
performed on all the other rows that
have the same product or equipment, or
inconsistencies will occur in the data; this is a
modification anomaly.
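The sketch below shows what happens when an update reaches only one of the affected rows. It assumes rows are dictionaries with short hypothetical column names, and the new address "Leeds" is invented for the illustration.

```python
rows = [
    {"product": "Ethylene", "equipment": "cooler", "supplier": "Westwood", "address": "Derby"},
    {"product": "Ethylene", "equipment": "heater", "supplier": "Westwood", "address": "Derby"},
    {"product": "Styrene", "equipment": "pump", "supplier": "Davison", "address": "Rugby"},
    {"product": "Styrene", "equipment": "heater", "supplier": "Westwood", "address": "Derby"},
]

# Faulty update: change Westwood's address in the first matching row only.
for row in rows:
    if row["supplier"] == "Westwood":
        row["address"] = "Leeds"
        break

# The same supplier now appears with more than one address.
addresses = {row["address"] for row in rows if row["supplier"] == "Westwood"}
print(sorted(addresses))
```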
26
Modification Anomalies example
27
Normal Forms
Normal Forms are rules developed to
avoid logical inconsistencies from table
update operations.
Each normal form prohibits a form of
redundancy in table organisation that
could yield meaningless results if one
table were updated independently of
other tables or other rows in the table.
28
Normal Forms
There are multiple levels of normal forms.
Each higher level adds an additional constraint to the level preceding it.
As the database design satisfies higher-level normal forms, the tables become more fragmented. This means that:
As data consistency improves, database navigation and hence queries become slower
The tables become less like the real-world system they represent.
29
Normal Forms
The six normal form levels are:
1st Normal Form (1NF)
2nd Normal Form (2NF)
3rd Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
4th Normal Form (4NF)
5th Normal Form (5NF)
Usually, ensuring that the database satisfies the third normal form is sufficient for data consistency.
30
First Normal Form - 1
A table is in first normal form when every
column holds a single value in each row, so that a
primary key can be defined for each
combination of data.
In the example below, a row has two entries in
column Equipment name.
Product name  Equipment name  Product manager  Equipment supplier  Supplier address
Ethylene      cooler, heater  Smith            Westwood            Derby
Styrene       pump            Jones            Davison             Rugby
Styrene       heater          Jones            Westwood            Derby
Table violates 1NF
31
First Normal Form - 2
This can be remedied by using two rows for the
dual entry:
Product name  Equipment name  Product manager  Equipment supplier  Supplier address
Ethylene      cooler          Smith            Westwood            Derby
Ethylene      heater          Smith            Westwood            Derby
Styrene       pump            Jones            Davison             Rugby
Styrene       heater          Jones            Westwood            Derby
Table satisfies 1NF
Table violates 2NF
Primary key: (Product name, Equipment name)
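The remedy can be sketched as a small transformation, assuming the multi-valued cell is stored as a comma-separated string. Rows are tuples of (product, equipment, manager, supplier, address).

```python
unnormalized = [
    ("Ethylene", "cooler, heater", "Smith", "Westwood", "Derby"),
    ("Styrene", "pump", "Jones", "Davison", "Rugby"),
    ("Styrene", "heater", "Jones", "Westwood", "Derby"),
]

# Emit one row per equipment entry so that every cell holds a single value.
first_nf = [
    (product, item.strip(), manager, supplier, address)
    for (product, equipment, manager, supplier, address) in unnormalized
    for item in equipment.split(",")
]

# (product, equipment) now identifies each row uniquely.
keys = [(row[0], row[1]) for row in first_nf]
print(len(keys) == len(set(keys)))
```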
32
Third Normal Form - 1
A table is in third normal form when it
satisfies second normal form and each non-primary-key
column depends directly on the
primary key.
In the example, 3NF is violated because there
is a transitive dependency: Supplier address
depends on Equipment supplier, which in turn
depends on the primary key.
37
Third Normal Form - 2
This can be remedied by splitting off this
indirect dependence into a further table:

Product name  Product manager
Ethylene      Smith
Styrene       Jones
Primary key: (Product name)

Product name  Equipment name  Equipment supplier
Ethylene      cooler          Westwood
Ethylene      heater          Westwood
Styrene       pump            Davison
Styrene       heater          Westwood
Primary key: (Product name, Equipment name)

Equipment supplier  Supplier address
Westwood            Derby
Davison             Rugby
Primary key: (Equipment supplier)

Tables satisfy 3NF
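A sketch of the decomposition, assuming the single table is a list of tuples (product, equipment, manager, supplier, address). It also rejoins the three tables to confirm that no information is lost; the variable names are hypothetical.

```python
rows = [
    ("Ethylene", "cooler", "Smith", "Westwood", "Derby"),
    ("Ethylene", "heater", "Smith", "Westwood", "Derby"),
    ("Styrene", "pump", "Jones", "Davison", "Rugby"),
    ("Styrene", "heater", "Jones", "Westwood", "Derby"),
]

# Split into three tables; the sets remove the repeated facts.
product_manager = {(p, m) for p, e, m, s, a in rows}
supplier_address = {(s, a) for p, e, m, s, a in rows}
product_equipment = {(p, e, s) for p, e, m, s, a in rows}

# Rejoin on the primary keys to rebuild the original rows.
manager_of = dict(product_manager)
address_of = dict(supplier_address)
rebuilt = {(p, e, manager_of[p], s, address_of[s]) for p, e, s in product_equipment}

print(rebuilt == set(rows))
```

Each manager and each supplier address is now stored once, so an update touches a single row.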
38
Other Normal Forms
The data in an RDB is free of redundancy when
it is in the fifth normal form.
In this state, an update to a column in any table should not lead to data
inconsistencies.
In practice it is adequate to normalise data
into the 3NF.
39