Lecture03(2)


  • 7/30/2019 Lecture03(2)


    CXB 3104

    Advanced Database Systems

    Lecture 3

    Muthu

    [email protected]

    Relational Data Manipulation

    Relational tables are sets. The rows of a table can be considered the elements of the set.

    Operations that can be performed on sets can also be performed on relational tables.

    Union Operator

    The union of two relational tables is formed by appending the rows of one table to those of a second table to produce a third. Duplicate rows are eliminated.

    The two tables must be union compatible: they must have the same number of columns, and corresponding columns must come from the same domain.
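The union rule can be sketched in Python, assuming each table is modelled as a set of row tuples. The table contents and the helper names `union_compatible` and `union` are illustrative assumptions, not part of the lecture, and column domains are approximated by Python types.

```python
def union_compatible(a, b):
    """Check that all rows of both tables share one column count and column types."""
    shapes = {tuple(type(value) for value in row) for row in a | b}
    return len(shapes) <= 1


def union(a, b):
    """Return the rows of a and b; the set type eliminates duplicate rows."""
    if not union_compatible(a, b):
        raise ValueError("tables are not union compatible")
    return a | b


# Hypothetical tables with columns (k, x, y).
table_a = {(1, "A", 1), (2, "B", 2), (3, "C", 3)}
table_b = {(1, "A", 1), (4, "D", 4)}

result = union(table_a, table_b)
print(sorted(result))  # the shared row (1, "A", 1) appears only once
```

A set is used rather than a list so that duplicate elimination comes for free, which mirrors the slide's statement that relational tables are sets.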

    Difference Operator

    The difference of two relational tables is a third table that contains those rows that occur in the first table but not in the second.

    It requires that the tables be union compatible.
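A short Python sketch of the difference operator, again assuming tables are sets of row tuples. The two tables are hypothetical.

```python
# Hypothetical union-compatible tables with columns (k, x, y).
table_a = {(1, "A", 1), (2, "B", 2), (3, "C", 3)}
table_b = {(1, "A", 1), (4, "D", 4)}

# Rows that occur in table_a but not in table_b.
a_minus_b = table_a - table_b

# Difference is not commutative: swapping the operands changes the result.
b_minus_a = table_b - table_a

print(sorted(a_minus_b))
print(sorted(b_minus_a))
```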

    Intersection Operator

    The intersection of two relational tables is a third table that contains the rows common to both.

    It requires that the tables be union compatible.
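The intersection operator can be sketched the same way, assuming set-of-tuples tables with hypothetical contents. The last line checks the standard identity that intersection can be derived from difference.

```python
# Hypothetical union-compatible tables with columns (k, x, y).
table_a = {(1, "A", 1), (2, "B", 2), (3, "C", 3)}
table_b = {(1, "A", 1), (4, "D", 4)}

# Rows that occur in both tables.
common = table_a & table_b

# Intersection expressed through difference: A - (A - B).
derived = table_a - (table_a - table_b)

print(common == derived)
```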

    Product Operator

    The product of two relational tables is called the Cartesian product.

    It is the concatenation of every row in one table with every row in the second.

    The product of table A (having m rows) and table B (having n rows) is the table C (having m x n rows).
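As a minimal illustration of the m x n row count, the sketch below builds a Cartesian product with `itertools.product`. The two tables are hypothetical.

```python
from itertools import product

table_a = [(1, "A"), (2, "B"), (3, "C")]  # m = 3 rows
table_b = [("x", 10), ("y", 20)]          # n = 2 rows

# Concatenate every row of table_a with every row of table_b.
table_c = [row_a + row_b for row_a, row_b in product(table_a, table_b)]

print(len(table_c))  # m x n rows
for row in table_c:
    print(row)
```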

    Projection & Selection Operators

    Projection: the project operator retrieves a subset of columns from a table, removing duplicate rows from the result. It yields a vertical subset of a table.

    Selection: the select operator retrieves a subset of rows from a relational table based on the value(s) in a column or columns. It yields a horizontal subset of a table.
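The two operators can be sketched in Python, assuming rows are dictionaries keyed by column name. The helper names `project` and `select` and the shortened column names are assumptions made for illustration.

```python
rows = [
    {"product": "Ethylene", "equipment": "cooler", "manager": "Smith"},
    {"product": "Ethylene", "equipment": "heater", "manager": "Smith"},
    {"product": "Styrene", "equipment": "pump", "manager": "Jones"},
    {"product": "Styrene", "equipment": "heater", "manager": "Jones"},
]


def project(table, columns):
    """Vertical subset: keep only the named columns and drop duplicate rows."""
    return {tuple(row[c] for c in columns) for row in table}


def select(table, predicate):
    """Horizontal subset: keep only the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]


managers = project(rows, ["product", "manager"])
heaters = select(rows, lambda row: row["equipment"] == "heater")

print(sorted(managers))
print(heaters)
```

Projecting four rows onto (product, manager) leaves two rows, because the duplicates collapse.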

    Join Operator

    The join operator combines the product, selection, and projection operations.

    It combines data from one row of a table with rows from another table, or the same table, when certain criteria are met.

    The criteria involve a relationship among the columns of the tables being joined.

    If the join criterion is based on equality of column values, the result is called an equijoin.

    A natural join removes redundant columns.
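A small sketch of how a join decomposes into product, selection, and projection, assuming tables are lists of tuples. The table layouts and variable names are illustrative assumptions.

```python
from itertools import product

# Hypothetical tables: (product, equipment, supplier) and (supplier, address).
equipment = [("Ethylene", "cooler", "Westwood"), ("Styrene", "pump", "Davison")]
suppliers = [("Westwood", "Derby"), ("Davison", "Rugby")]

# Equijoin: Cartesian product followed by a selection on equal supplier values.
equijoin = [e + s for e, s in product(equipment, suppliers) if e[2] == s[0]]

# Natural join: the equijoin with the repeated supplier column projected away.
natural_join = [row[:3] + row[4:] for row in equijoin]

print(equijoin)
print(natural_join)
```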

    Division Operator

    The division operator returns the column values in one table for which there are matching column values corresponding to every row in another table.
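Division is easier to see on a concrete case. The sketch below assumes a two-column dividend and a one-column divisor. The helper name `divide` and the data are hypothetical.

```python
# Hypothetical dividend: which product uses which equipment.
uses = {
    ("Ethylene", "cooler"),
    ("Ethylene", "heater"),
    ("Styrene", "pump"),
    ("Styrene", "heater"),
}
# Hypothetical divisor: the equipment a product must use to qualify.
required = {"cooler", "heater"}


def divide(dividend, divisor):
    """Return first-column values that are paired with every divisor value."""
    candidates = {a for a, _ in dividend}
    return {a for a in candidates if all((a, b) in dividend for b in divisor)}


result = divide(uses, required)
print(result)  # only Ethylene uses both the cooler and the heater
```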

    Data Dictionary

    It provides details of all tables found within the database.

    It contains all the attribute names and characteristics for each table in the system.

    The data dictionary contains metadata: data about data.
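As one concrete instance of a data dictionary, SQLite exposes its metadata through the `sqlite_master` catalogue and the `PRAGMA table_info` statement. The sketch below uses Python's standard `sqlite3` module. The `product` table and its columns are assumptions made for illustration, and other database systems expose their dictionaries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "product_name TEXT PRIMARY KEY, "
    "product_manager TEXT NOT NULL)"
)

# sqlite_master lists the schema objects (tables, indexes, ...) in the database.
tables = [
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(product)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "", "PK" if pk else "")

conn.close()
```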


    Normalization

    Database Tables and Normalization

    Normalization is a process for evaluating and correcting table structures to minimize data redundancies, which helps eliminate data anomalies.

    It works through a series of stages called normal forms:

        First normal form (1NF)
        Second normal form (2NF)
        Third normal form (3NF)

    2NF is better than 1NF; 3NF is better than 2NF.

    For most business database design purposes, 3NF is the highest we need to go in the normalization process.

    The highest level of normalization is not always the most desirable.

    What is Normalisation?

    In a relational database (RDB), normalisation is crucial for:

        retaining data consistency on updates
        minimizing data redundancy, and therefore reducing the storage space required by the database

    The key concepts in normalization are functional dependency and keys.

    Update Anomalies

    Tables that have redundant data may have problems called update anomalies.

    Consider the following table of data on products and required manufacturing equipment:

    Product name   Equipment name   Product manager   Equipment supplier   Supplier address
    Ethylene       cooler           Smith             Westwood             Derby
    Ethylene       heater           Smith             Westwood             Derby
    Styrene        pump             Jones             Davison              Rugby
    Styrene        heater           Jones             Westwood             Derby

    Deletion Anomalies

    If a row is deleted that represents the last product with a particular piece of equipment, the equipment details are also lost. This is a deletion anomaly.

    Deletion Anomalies example

    Product name   Equipment name   Product manager   Equipment supplier   Supplier address
    Ethylene       cooler           Smith             Westwood             Derby
    Ethylene       heater           Smith             Westwood             Derby
    Styrene        pump             Jones             Davison              Rugby
    Styrene        heater           Jones             Westwood             Derby

    Loss of equipment supplier details: for example, deleting the Styrene/pump row removes the only record of supplier Davison and its address, Rugby.
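The deletion anomaly can be reproduced on the lecture's example data. This is a sketch that assumes the table is held as a list of tuples in the column order (product, equipment, manager, supplier, supplier address). The helper `supplier_addresses` is hypothetical.

```python
table = [
    ("Ethylene", "cooler", "Smith", "Westwood", "Derby"),
    ("Ethylene", "heater", "Smith", "Westwood", "Derby"),
    ("Styrene", "pump", "Jones", "Davison", "Rugby"),
    ("Styrene", "heater", "Jones", "Westwood", "Derby"),
]


def supplier_addresses(rows):
    """Collect every supplier address that the table still records."""
    return {supplier: address for _, _, _, supplier, address in rows}


before = supplier_addresses(table)

# Delete the only row that uses the pump.
remaining = [row for row in table if row[1] != "pump"]
after = supplier_addresses(remaining)

# Suppliers whose details vanished together with the deleted row.
lost = set(before) - set(after)
print(lost)
```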

    Insert Anomalies

    New rows that are entered must always have consistent sets of product and/or equipment data; human error may lead to inconsistencies.

    Product and equipment data cannot be entered separately without using null values, which might violate the primary key.

    Insert Anomalies example

    Potential for inconsistent data sets
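The null-value problem can be demonstrated with Python's standard `sqlite3` module. The schema below is an assumed rendering of the lecture's single table, with NOT NULL declared explicitly on the key columns, and the "mixer" equipment is a hypothetical new entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product_equipment (
        product_name       TEXT NOT NULL,
        equipment_name     TEXT NOT NULL,
        product_manager    TEXT,
        equipment_supplier TEXT,
        supplier_address   TEXT,
        PRIMARY KEY (product_name, equipment_name)
    )
    """
)

try:
    # New equipment that no product uses yet: the product part of the key is unknown.
    conn.execute(
        "INSERT INTO product_equipment "
        "VALUES (NULL, 'mixer', NULL, 'Westwood', 'Derby')"
    )
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("insert rejected:", exc)

conn.close()
```

The equipment and its supplier cannot be recorded until some product uses them, which is the insert anomaly described above.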

    Modification Anomalies

    An update on the values of product or equipment in one row must also be performed on all the other rows that have the same product or equipment, or inconsistencies will occur in the data. This is a modification anomaly.
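A sketch of a modification anomaly on the lecture's example data, assuming rows are dictionaries with shortened column names. The new address "Leeds" is hypothetical, and the update deliberately stops after the first matching row to mimic an incomplete change.

```python
columns = ("product", "equipment", "manager", "supplier", "address")
table = [
    dict(zip(columns, values))
    for values in [
        ("Ethylene", "cooler", "Smith", "Westwood", "Derby"),
        ("Ethylene", "heater", "Smith", "Westwood", "Derby"),
        ("Styrene", "pump", "Jones", "Davison", "Rugby"),
        ("Styrene", "heater", "Jones", "Westwood", "Derby"),
    ]
]

# Incomplete update: only the first Westwood row receives the new address.
for row in table:
    if row["supplier"] == "Westwood":
        row["address"] = "Leeds"
        break

# The same supplier now appears with more than one address.
addresses = {row["address"] for row in table if row["supplier"] == "Westwood"}
inconsistent = len(addresses) > 1
print(sorted(addresses), inconsistent)
```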

    Normal Forms

    Normal forms are rules developed to avoid logical inconsistencies from table update operations.

    Each normal form prohibits a form of redundancy in table organisation that could yield meaningless results if one table were updated independently of other tables or other rows in the table.

    There are multiple levels of normal forms. Each higher level adds an additional constraint to the level preceding it.

    As the database design satisfies higher-level normal forms, the tables become more fragmented. This means that:

        As data consistency improves, database navigation, and hence queries, become slower.
        The tables become less like the real-world system they represent.

    The six normal form levels are:

        1st Normal Form (1NF)
        2nd Normal Form (2NF)
        3rd Normal Form (3NF)
        Boyce-Codd Normal Form (BCNF)
        4th Normal Form (4NF)
        5th Normal Form (5NF)

    Usually, ensuring that the database satisfies the third normal form is sufficient for data consistency.

    First Normal Form - 1

    A table is in first normal form when every column holds a single value in each row, so that a primary key can be defined for each combination of data.

    In the example below, a row has two entries in the column Equipment name.

    Product name   Equipment name    Product manager   Equipment supplier   Supplier address
    Ethylene       cooler, heater    Smith             Westwood             Derby
    Styrene        pump              Jones             Davison              Rugby
    Styrene        heater            Jones             Westwood             Derby

    Table violates 1NF

    First Normal Form - 2

    This can be remedied by using two rows for the dual entry:

    Product name   Equipment name   Product manager   Equipment supplier   Supplier address
    Ethylene       cooler           Smith             Westwood             Derby
    Ethylene       heater           Smith             Westwood             Derby
    Styrene        pump             Jones             Davison              Rugby
    Styrene        heater           Jones             Westwood             Derby

    Table satisfies 1NF. Table violates 2NF.

    Primary key: (Product name, Equipment name)
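The 1NF remedy can be sketched in Python, assuming the multi-valued Equipment name cell is stored as a comma-separated string. The tuple layout (product, equipment, manager, supplier, supplier address) is an assumption made for illustration.

```python
unnormalized = [
    ("Ethylene", "cooler, heater", "Smith", "Westwood", "Derby"),
    ("Styrene", "pump", "Jones", "Davison", "Rugby"),
    ("Styrene", "heater", "Jones", "Westwood", "Derby"),
]

# Emit one row per equipment entry so that every cell holds a single value.
first_normal_form = [
    (product, equipment.strip(), manager, supplier, address)
    for product, equipment_list, manager, supplier, address in unnormalized
    for equipment in equipment_list.split(",")
]

# (product, equipment) now identifies each row uniquely, so it can act as the primary key.
keys = [(row[0], row[1]) for row in first_normal_form]
key_is_unique = len(keys) == len(set(keys))

for row in first_normal_form:
    print(row)
```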


    Third Normal Form - 1

    A table is in third normal form when it satisfies second normal form and each non-primary-key column depends directly on the primary key.

    In the example, 3NF is violated because there is a transitive dependency: Supplier address depends on Equipment supplier, which in turn depends on the primary key.

    Third Normal Form - 2

    This can be remedied by splitting off this indirect dependence into a further table:

    Product name   Equipment name   Equipment supplier
    Ethylene       cooler           Westwood
    Ethylene       heater           Westwood
    Styrene        pump             Davison
    Styrene        heater           Westwood

    Primary key: (Product name, Equipment name)

    Product name   Product manager
    Ethylene       Smith
    Styrene        Jones

    Primary key: (Product name)

    Equipment supplier   Supplier address
    Westwood             Derby
    Davison              Rugby

    Primary key: (Equipment supplier)

    Tables satisfy 3NF
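The decomposition can be checked with a short Python sketch, assuming the original table is a list of tuples in the column order (product, equipment, manager, supplier, supplier address). Each 3NF table is obtained by projection, and a natural join over the shared columns is used to confirm that no information was lost. The variable names are illustrative.

```python
table = [
    ("Ethylene", "cooler", "Smith", "Westwood", "Derby"),
    ("Ethylene", "heater", "Smith", "Westwood", "Derby"),
    ("Styrene", "pump", "Jones", "Davison", "Rugby"),
    ("Styrene", "heater", "Jones", "Westwood", "Derby"),
]

# Project the original table onto the three 3NF tables (sets drop duplicates).
equipment_supplier = {(p, e, s) for p, e, _, s, _ in table}
product_manager = {(p, m) for p, _, m, _, _ in table}
supplier_address = {(s, a) for _, _, _, s, a in table}

# Natural join of the three tables on product name and equipment supplier.
rejoined = {
    (p, e, m, s, a)
    for p, e, s in equipment_supplier
    for p2, m in product_manager
    if p2 == p
    for s2, a in supplier_address
    if s2 == s
}

lossless = rejoined == set(table)
print(len(product_manager), len(supplier_address), lossless)
```

Each manager and each supplier address is now stored once, yet the join still reproduces the four original rows.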

    Other Normal Forms

    The data in an RDB is free of redundancy when it is in the fifth normal form.

    In this state, an update on a column in any table should not lead to data inconsistencies.

    In practice it is adequate to normalise data into 3NF.