Normalization and Codd's Rule

Posted on 12-Jan-2015


Normalization and Codd’s Rules

- Normalization
- Normal Forms
- 1NF
- 2NF
- 3NF
- Codd's Rules

Data Normalization

- The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise:
  - Achieve a design that is highly flexible
  - Reduce redundancy
  - Ensure that the design is free of certain update, insertion and deletion anomalies

Normalization (lowest to highest normal form):

- 1NF: flat file
- 2NF: partial dependencies removed
- 3NF: transitive dependencies removed
- BCNF: every determinant is a candidate key
- 4NF: non-trivial multi-valued dependencies removed

Stereos To Go Invoice

Order No.: 10001    Date: 6/15/99    Date Shipped: 6/18/99
Account No.: 0000-000-0000-0
Customer: John Smith, 2036-26 Street, Sacramento, CA 95819

Item  Product Code  Description/Manufacturer     Qty  Price
1     SAGX730       Pioneer Remote A/V Receiver  1    569.95
2     AT10          Cerwin Vega Loudspeakers     1    359.95
3     CDPC725       Sony Disc-Jockey CD Changer  1    399.95

Subtotal: 1,329.85
Shipping & Handling: 100.00
Sales Tax: 103.06
Total: 1,532.91

Unnormalized Relation

How would a program process the data to recreate the invoice?

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item1, Item1_descrip, Item1_qty, Item1_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , Item7, Item7_descrip, Item7_qty, Item7_price)

Unnormalized to 1NF

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item1, Item1_descrip, Item1_qty, Item1_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , Item7, Item7_descrip, Item7_qty, Item7_price)

A flat file places all the data of a transaction into a single record.

This is reminiscent of a COBOL or BASIC program processing a single transaction with one read statement.

Repeating groups: Item1 to Item7, each with its own _descrip, _qty and _price attributes

Unnormalized to 1NF

Nominated group of attributes to serve as the key (forms a unique combination): Invoice_number + Item

- Eliminate the repeating groups.
- Each row retains data for one item.
- If a person bought 5 items, we would have five tuples.

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)
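Eliminating the repeating group can be sketched in a few lines of Python. This is a minimal illustration, not the author's method: it assumes the unnormalized record is held as a dict that uses the slide's numbered names (Item1, Item1_descrip, ...), it leaves out Invoice_date and the address attributes for brevity, and the function name `to_1nf` is hypothetical. The sample values come from the Stereos To Go invoice.

```python
from typing import Any

MAX_ITEMS = 7  # the unnormalized layout has fixed slots Item1 .. Item7

# One unnormalized record; unused item slots are simply absent.
record: dict[str, Any] = {
    "Invoice_number": 10001, "Cust_account": 123456, "Cust_name": "John Smith",
    "Item1": "SAGX730", "Item1_descrip": "Pioneer Remote A/V Receiver",
    "Item1_qty": 1, "Item1_price": 569.95,
    "Item2": "AT10", "Item2_descrip": "Cerwin Vega Loudspeakers",
    "Item2_qty": 1, "Item2_price": 359.95,
    "Item3": "CDPC725", "Item3_descrip": "Sony Disc-Jockey CD Changer",
    "Item3_qty": 1, "Item3_price": 399.95,
}


def to_1nf(rec: dict[str, Any]) -> list[dict[str, Any]]:
    """Eliminate the repeating group: one output row per purchased item,
    with the invoice/customer attributes copied into every row."""
    header = {k: v for k, v in rec.items() if not k.startswith("Item")}
    rows = []
    for n in range(1, MAX_ITEMS + 1):
        if f"Item{n}" not in rec:  # empty slot in the fixed-size group
            continue
        rows.append({**header,
                     "Item": rec[f"Item{n}"],
                     "Item_descrip": rec[f"Item{n}_descrip"],
                     "Item_qty": rec[f"Item{n}_qty"],
                     "Item_price": rec[f"Item{n}_price"]})
    return rows


rows = to_1nf(record)
for r in rows:
    print(r["Invoice_number"], r["Item"], r["Item_qty"], r["Item_price"])
```

Three purchased items become three tuples, each repeating the invoice and customer values; that repetition is what 2NF and 3NF go on to remove.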

1NF

Invoice number  Account number  Customer name  ...  Item     Item description          Item quantity  Item price
10001           123456          John Smith     ...  SAGX730  Pioneer Remote A/V Rec    1              569.95
10001           123456          John Smith     ...  AT10     Cerwin Vega Loudspeakers  1              359.95
10001           123456          John Smith     ...  CDPC725  Sony Disc Jockey CD       1              399.95
10001           123456          John Smith     ...  S/H      Shipping                  1              100.00
10001           123456          John Smith     ...  Tax      Sales Tax                 1              103.06

Flat file

From 1NF

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)

Functional dependencies and determinants

Example: Item_descrip is functionally dependent on Item (Item → Item_descrip), so Item is the determinant of Item_descrip.
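Whether a set of rows is consistent with a functional dependency can be checked mechanically: any two rows that agree on the determinant must agree on the dependent attributes. The sketch below assumes rows are Python dicts; `fd_holds` is a hypothetical helper name, and invoice 10002 is invented so that item AT10 appears twice. Sample data can only refute an FD; whether it really holds is a rule of the enterprise, not of one snapshot.

```python
from typing import Any, Iterable, Sequence


def fd_holds(rows: Iterable[dict[str, Any]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if lhs -> rhs is not violated by these rows: any two rows that
    agree on the determinant (lhs) must also agree on rhs."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        det = tuple(row[a] for a in lhs)
        dep = tuple(row[a] for a in rhs)
        if seen.setdefault(det, dep) != dep:
            return False
    return True


rows = [
    {"Invoice_number": 10001, "Item": "AT10",
     "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 1},
    {"Invoice_number": 10002, "Item": "AT10",          # hypothetical 2nd invoice
     "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 2},
    {"Invoice_number": 10002, "Item": "SAGX730",
     "Item_descrip": "Pioneer Remote A/V Receiver", "Item_qty": 1},
]

print(fd_holds(rows, ["Item"], ["Item_descrip"]))               # True
print(fd_holds(rows, ["Item"], ["Item_qty"]))                   # False
print(fd_holds(rows, ["Invoice_number", "Item"], ["Item_qty"]))  # True
```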

From 1NF to 2NF

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

(Item, Item_descrip, Item_qty, Item_price)

Is Item unique by itself? What happens if the item is purchased more than once?

From 1NF to 2NF

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

(Invoice_number, Item, Item_descrip, Item_qty, Item_price)

Composite key (forms a unique combination): Invoice_number + Item

Partial dependency: Item_descrip depends on Item alone, not on the whole composite key

From 1NF to 2NF

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

(Invoice_number, Item, Item_qty, Item_price)

(Item, Item_descrip)
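The 2NF split is a set of projections of the 1NF rows. A minimal sketch, assuming rows are dicts: `project` and the rejoin check are hypothetical helpers, the invoice relation is cut down to three attributes, and invoice 10002 / customer 999999 "Jane Doe" are invented so that AT10 appears on two invoices. The final comparison checks that joining the pieces back reproduces the original rows, i.e. that the decomposition loses nothing.

```python
from typing import Any

Row = dict[str, Any]

# 1NF rows; invoice 10002 / customer 999999 are hypothetical, added so that
# item AT10 appears on two invoices.
flat: list[Row] = [
    {"Invoice_number": 10001, "Cust_account": 123456, "Cust_name": "John Smith",
     "Item": "SAGX730", "Item_descrip": "Pioneer Remote A/V Receiver",
     "Item_qty": 1, "Item_price": 569.95},
    {"Invoice_number": 10001, "Cust_account": 123456, "Cust_name": "John Smith",
     "Item": "AT10", "Item_descrip": "Cerwin Vega Loudspeakers",
     "Item_qty": 1, "Item_price": 359.95},
    {"Invoice_number": 10002, "Cust_account": 999999, "Cust_name": "Jane Doe",
     "Item": "AT10", "Item_descrip": "Cerwin Vega Loudspeakers",
     "Item_qty": 2, "Item_price": 359.95},
]


def project(rows: list[Row], attrs: list[str]) -> list[Row]:
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out, seen = [], set()
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


invoice = project(flat, ["Invoice_number", "Cust_account", "Cust_name"])
invoice_items = project(flat, ["Invoice_number", "Item", "Item_qty", "Item_price"])
items = project(flat, ["Item", "Item_descrip"])  # partial dependency moved out

# Lossless check: joining the three relations back reproduces the 1NF rows.
descrip = {r["Item"]: r["Item_descrip"] for r in items}
header = {r["Invoice_number"]: r for r in invoice}
rejoined = [{**header[li["Invoice_number"]], **li, "Item_descrip": descrip[li["Item"]]}
            for li in invoice_items]
print(len(items), len(invoice), len(invoice_items))
```

The AT10 description is now stored once in `items` instead of once per invoice line.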

From 2NF to 3NF

Which attributes are dependent on others? Is there a problem?

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

(Invoice_number, Item, Item_qty, Item_price)

(Item, Item_descrip)

Transitive Dependencies and Anomalies

- Insertion anomalies
  - To add a new row, all customer data (name, address, city, state, zip code, phone) and product data (description) must be consistent with previous entries.
- Deletion anomalies
  - By deleting a row, a customer or product may cease to exist.
- Modification anomalies
  - To modify a customer's or product's data in one row, the same modification must be carried out in all other rows that repeat that data.

Insertion and Modification Anomalies: For example…

Insert a new Panasonic product:

Product_code  Manufacturer_name
DVD-A110      Panasonic
PV-4210       Panasonic
PV-4250       Panasonic
CT-32S35      PAN          <- inconsistency

Change all Panasonic products' manufacturer name to "Panasonic USA":

Product_code  Manufacturer_name
DVD-A110      Panasonic
PV-4210       PanaSonic
PV-4250       Pana Sonic
CT-32S35      PAN
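The modification anomaly can be reproduced with Python's built-in `sqlite3` module. In this sketch the table name `Products` is hypothetical and the rows are the inconsistent spellings from the example. An UPDATE that looks for the "correct" spelling changes only one of the four Panasonic products, because SQLite's default text comparison is exact and case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Product_code TEXT PRIMARY KEY, "
             "Manufacturer_name TEXT)")          # hypothetical table name
# The manufacturer name is typed in again for every product, so spellings drift.
conn.executemany("INSERT INTO Products VALUES (?, ?)", [
    ("DVD-A110", "Panasonic"), ("PV-4210", "PanaSonic"),
    ("PV-4250", "Pana Sonic"), ("CT-32S35", "PAN"),
])

# The intended change: rename every Panasonic product's manufacturer.
cur = conn.execute("UPDATE Products SET Manufacturer_name = 'Panasonic USA' "
                   "WHERE Manufacturer_name = 'Panasonic'")
changed = cur.rowcount  # only the row spelled exactly 'Panasonic' matches
names = sorted(n for (n,) in conn.execute(
    "SELECT DISTINCT Manufacturer_name FROM Products"))
print(changed, names)
```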

Deletion Anomaly: For example…

Cust_account  Cust_name   ...  City        State  Zip_code
4377182       John Smith  ...  Sacramento  CA     95831
4398711       Arnold S    ...  Davis       CA     95691
4578461       Gray Davis  ...  Sacramento  CA     95831
4873179       Lisa Carr   ...  Reno        NV     89557

By deleting customer Arnold S, we would also be deleting Davis, California.
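The same deletion can be run against two designs in SQLite. This is a sketch: Cust_addr is left out, and `Customer_3nf` is a hypothetical name for the decomposed customer table. In the single-table design the fact "95691 is Davis, CA" disappears with Arnold S; once zip-code facts have their own relation, deleting the customer leaves them intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Un-decomposed design: City/State are stored only inside customer rows.
conn.execute("CREATE TABLE Customer (Cust_account INTEGER PRIMARY KEY, "
             "Cust_name TEXT, City TEXT, State TEXT, Zip_code TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?, ?)", [
    (4377182, "John Smith", "Sacramento", "CA", "95831"),
    (4398711, "Arnold S", "Davis", "CA", "95691"),
    (4578461, "Gray Davis", "Sacramento", "CA", "95831"),
    (4873179, "Lisa Carr", "Reno", "NV", "89557"),
])
conn.execute("DELETE FROM Customer WHERE Cust_name = 'Arnold S'")
davis_rows = conn.execute(
    "SELECT COUNT(*) FROM Customer WHERE City = 'Davis'").fetchone()[0]
print(davis_rows)  # 0: the fact "95691 is Davis, CA" went with the customer

# 3NF-style alternative (Customer_3nf is a hypothetical name): zip-code facts
# live in their own relation.
conn.executescript("""
    CREATE TABLE Zip_code (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT);
    CREATE TABLE Customer_3nf (Cust_account INTEGER PRIMARY KEY, Cust_name TEXT,
                               Zip_code TEXT REFERENCES Zip_code(Zip_code));
    INSERT INTO Zip_code VALUES ('95691', 'Davis', 'CA');
    INSERT INTO Customer_3nf VALUES (4398711, 'Arnold S', '95691');
    DELETE FROM Customer_3nf WHERE Cust_name = 'Arnold S';
""")
kept = conn.execute(
    "SELECT City, State FROM Zip_code WHERE Zip_code = '95691'").fetchone()
print(kept)  # ('Davis', 'CA')
```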

Transitive Dependencies

Dependencies among the attributes:

- Invoice_number → Invoice_date, Date_delivered, Cust_account
- Cust_account → Cust_name, Cust_addr, Zip_code
- Zip_code → Cust_city, Cust_state
- Item → Item_descrip
- Invoice_number + Item → Item_qty, Item_price

- A condition where A, B, C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
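The definition can be tested with the standard attribute-closure algorithm: X+ is every attribute reachable from X by repeatedly applying FDs whose left side is already covered. The FD list below is an assumption, read off the invoice example's final decomposition; `fd`, `closure` and `transitively_dependent` are hypothetical helper names.

```python
FD = tuple[frozenset[str], frozenset[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


# Assumed dependencies for the invoice example.
FDS = [
    fd("Invoice_number", "Invoice_date Date_delivered Cust_account"),
    fd("Cust_account", "Cust_name Cust_addr Zip_code"),
    fd("Zip_code", "Cust_city Cust_state"),
    fd("Item", "Item_descrip"),
    fd("Invoice_number Item", "Item_qty Item_price"),
]


def closure(attrs: set[str], fds: list[FD]) -> set[str]:
    """Attribute closure X+: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitively_dependent(a: str, b: str, c: str, fds: list[FD]) -> bool:
    """C depends on A via B: A -> B, B -> C, and A depends on neither B nor C."""
    return (b in closure({a}, fds) and c in closure({b}, fds)
            and a not in closure({b}, fds) and a not in closure({c}, fds))


print(transitively_dependent("Cust_account", "Zip_code", "Cust_city", FDS))  # True
print(transitively_dependent("Zip_code", "Cust_account", "Cust_city", FDS))  # False
```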

Why Should City and State Be Separated from Customer Relation?

- City and state are dependent on the zip code for their values, not on the customer's identifier (i.e., key).

  Zip_code → City, State

- Otherwise,

  Cust_account → Cust_addr, Zip_code → City, State

  in which case City and State are transitively dependent on Cust_account via Zip_code.

3NF

Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account)

Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code)

Zip_code Relation (Zip_code, City, State)

Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price)

Items Relation (Item, Item_descrip)
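The five 3NF relations translate directly into SQL tables with primary and foreign keys, and a join answers the earlier question of how a program recreates the invoice. This sketch uses SQLite; the column types, the ISO date strings, and the use of the invoice's shipped date as Date_delivered are assumptions. Shipping and tax lines are left out, so only the subtotal is recomputed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
CREATE TABLE Zip_code (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT);
CREATE TABLE Customer (Cust_account INTEGER PRIMARY KEY, Cust_name TEXT,
    Cust_addr TEXT, Zip_code TEXT REFERENCES Zip_code(Zip_code));
CREATE TABLE Invoice (Invoice_number INTEGER PRIMARY KEY, Invoice_date TEXT,
    Date_delivered TEXT, Cust_account INTEGER REFERENCES Customer(Cust_account));
CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT);
CREATE TABLE Invoice_items (
    Invoice_number INTEGER REFERENCES Invoice(Invoice_number),
    Item TEXT REFERENCES Items(Item),
    Item_qty INTEGER, Item_price REAL,
    PRIMARY KEY (Invoice_number, Item));           -- composite key
INSERT INTO Zip_code VALUES ('95819', 'Sacramento', 'CA');
INSERT INTO Customer VALUES (123456, 'John Smith', '2036-26 Street', '95819');
-- dates as ISO text (assumption); shipped date used as Date_delivered
INSERT INTO Invoice VALUES (10001, '1999-06-15', '1999-06-18', 123456);
INSERT INTO Items VALUES ('SAGX730', 'Pioneer Remote A/V Receiver'),
    ('AT10', 'Cerwin Vega Loudspeakers'), ('CDPC725', 'Sony Disc-Jockey CD Changer');
INSERT INTO Invoice_items VALUES (10001, 'SAGX730', 1, 569.95),
    (10001, 'AT10', 1, 359.95), (10001, 'CDPC725', 1, 399.95);
""")

# Recreate the invoice lines by joining the relations back together.
lines = conn.execute("""
    SELECT c.Cust_name, z.City, z.State, ii.Item, it.Item_descrip,
           ii.Item_qty, ii.Item_price
    FROM Invoice i
    JOIN Customer c       ON c.Cust_account = i.Cust_account
    JOIN Zip_code z       ON z.Zip_code = c.Zip_code
    JOIN Invoice_items ii ON ii.Invoice_number = i.Invoice_number
    JOIN Items it         ON it.Item = ii.Item
    WHERE i.Invoice_number = ?
    ORDER BY ii.Item""", (10001,)).fetchall()
subtotal = round(sum(qty * price for *_, qty, price in lines), 2)
print(subtotal)
```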


Since the Items relation contains the manufacturer's name in the description, a separate Manufacturers relation can be created:

Manufacturers Relation (Manuf_code, Manuf_name)
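With a Manufacturers relation, the earlier "rename every Panasonic product" request becomes a one-row update. The slides do not show how Items refers to a manufacturer, so this sketch assumes a hypothetical Manuf_code foreign-key column on Items (Item_descrip omitted), a hypothetical code 'PAN', and an invented product 'PV-9999' to show that a misspelled code is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Manufacturers (Manuf_code TEXT PRIMARY KEY, Manuf_name TEXT NOT NULL);
-- Assumption: Items refers to its manufacturer by code (Item_descrip omitted).
CREATE TABLE Items (Item TEXT PRIMARY KEY,
    Manuf_code TEXT NOT NULL REFERENCES Manufacturers(Manuf_code));
INSERT INTO Manufacturers VALUES ('PAN', 'Panasonic');
INSERT INTO Items VALUES ('DVD-A110', 'PAN'), ('PV-4210', 'PAN'),
    ('PV-4250', 'PAN'), ('CT-32S35', 'PAN');
""")

# The name is stored once, so renaming it touches exactly one row.
changed = conn.execute("UPDATE Manufacturers SET Manuf_name = 'Panasonic USA' "
                       "WHERE Manuf_code = 'PAN'").rowcount
names = conn.execute("""SELECT DISTINCT m.Manuf_name
                        FROM Items i JOIN Manufacturers m
                          ON m.Manuf_code = i.Manuf_code""").fetchall()
print(changed, names)

# A misspelled manufacturer can no longer slip in: the FK rejects it.
try:
    conn.execute("INSERT INTO Items VALUES ('PV-9999', 'PANA')")  # hypothetical
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```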

First to Third Normal Form (1NF to 3NF)

- 1NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multi-valued attributes and create a flat file).
- 2NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies).
- 3NF: A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies).
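The 2NF and 3NF tests above can be sketched as checks over a list of FDs. This is a simplification, not a full normal-form algorithm: it only examines the FDs it is given, assumes the key passed in is the relation's only candidate key, and uses an FD list assumed for the invoice example. A non-key attribute determined by part of the key breaks 2NF; one determined by a non-superkey breaks 3NF.

```python
FD = tuple[frozenset[str], frozenset[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: set[str], fds: list[FD]) -> set[str]:
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def nf_violations(key: set[str], fds: list[FD]) -> tuple[set[str], set[str]]:
    """Return (attributes breaking 2NF, attributes breaking 3NF).
    Simplified: checks only the listed FDs and assumes `key` is the
    relation's only candidate key."""
    partial, transitive = set(), set()
    for lhs, rhs in fds:
        for attr in rhs - key - lhs:             # non-key attributes only
            if lhs < key:                        # determined by part of the key
                partial.add(attr)
            elif not key <= closure(lhs, fds):   # determinant is not a superkey
                transitive.add(attr)
    return partial, transitive


# The 1NF invoice relation, key (Invoice_number, Item); FDs assumed.
FDS = [
    fd("Invoice_number", "Invoice_date Date_delivered Cust_account"),
    fd("Cust_account", "Cust_name Cust_addr Zip_code"),
    fd("Zip_code", "Cust_city Cust_state"),
    fd("Item", "Item_descrip"),
    fd("Invoice_number Item", "Item_qty Item_price"),
]
partial, transitive = nf_violations({"Invoice_number", "Item"}, FDS)
print(sorted(partial))
print(sorted(transitive))

# The Invoice_items relation of the 3NF design has neither problem.
print(nf_violations({"Invoice_number", "Item"},
                    [fd("Invoice_number Item", "Item_qty Item_price")]))
```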

Codd's Rules

E. F. Codd presented these rules as a basis for determining whether a DBMS could be classified as relational.

Codd's Rules

- Codd's Rules can be divided into 5 functional areas:
  - Foundation Rules
  - Structural Rules
  - Integrity Rules
  - Data Manipulation Rules
  - Data Independence Rules

Foundation Rules

- Rule 0:
  - Any system claimed to be an RDBMS must be able to manage databases entirely through its relational capabilities.
  - All data definition & manipulation must be able to be done through relational operations.
- Rule 12 - Nonsubversion Rule:
  - If an RDBMS has a low-level (record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules & constraints expressed in the higher-level relational language.
  - All database access must be controlled through the DBMS so that the integrity of the database cannot be compromised without the knowledge of the user or the DBA.
  - This does not prohibit the use of record-at-a-time languages, e.g. PL/SQL.


Codd's Rules

- Structural Rules (Rules 1 & 6)
  - The fundamental structural construct is the table.
  - Codd states that an RDBMS must support tables, domains, primary & foreign keys.
  - Each table should have a primary key.

Structural Rules

- Rule 1:
  - All information in an RDB is represented explicitly at the logical level in exactly one way: by values in tables.
  - ALL information, even the metadata held in the system catalogue, MUST be stored as relations (tables) & manipulated in the same way as data.
- Rule 6 - View Updating:
  - All views that are theoretically updatable are also updatable by the system.
  - Not fully implemented yet by any available system.


Codd's Rules

- Integrity Rules (Rules 3 & 10)
  - Integrity should be maintained by the DBMS, not the application.
- Rule 3 - Systematic treatment of null values:
  - Null values are supported for representing 'missing' & inapplicable information in a systematic way, independent of data type.
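SQL's systematic treatment of NULL can be seen in SQLite: comparing anything with NULL yields "unknown" (returned to Python as `None`), so `= NULL` matches nothing and `IS NULL` is required. The Phone column is a hypothetical attribute used only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Cust_account INTEGER PRIMARY KEY, "
             "Cust_name TEXT, Phone TEXT)")       # Phone is hypothetical
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)",
                 [(1, "John Smith", "555-0100"),
                  (2, "Lisa Carr", None)])        # None is stored as NULL

eq = conn.execute("SELECT NULL = NULL").fetchone()[0]  # unknown, comes back as None
missing = conn.execute(
    "SELECT COUNT(*) FROM Customer WHERE Phone = NULL").fetchone()[0]
found = conn.execute(
    "SELECT COUNT(*) FROM Customer WHERE Phone IS NULL").fetchone()[0]
print(eq, missing, found)
```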

Integrity Rules

- Rule 10 - Integrity independence:
  - Integrity constraints specific to a particular RDB MUST be definable in the relational data sublanguage & storable in the DB, NOT in the application program.
  - This gives the advantage of centralised control & enforcement.
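As an illustration of integrity independence, the constraints below are declared once in the schema and enforced by SQLite itself for every program that inserts rows. The CHECK rules (positive quantity, non-negative price) are assumptions for the sketch, and `try_insert` is a hypothetical helper. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT NOT NULL);
CREATE TABLE Invoice_items (
    Invoice_number INTEGER NOT NULL,
    Item TEXT NOT NULL REFERENCES Items(Item),
    Item_qty INTEGER NOT NULL CHECK (Item_qty > 0),      -- assumed rule
    Item_price REAL NOT NULL CHECK (Item_price >= 0),    -- assumed rule
    PRIMARY KEY (Invoice_number, Item));
INSERT INTO Items VALUES ('AT10', 'Cerwin Vega Loudspeakers');
""")


def try_insert(row):
    """Return None if the DBMS accepts the row, else its error message."""
    try:
        conn.execute("INSERT INTO Invoice_items VALUES (?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = [
    try_insert((10001, "AT10", 1, 359.95)),  # accepted
    try_insert((10001, "AT10", 2, 359.95)),  # duplicate composite key
    try_insert((10002, "AT10", 0, 359.95)),  # CHECK on Item_qty fails
    try_insert((10003, "NOPE", 1, 9.99)),    # no such Item (foreign key)
]
for r in results:
    print(r)
```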

Codd's Rules

- Data Manipulation Rules (Rules 2, 4, 5 & 7)
  - Users should be able to manipulate the logical view of the data without needing to know how it is physically stored or accessed.
- Rule 2 - Guaranteed Access:
  - Each & every datum in an RDB is guaranteed to be logically accessible by a combination of table name, primary key value & column name.
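Guaranteed access means one generic routine can reach any value given only a table name, a primary-key value and a column name. In this SQLite sketch, `get_datum` and `quote_ident` are hypothetical helpers; the key column is looked up in the catalog with `PRAGMA table_info`, and a single-column primary key is assumed. Identifiers cannot be bound as query parameters, so they are quoted instead.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier (identifiers cannot be bound as parameters)."""
    return '"' + name.replace('"', '""') + '"'


def get_datum(conn: sqlite3.Connection, table: str, pk_value, column: str):
    """Fetch one datum from table name + primary-key value + column name.
    The key column is looked up in SQLite's catalog; this sketch assumes a
    single-column primary key. Returns None for a missing row or a NULL."""
    info = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
    pk_cols = [row[1] for row in info if row[5] > 0]  # row[5]: position in PK
    if len(pk_cols) != 1:
        raise ValueError(f"{table}: expected a single-column primary key")
    sql = (f"SELECT {quote_ident(column)} FROM {quote_ident(table)} "
           f"WHERE {quote_ident(pk_cols[0])} = ?")
    row = conn.execute(sql, (pk_value,)).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT)")
conn.execute("INSERT INTO Items VALUES ('CDPC725', 'Sony Disc-Jockey CD Changer')")
print(get_datum(conn, "Items", "CDPC725", "Item_descrip"))
```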

Data Manipulation Rules

- Rule 4 - Dynamic on-line catalog based on the relational model:
  - The DB description (metadata) is represented at the logical level in the same way as ordinary data, so that the same relational language can be used to interrogate the metadata as regular data.
  - System data & other data are stored & manipulated in the same way.
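SQLite illustrates the idea: its catalog is a table, `sqlite_master`, that is read with an ordinary SELECT. Other products expose the same kind of information through different catalog tables (the SQL standard defines INFORMATION_SCHEMA views); the two tables created here are only sample content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Zip_code (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT);
CREATE TABLE Customer (Cust_account INTEGER PRIMARY KEY, Cust_name TEXT,
    Zip_code TEXT REFERENCES Zip_code(Zip_code));
""")

# The catalog is queried with the same language as ordinary data.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'Customer'").fetchone()[0]
print(tables)
print(ddl)
```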


- Rule 5 - Comprehensive Data Sublanguage:
  - An RDBMS may support many languages & modes of use, but there must be at least ONE language whose statements can express ALL of the following:
    - Data definition
    - View definition
    - Data manipulation (interactive & via program)
    - Integrity constraints
    - Authorization
    - Transaction boundaries (begin, commit & rollback)
  - 1992: the ISO standard for SQL provides all these functions.
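Several items on that list can be expressed in one language, SQL, sent through Python's `sqlite3` module. This sketch covers data definition, a view, a constraint, manipulation and explicit transaction boundaries; SQLite has no GRANT/REVOKE, so authorization is not covered. The view name `Multi_qty` is hypothetical.

```python
import sqlite3

# isolation_level=None: the driver opens no implicit transactions, so the
# BEGIN / COMMIT / ROLLBACK statements below are sent exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)

conn.execute("""CREATE TABLE Invoice_items (            -- data definition
    Invoice_number INTEGER, Item TEXT,
    Item_qty INTEGER CHECK (Item_qty > 0),             -- integrity constraint
    PRIMARY KEY (Invoice_number, Item))""")
conn.execute("CREATE VIEW Multi_qty AS "                 # view definition
             "SELECT * FROM Invoice_items WHERE Item_qty > 1")  # (hypothetical name)

conn.execute("BEGIN")                                    # transaction boundaries
conn.execute("INSERT INTO Invoice_items VALUES (10001, 'AT10', 2)")
conn.execute("COMMIT")

conn.execute("BEGIN")
conn.execute("INSERT INTO Invoice_items VALUES (10001, 'SAGX730', 5)")
conn.execute("ROLLBACK")                                 # discards the second insert

rows = conn.execute("SELECT Invoice_number, Item, Item_qty FROM Multi_qty").fetchall()
print(rows)
```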


- Rule 7 - High-level insert, update & delete:
  - The capability of handling a base table or view as a single operand applies not only to data retrieval but also to insert, update & delete operations.
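Set-at-a-time modification looks like this in SQL: one UPDATE or DELETE names a whole set of rows by a condition, with no record-at-a-time loop in the program. The second invoice and the new AT10 price are invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice_items (Invoice_number INTEGER, Item TEXT, "
             "Item_price REAL, PRIMARY KEY (Invoice_number, Item))")
conn.executemany("INSERT INTO Invoice_items VALUES (?, ?, ?)", [
    (10001, "SAGX730", 569.95), (10001, "AT10", 359.95),
    (10002, "AT10", 359.95),            # hypothetical second invoice
])

# Each statement operates on the whole set of qualifying rows at once.
repriced = conn.execute("UPDATE Invoice_items SET Item_price = 349.95 "
                        "WHERE Item = 'AT10'").rowcount   # hypothetical price
removed = conn.execute("DELETE FROM Invoice_items "
                       "WHERE Invoice_number = 10002").rowcount
remaining = conn.execute("SELECT COUNT(*) FROM Invoice_items").fetchone()[0]
print(repriced, removed, remaining)
```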

Codd's Rules

- Data Independence Rules (Rules 8, 9 & 11)
  - These rules protect users & application developers from having to change their applications following any low-level reorganisation of the DB.

Data Independence Rules

- Rule 8 - Physical Data Independence:
  - Application programs & terminal activities remain logically unimpaired whenever any changes are made to either the storage organisation or the access methods.
- Rule 9 - Logical Data Independence:
  - Application programs & terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
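One common way to approach logical data independence is to split a base table and then recreate its old shape as a view under the old name, so existing read queries keep working. This SQLite sketch assumes a customer table that still carries City and State; `Customer_base` is a hypothetical name. It only covers reads: writing through such a view runs into the view-updating limits of Rule 6, and `CREATE TABLE ... AS SELECT` does not copy key constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Cust_account INTEGER PRIMARY KEY, "
             "Cust_name TEXT, Zip_code TEXT, City TEXT, State TEXT)")
conn.execute("INSERT INTO Customer VALUES (4398711, 'Arnold S', '95691', 'Davis', 'CA')")

# An existing application query that must keep working unchanged.
APP_QUERY = "SELECT Cust_name, City FROM Customer WHERE Cust_account = ?"
before = conn.execute(APP_QUERY, (4398711,)).fetchall()

# Information-preserving change to the base tables: split out Zip_code,
# then re-create the old shape as a view under the old name.
conn.executescript("""
CREATE TABLE Zip_code AS SELECT DISTINCT Zip_code, City, State FROM Customer;
CREATE TABLE Customer_base AS SELECT Cust_account, Cust_name, Zip_code FROM Customer;
DROP TABLE Customer;
CREATE VIEW Customer AS
    SELECT c.Cust_account, c.Cust_name, c.Zip_code, z.City, z.State
    FROM Customer_base c JOIN Zip_code z ON z.Zip_code = c.Zip_code;
""")
after = conn.execute(APP_QUERY, (4398711,)).fetchall()
print(before, after, before == after)
```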

- Rule 11 - Distribution Independence:
  - The data manipulation sublanguage of an RDBMS must enable application programs & queries to remain logically unchanged whether & whenever data is physically centralised or distributed.
  - This means that an application program that accesses the DBMS on a single computer should also work, without modification, even if the data is moved from one computer to another in a network environment.
  - The user should 'see' one centralised DB whether the data is located on one or more computers.
  - This rule does not say that to be fully relational the DBMS must support distributed DBs, but that if it does, the query must remain the same.

Summary

- Codd's Rules can be divided into 5 functional areas:
  - Foundation Rules
  - Structural Rules
  - Integrity Rules
  - Data Manipulation Rules
  - Data Independence Rules