Normalization and Codd's Rule

Transcript of Normalization and Codd's Rule

Page 1: Normalization and Codd's Rule

Normalization and Codd’s Rules

Page 2: Normalization and Codd's Rule

• Normalization
• Normal Forms
  • 1NF
  • 2NF
  • 3NF
• Codd's Rules

Page 3: Normalization and Codd's Rule

Data Normalization

• The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise.
• Achieve a design that is highly flexible.
• Reduce redundancy.
• Ensure that the design is free of certain update, insertion and deletion anomalies.

Page 4: Normalization and Codd's Rule

Normalization

Flat file
  → 1NF
  → 2NF: partial dependencies removed
  → 3NF: transitive dependencies removed
  → BCNF: every determinant is a candidate key
  → 4NF: non-trivial multi-valued dependencies removed

Page 5: Normalization and Codd's Rule

[Figure: sample invoice from Stereos To Go. Order No. 10001; Date 6/15/99; Date Shipped 6/18/99; Account No. 0000-000-0000-0; Customer: John Smith, 2036-26 Street, Sacramento, CA 95819.]

Item Number  Product Code  Product Description/Manufacturer  Qty  Price
1            SAGX730       Pioneer Remote A/V Receiver       1    569.95
2            AT10          Cerwin Vega Loudspeakers          1    359.95
3            CDPC725       Sony Disc-Jockey CD Changer       1    399.95

Subtotal             1,329.85
Shipping & Handling    100.00
Sales Tax              103.06
Total                1,532.91

Page 6: Normalization and Codd's Rule

Unnormalized Relation

How would a program process the data to recreate the invoice?

(Invoice_number, Invoice_date, Date_delivered,
 Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,
 Item1, Item1_descrip, Item1_qty, Item1_price,
 Item2, Item2_descrip, Item2_qty, Item2_price,
 . . . ,
 Item7, Item7_descrip, Item7_qty, Item7_price)

Page 7: Normalization and Codd's Rule

Unnormalized to 1NF

(Invoice_number, Invoice_date, Date_delivered,
 Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,
 Item1, Item1_descrip, Item1_qty, Item1_price,
 Item2, Item2_descrip, Item2_qty, Item2_price,
 . . . ,
 Item7, Item7_descrip, Item7_qty, Item7_price)

Repeating groups: Item1 through Item7, each with its description, quantity and price.

A flat file places all the data of a transaction into a single record.

This is reminiscent of a COBOL or BASIC program processing a single transaction with one read statement.

Page 8: Normalization and Codd's Rule

Unnormalized to 1NF

• Eliminate the repeating groups.
• Each row retains data for one item.
• If a person bought 5 items, we would have five tuples.

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)

Nominated group of attributes to serve as the key (form a unique combination).
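The step from the flat record to 1NF can be sketched roughly as follows. This is only an illustration: it assumes the unnormalized record is held as a Python dict keyed by the Item1 ... Item7 names from the slide, and the helper name `to_1nf`, the date format and the sample values are assumptions made for the example.

```python
# Sketch: turn one unnormalized invoice record (repeating groups Item1..Item7)
# into 1NF rows, one row per purchased item. All names are illustrative.

HEADER_FIELDS = ["Invoice_number", "Invoice_date", "Date_delivered",
                 "Cust_account", "Cust_name", "Cust_addr",
                 "Cust_city", "Cust_state", "Zip_code"]
MAX_ITEMS = 7  # the slide's record has slots Item1 .. Item7


def to_1nf(record):
    """Return a list of 1NF rows (dicts) for one flat invoice record."""
    header = {name: record[name] for name in HEADER_FIELDS}
    rows = []
    for n in range(1, MAX_ITEMS + 1):
        item = record.get(f"Item{n}")
        if item is None:  # unused slot in the repeating group
            continue
        rows.append({**header,
                     "Item": item,
                     "Item_descrip": record.get(f"Item{n}_descrip"),
                     "Item_qty": record.get(f"Item{n}_qty"),
                     "Item_price": record.get(f"Item{n}_price")})
    return rows


flat = {
    "Invoice_number": 10001, "Invoice_date": "1999-06-15",
    "Date_delivered": "1999-06-18", "Cust_account": 123456,
    "Cust_name": "John Smith", "Cust_addr": "2036-26 Street",
    "Cust_city": "Sacramento", "Cust_state": "CA", "Zip_code": "95819",
    "Item1": "SAGX730", "Item1_descrip": "Pioneer Remote A/V Receiver",
    "Item1_qty": 1, "Item1_price": 569.95,
    "Item2": "AT10", "Item2_descrip": "Cerwin Vega Loudspeakers",
    "Item2_qty": 1, "Item2_price": 359.95,
}

rows_1nf = to_1nf(flat)  # two rows, each repeating the invoice header
```

Each resulting row repeats the invoice and customer attributes, which is exactly the redundancy the later normal forms remove.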

Page 9: Normalization and Codd's Rule

1NF

Flat File

Invoice number  Account number  Customer name  •••  Item     Item Description          Item Quantity  Item Price
10001           123456          John Smith     •••  SAGX730  Pioneer Remote A/V Rec    1              569.95
10001           123456          John Smith     •••  AT10     Cerwin Vega Loudspeakers  1              359.95
10001           123456          John Smith     •••  CDPC725  Sony Disc Jockey CD       1              399.95
10001           123456          John Smith     •••  S/H      Shipping                  1              100.00
10001           123456          John Smith     •••  Tax      Sales Tax                 1              103.06

Page 10: Normalization and Codd's Rule

From 1NF

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)

Functional dependencies and determinants

Example: Item_descrip is functionally dependent on Item, such that Item is the determinant of Item_descrip.
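A functional dependency can be tested against sample rows with a small check like the one below. It is a sketch only: the function name `holds` and the second invoice (10002) are hypothetical. A sample can refute a dependency, but it cannot prove that the dependency holds for the enterprise in general.

```python
# Sketch: check whether a functional dependency holds in a set of rows.

def holds(rows, determinant, dependent):
    """True if every determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"Invoice_number": 10001, "Item": "SAGX730",
     "Item_descrip": "Pioneer Remote A/V Receiver", "Item_qty": 1},
    {"Invoice_number": 10001, "Item": "AT10",
     "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 1},
    # Hypothetical second invoice buying the same item in another quantity.
    {"Invoice_number": 10002, "Item": "AT10",
     "Item_descrip": "Cerwin Vega Loudspeakers", "Item_qty": 2},
]

item_determines_descrip = holds(rows, ["Item"], ["Item_descrip"])
item_determines_qty = holds(rows, ["Item"], ["Item_qty"])
key_determines_qty = holds(rows, ["Invoice_number", "Item"], ["Item_qty"])
```

Here Item determines Item_descrip, while Item_qty needs the combination of Invoice_number and Item.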

Page 11: Normalization and Codd's Rule

From 1NF to 2NF

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

(Item, Item_descrip, Item_qty, Item_price)

Is this unique by itself? What happens if the item is purchased more than once?

Page 12: Normalization and Codd's Rule

From 1NF to 2NF

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

(Invoice_number, Item, Item_descrip, Item_qty, Item_price)

Composite key (forms a unique combination): Invoice_number, Item

Partial dependency: Item_descrip depends on Item alone, which is only part of the composite key.

Page 13: Normalization and Codd's Rule

From 1NF to 2NF

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

(Invoice_number, Item, Item_qty, Item_price)

(Item, Item_descrip)
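The decomposition into these three relations amounts to projecting the 1NF rows onto each attribute list and dropping duplicate tuples. The sketch below uses a shortened attribute list and a hypothetical second invoice (10002); the helper name `project` is also an assumption made for the example.

```python
# Sketch: 1NF rows projected into the three 2NF relations.

def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})


COLUMNS = ["Invoice_number", "Cust_account", "Cust_name",
           "Item", "Item_descrip", "Item_qty", "Item_price"]
DATA = [
    (10001, 123456, "John Smith", "SAGX730", "Pioneer Remote A/V Receiver", 1, 569.95),
    (10001, 123456, "John Smith", "AT10", "Cerwin Vega Loudspeakers", 1, 359.95),
    (10001, 123456, "John Smith", "CDPC725", "Sony Disc Jockey CD Changer", 1, 399.95),
    # Hypothetical second invoice: the same item bought again.
    (10002, 123456, "John Smith", "AT10", "Cerwin Vega Loudspeakers", 2, 359.95),
]
rows_1nf = [dict(zip(COLUMNS, values)) for values in DATA]

invoice = project(rows_1nf, ["Invoice_number", "Cust_account", "Cust_name"])
invoice_items = project(rows_1nf, ["Invoice_number", "Item", "Item_qty", "Item_price"])
items = project(rows_1nf, ["Item", "Item_descrip"])
```

The description of AT10 is now stored once in `items`, even though the item appears on two invoices.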

Page 14: Normalization and Codd's Rule

From 2NF to 3NF

Which attributes are dependent on others? Is there a problem?

(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)

(Invoice_number, Item, Item_qty, Item_price)

(Item, Item_descrip)

Page 15: Normalization and Codd's Rule

Transitive Dependencies and Anomalies

• Insertion anomalies
  • To add a new row, all customer data (name, address, city, state, zip code, phone) and product data (description) must be consistent with previous entries.
• Deletion anomalies
  • By deleting a row, a customer or product may cease to exist.
• Modification anomalies
  • To modify a customer's or product's data in one row, the same modification must be carried out on every other row that holds that data.

Page 16: Normalization and Codd's Rule

Insertion and Modification Anomalies
For example…

Product_code  Manufacturer_name
DVD-A110      Panasonic
PV-4210       Panasonic
PV-4250       Panasonic

Insert a new Panasonic product:

CT-32S35      PAN

Inconsistency:

Product_code  Manufacturer_name
DVD-A110      Panasonic
PV-4210       PanaSonic
PV-4250       Pana Sonic
CT-32S35      PAN

Change all Panasonic products' manufacturer name to "Panasonic USA"
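The modification anomaly can be reproduced in a few lines. This is a sketch under stated assumptions: the function name `rename_in_place` and the manufacturer code `M01` are hypothetical, and the data is the inconsistent table from the slide.

```python
# Sketch: a rename against inconsistent, redundantly stored manufacturer names.

def rename_in_place(rows, old, new):
    """Naive rename: only rows whose stored name equals `old` are changed."""
    changed = 0
    for row in rows:
        if row["Manufacturer_name"] == old:
            row["Manufacturer_name"] = new
            changed += 1
    return changed


products = [
    {"Product_code": "DVD-A110", "Manufacturer_name": "Panasonic"},
    {"Product_code": "PV-4210", "Manufacturer_name": "PanaSonic"},
    {"Product_code": "PV-4250", "Manufacturer_name": "Pana Sonic"},
    {"Product_code": "CT-32S35", "Manufacturer_name": "PAN"},
]
changed = rename_in_place(products, "Panasonic", "Panasonic USA")  # misses 3 rows

# Normalized alternative: the name is stored once and products refer to it.
manufacturers = {"M01": "Panasonic"}  # "M01" is a made-up Manuf_code
product_manuf = {"DVD-A110": "M01", "PV-4210": "M01",
                 "PV-4250": "M01", "CT-32S35": "M01"}
manufacturers["M01"] = "Panasonic USA"  # one update covers every product
names = {code: manufacturers[m] for code, m in product_manuf.items()}
```

With the name held in one place, the rename cannot leave stale spellings behind.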

Page 17: Normalization and Codd's Rule

Deletion Anomaly
For example…

4377182  John Smith  •••  Sacramento  CA  95831
4398711  Arnold S    •••  Davis       CA  95691
4578461  Gray Davis  •••  Sacramento  CA  95831
4873179  Lisa Carr   •••  Reno        NV  89557

By deleting customer Arnold S, we would also be deleting Davis, California.
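The same effect can be seen on the four rows above. In this sketch the tuple layout (account, name, city, state, zip) and the helper name `known_cities` are assumptions made for the example.

```python
# Sketch: deleting a customer row also deletes the only record of a city.

customers = [
    (4377182, "John Smith", "Sacramento", "CA", "95831"),
    (4398711, "Arnold S", "Davis", "CA", "95691"),
    (4578461, "Gray Davis", "Sacramento", "CA", "95831"),
    (4873179, "Lisa Carr", "Reno", "NV", "89557"),
]


def known_cities(rows):
    """Set of (city, state) pairs that the table still knows about."""
    return {(city, state) for _, _, city, state, _ in rows}


before = known_cities(customers)
customers = [row for row in customers if row[1] != "Arnold S"]
after = known_cities(customers)
lost = before - after  # facts that disappeared together with the customer
```

Sacramento survives because two customers live there; Davis does not.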

Page 18: Normalization and Codd's Rule

Transitive Dependencies

[Diagram: dependency diagram over the attributes Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code; Item, Item_descrip; and Invoice_number+Item, Item_qty, Item_price.]

• A condition where A, B, C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

Page 19: Normalization and Codd's Rule

Why Should City and State Be Separated from Customer Relation?

• City and state are dependent on zip code for their values and not the customer's identifier (i.e., key).

    Zip_code → City, State

• Otherwise,

    Cust_account → Cust_addr, Zip_code → City, State

  In which case, you have transitive dependency.
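The chain above can be followed mechanically with the standard attribute-closure computation, which is not shown on the slides. In this sketch the function name `closure` and the exact dependency list are assumptions based on the relations in this deck.

```python
# Sketch: attribute closure, used to expose a transitive dependency.

def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # apply an FD when its left side is already covered
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    (("Cust_account",), ("Cust_name", "Cust_addr", "Zip_code")),
    (("Zip_code",), ("City", "State")),
]

from_account = closure({"Cust_account"}, fds)  # reaches City, State via Zip_code
from_zip = closure({"Zip_code"}, fds)          # does not reach Cust_account
```

City and State are reachable from Cust_account only through Zip_code, and Zip_code does not determine Cust_account, which matches the definition of a transitive dependency.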

Page 20: Normalization and Codd's Rule

3NF

Invoice Relation: (Invoice_number, Invoice_date, Date_delivered, Cust_account)

Customer Relation: (Cust_account, Cust_name, Cust_addr, Zip_code)

Zip_code Relation: (Zip_code, City, State)

Invoice_items Relation: (Invoice_number, Item, Item_qty, Item_price)

Items Relation: (Item, Item_descrip)

Page 21: Normalization and Codd's Rule

3NF

Invoice Relation: (Invoice_number, Invoice_date, Date_delivered, Cust_account)

Customer Relation: (Cust_account, Cust_name, Cust_addr, Zip_code)

Zip_code Relation: (Zip_code, City, State)

Invoice_items Relation: (Invoice_number, Item, Item_qty, Item_price)

Items Relation: (Item, Item_descrip)

Since the Items relation contains the manufacturer's name in the description, a separate Manufacturers relation can be created:

Manufacturers Relation: (Manuf_code, Manuf_name)
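One possible answer to the earlier question of how a program would recreate the invoice is a join over the 3NF relations. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The column types, the date format and the sample values are assumptions, and the Manufacturers relation is left out to keep the example short.

```python
# Sketch: the 3NF relations as SQLite tables, joined back into invoice lines.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Zip_code (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT);
CREATE TABLE Customer (Cust_account INTEGER PRIMARY KEY, Cust_name TEXT,
                       Cust_addr TEXT,
                       Zip_code TEXT REFERENCES Zip_code(Zip_code));
CREATE TABLE Invoice (Invoice_number INTEGER PRIMARY KEY, Invoice_date TEXT,
                      Date_delivered TEXT,
                      Cust_account INTEGER REFERENCES Customer(Cust_account));
CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT);
CREATE TABLE Invoice_items (
    Invoice_number INTEGER REFERENCES Invoice(Invoice_number),
    Item TEXT REFERENCES Items(Item),
    Item_qty INTEGER, Item_price REAL,
    PRIMARY KEY (Invoice_number, Item));
""")

conn.execute("INSERT INTO Zip_code VALUES ('95819', 'Sacramento', 'CA')")
conn.execute("INSERT INTO Customer VALUES "
             "(123456, 'John Smith', '2036-26 Street', '95819')")
conn.execute("INSERT INTO Invoice VALUES "
             "(10001, '1999-06-15', '1999-06-18', 123456)")
conn.executemany("INSERT INTO Items VALUES (?, ?)", [
    ("SAGX730", "Pioneer Remote A/V Receiver"),
    ("AT10", "Cerwin Vega Loudspeakers"),
    ("CDPC725", "Sony Disc-Jockey CD Changer"),
])
conn.executemany("INSERT INTO Invoice_items VALUES (?, ?, ?, ?)", [
    (10001, "SAGX730", 1, 569.95),
    (10001, "AT10", 1, 359.95),
    (10001, "CDPC725", 1, 399.95),
])

# Recreate the invoice by joining the relations on their keys.
invoice_lines = conn.execute("""
    SELECT i.Invoice_number, c.Cust_name, z.City, z.State,
           ii.Item, it.Item_descrip, ii.Item_qty, ii.Item_price
    FROM Invoice i
    JOIN Customer c ON c.Cust_account = i.Cust_account
    JOIN Zip_code z ON z.Zip_code = c.Zip_code
    JOIN Invoice_items ii ON ii.Invoice_number = i.Invoice_number
    JOIN Items it ON it.Item = ii.Item
    WHERE i.Invoice_number = 10001
    ORDER BY ii.Item
""").fetchall()
```

Every fact is stored once, and the join reassembles the flat view only when it is needed.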

Page 22: Normalization and Codd's Rule
Page 23: Normalization and Codd's Rule

First to Third Normal Form (1NF - 3NF)

• 1NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multi-valued attributes and create a flat file).
• 2NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies).
• 3NF: A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies).

Page 24: Normalization and Codd's Rule

Codd's Rules

E. F. Codd presented these rules as a basis for determining whether a DBMS could be classified as relational.

Page 25: Normalization and Codd's Rule

Codd's Rules

• Codd's Rules can be divided into 5 functional areas:
  • Foundation Rules
  • Structural Rules
  • Integrity Rules
  • Data Manipulation Rules
  • Data Independence Rules

Page 26: Normalization and Codd's Rule

Foundation Rules

• Rule 0 -
  • Any system claimed to be an RDBMS must be able to manage databases entirely through its relational capabilities.
  • All data definition & manipulation must be possible through relational operations.

Page 27: Normalization and Codd's Rule

Foundation Rules

• Rule 12 - Nonsubversion Rule -
  • If an RDBMS has a low-level (record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules & constraints expressed in the higher-level relational language.
  • All database access must be controlled through the DBMS so that the integrity of the database cannot be compromised without the knowledge of the user or the DBA.
  • This does not prohibit use of record-at-a-time languages, e.g. PL/SQL.

Page 28: Normalization and Codd's Rule

Codd's Rules

• Structural Rules (Rules 1 & 6)
  • The fundamental structural construct is the table.
  • Codd states that an RDBMS must support tables, domains, primary & foreign keys.
  • Each table should have a primary key.

Page 29: Normalization and Codd's Rule

Structural Rules

• Rule 1 -
  • All information in a relational database is represented explicitly at the logical level in exactly one way: by values in a table.
  • ALL information, even the metadata held in the system catalogue, MUST be stored as relations (tables) & manipulated in the same way as data.

Page 30: Normalization and Codd's Rule

Structural Rules

• Rule 6 - View Updating -
  • All views that are theoretically updatable are updatable by the system.
  • Not really implemented yet by any available system.
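As one concrete data point for Rule 6, SQLite (through Python's sqlite3 module) rejects a direct insert into a view, even a view that is a plain projection of one table, unless an INSTEAD OF trigger spells out how to apply it. The sketch below shows only this one system's behaviour; the view and trigger names are made up.

```python
# Sketch: a view in SQLite is not updatable until a trigger defines how.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT)")
conn.execute("CREATE VIEW Item_view AS SELECT Item, Item_descrip FROM Items")

INSERT_SQL = "INSERT INTO Item_view (Item, Item_descrip) VALUES (?, ?)"
NEW_ROW = ("AT10", "Cerwin Vega Loudspeakers")

try:
    conn.execute(INSERT_SQL, NEW_ROW)
    direct_insert = "accepted"
except sqlite3.Error:
    direct_insert = "rejected"  # SQLite treats the view as read-only

# An INSTEAD OF trigger tells SQLite how to translate the view insert.
conn.execute("""
    CREATE TRIGGER Item_view_insert INSTEAD OF INSERT ON Item_view
    BEGIN
        INSERT INTO Items (Item, Item_descrip)
        VALUES (NEW.Item, NEW.Item_descrip);
    END
""")
conn.execute(INSERT_SQL, NEW_ROW)
stored = conn.execute("SELECT Item, Item_descrip FROM Items").fetchall()
```

The trigger is written by the database designer, so the system itself is not deciding which views are theoretically updatable, which is what the rule asks for.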

Page 31: Normalization and Codd's Rule

Codd's Rules

• Integrity Rules (Rules 3 & 10)
  • Integrity should be maintained by the DBMS, not the application.
• Rule 3 - Systematic treatment of null values -
  • Null values are supported for representation of 'missing' & inapplicable information in a systematic way & independent of data type.
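A small illustration of null handling, using SQLite through Python's sqlite3 module as a stand-in for any SQL system: NULL is distinct from every ordinary value, comparisons with it are never true, and it behaves the same way whatever the column type. The invoice 10002 with no delivery date is hypothetical.

```python
# Sketch: NULL as a systematic marker for missing information.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice "
             "(Invoice_number INTEGER PRIMARY KEY, Date_delivered TEXT)")
conn.executemany("INSERT INTO Invoice VALUES (?, ?)", [
    (10001, "1999-06-18"),
    (10002, None),  # not delivered yet: Python None is stored as NULL
])

# "= NULL" is never true, so it matches no rows; IS NULL is the proper test.
eq_null = conn.execute(
    "SELECT COUNT(*) FROM Invoice WHERE Date_delivered = NULL").fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM Invoice WHERE Date_delivered IS NULL").fetchone()[0]

# NULL propagates through arithmetic regardless of data type.
null_plus_one = conn.execute("SELECT NULL + 1").fetchone()[0]
```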

Page 32: Normalization and Codd's Rule

Integrity Rules

• Rule 10 - Integrity independence -
  • Integrity constraints specific to a particular relational database MUST be definable in the relational data sublanguage & storable in the DB, NOT in the application program.
  • This gives the advantage of centralised control & enforcement.
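The sketch below shows constraints declared in the schema rather than in application code, again using SQLite through Python's sqlite3 module as an example system. The specific constraints and the helper name `try_insert` are assumptions. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
# Sketch: integrity constraints stored in the database, enforced by the DBMS.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default
conn.execute("CREATE TABLE Items "
             "(Item TEXT PRIMARY KEY, Item_descrip TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE Invoice_items (
        Invoice_number INTEGER NOT NULL,
        Item TEXT NOT NULL REFERENCES Items(Item),
        Item_qty INTEGER NOT NULL CHECK (Item_qty > 0),
        Item_price REAL NOT NULL CHECK (Item_price >= 0),
        PRIMARY KEY (Invoice_number, Item))
""")
conn.execute("INSERT INTO Items VALUES ('AT10', 'Cerwin Vega Loudspeakers')")


def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO Invoice_items VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert((10001, "AT10", 1, 359.95)),  # valid row
    try_insert((10001, "AT10", 1, 359.95)),  # duplicate primary key
    try_insert((10002, "AT10", 0, 359.95)),  # violates CHECK (Item_qty > 0)
    try_insert((10002, "NOPE", 1, 10.00)),   # no such Item: foreign key fails
]
```

The application contains no validation logic; every program that writes to these tables is held to the same rules.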

Page 33: Normalization and Codd's Rule

Codd's Rules

• Data Manipulation Rules (Rules 2, 4, 5 & 7)
  • The user should be able to manipulate the 'logical view' of the data with no need for knowledge of how it is physically stored or accessed.
• Rule 2 - Guaranteed Access -
  • Each & every datum in a relational database is guaranteed to be logically accessible by a combination of table name, primary key value & column name.

Page 34: Normalization and Codd's Rule

Data Manipulation Rules

• Rule 4 - Dynamic on-line catalog based on the relational model -
  • The DB description (metadata) is represented at the logical level in the same way as ordinary data, so that the same relational language can be used to interrogate the metadata as regular data.
  • System & other data are stored & manipulated in the same way.
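As an illustration of a catalog that is queried like ordinary data, SQLite exposes its schema through a table named `sqlite_master`, which can be read with a normal SELECT. This is a sketch of the querying side only: SQLite does not let users modify that table with ordinary statements, so it is not a full demonstration of the rule. The table and view names are made up.

```python
# Sketch: interrogating the catalog with the same SELECT used for user data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT)")
conn.execute("CREATE VIEW Item_codes AS SELECT Item FROM Items")

# The filter skips the index SQLite creates automatically for the primary key.
catalog = conn.execute("""
    SELECT type, name
    FROM sqlite_master
    WHERE type IN ('table', 'view')
    ORDER BY name
""").fetchall()
```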

Page 35: Normalization and Codd's Rule

Data Manipulation Rules

• Rule 5 - Comprehensive Data Sublanguage -
  • An RDBMS may support many languages & modes of use, but there must be at least ONE language whose statements can express ALL of the following:
    • Data definition
    • View definition
    • Data manipulation (interactive & via program)
    • Integrity constraints
    • Authorization
    • Transaction boundaries (begin, commit & rollback)
  • 1992 - ISO standard for SQL provides all these functions.

Page 36: Normalization and Codd's Rule

Data Manipulation Rules

• Rule 7 - High-level insert, update & delete -
  • The capability of handling a base table or view as a single operand applies not only to data retrieval but also to insert, update & delete operations.
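In SQL terms this means one statement operates on a whole set of rows, with no record-at-a-time loop in the application. A minimal sketch with SQLite through Python's sqlite3 module follows; the second invoice (10002) is hypothetical.

```python
# Sketch: set-at-a-time UPDATE and DELETE, one statement per operation.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Invoice_items (
        Invoice_number INTEGER, Item TEXT, Item_qty INTEGER, Item_price REAL,
        PRIMARY KEY (Invoice_number, Item))
""")
conn.executemany("INSERT INTO Invoice_items VALUES (?, ?, ?, ?)", [
    (10001, "SAGX730", 1, 569.95),
    (10001, "AT10", 1, 359.95),
    (10001, "CDPC725", 1, 399.95),
    (10002, "AT10", 2, 359.95),  # hypothetical second invoice
])

# One UPDATE changes every line of invoice 10001.
updated = conn.execute(
    "UPDATE Invoice_items SET Item_qty = Item_qty + 1 "
    "WHERE Invoice_number = 10001").rowcount

# One DELETE removes item AT10 from every invoice.
deleted = conn.execute(
    "DELETE FROM Invoice_items WHERE Item = 'AT10'").rowcount

remaining = conn.execute("SELECT COUNT(*) FROM Invoice_items").fetchone()[0]
```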

Page 37: Normalization and Codd's Rule

Codd's Rules

• Data Independence Rules (Rules 8, 9 & 11)
  • These rules protect users & application developers from having to change the applications following any low-level reorganisation of the DB.

Page 38: Normalization and Codd's Rule

Data Independence Rules

• Rule 8 - Physical Data Independence -
  • Application programs & terminal activities remain logically unimpaired whenever any changes are made either to the storage organisation or access methods.
• Rule 9 - Logical Data Independence -
  • Application programs & terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
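Both kinds of independence can be sketched with SQLite through Python's sqlite3 module. The application's query text stays fixed while the base table is split into two (an information-preserving change hidden behind a view) and an index is added (a change of access method). The table name `Customer_info` and the index name are hypothetical, and only retrieval is shown.

```python
# Sketch: the same query text survives a schema split and a new index.
import sqlite3

QUERY = ("SELECT Cust_name, City, State FROM Customer_info "
         "WHERE Cust_account = ?")

conn = sqlite3.connect(":memory:")
# Original design: one wide base table that the application queries by name.
conn.execute("""
    CREATE TABLE Customer_info (
        Cust_account INTEGER PRIMARY KEY, Cust_name TEXT,
        Zip_code TEXT, City TEXT, State TEXT)
""")
conn.execute("INSERT INTO Customer_info VALUES "
             "(4398711, 'Arnold S', '95691', 'Davis', 'CA')")
before = conn.execute(QUERY, (4398711,)).fetchall()

# Logical change: split into Customer and Zip_code, then expose the old
# shape again as a view that carries the old name.
conn.executescript("""
CREATE TABLE Zip_code (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT);
CREATE TABLE Customer (Cust_account INTEGER PRIMARY KEY, Cust_name TEXT,
                       Zip_code TEXT);
INSERT INTO Zip_code SELECT DISTINCT Zip_code, City, State FROM Customer_info;
INSERT INTO Customer SELECT Cust_account, Cust_name, Zip_code FROM Customer_info;
DROP TABLE Customer_info;
CREATE VIEW Customer_info AS
    SELECT c.Cust_account, c.Cust_name, c.Zip_code, z.City, z.State
    FROM Customer c JOIN Zip_code z ON z.Zip_code = c.Zip_code;
""")
after_split = conn.execute(QUERY, (4398711,)).fetchall()

# Physical change: a new access path, invisible to the query text.
conn.execute("CREATE INDEX Customer_zip ON Customer (Zip_code)")
after_index = conn.execute(QUERY, (4398711,)).fetchall()
```

Inserts and updates through `Customer_info` would additionally depend on view updating (Rule 6).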

Page 39: Normalization and Codd's Rule

Data Independence Rules

• Rule 11 - Distribution Independence -
  • The data manipulation sublanguage of an RDBMS must enable application programs & queries to remain logically unchanged whether & whenever data is physically centralised or distributed.

Page 40: Normalization and Codd's Rule

Data Independence Rules

• Rule 11 - Distribution Independence -
  • This means that an application program that accesses the DBMS on a single computer should also work, without modification, even if the data is moved from one computer to another in a network environment.
  • The user should 'see' one centralised DB whether data is located on one or more computers.

Page 41: Normalization and Codd's Rule

Data Independence Rules

• Rule 11 - Distribution Independence -
  • This rule does not say that to be fully relational the DBMS must support distributed DBs, but that if it does, the query must remain the same.

Page 42: Normalization and Codd's Rule

Summary

• Codd's Rules can be divided into 5 functional areas:
  • Foundation Rules
  • Structural Rules
  • Integrity Rules
  • Data Manipulation Rules
  • Data Independence Rules