DATABASE DESIGN I - 1DL300

Spring 2011

An introductory course on database systems

http://www.it.uu.se/edu/course/homepage/dbastekn/vt11/

2011-02-16 Manivasakan Sabesan - UDBL - IT - UU

Manivasakan Sabesan

Uppsala Database Laboratory

Department of Information Technology, Uppsala University, Uppsala, Sweden

Introduction to Normalization

Elmasri/Navathe ch 14

Padron-McCarthy/Risch ch 11

Normalization and problems in schema design

• Normalization - relational database design

– Normalization of relational schemas or “we want to have good relations . . .”

– Functional dependencies

– Normal forms

• Problems in schema design

– Unclear semantics

– Redundancy

  • Modification problems (updates, insertions, deletions)

– Null values

– Spurious tuples

– Multi-valued dependencies

Clear Semantics of Relations

• Semantics of a relation

– Meaning resulting from interpretation of attribute values in a tuple

• Easier to explain semantics of relation

– Indicates better schema design

Guideline 1

• Design relation schema so that it is easy to explain its meaning

• Do not combine attributes from multiple entity types and relationship types into a single relation

Guideline 1 (cont’d.)

EMP_DEPT(Ename, Ssn, Bdate, Address, Dnumber, Dname, Dmgr_ssn)

EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation)

Example: design of a relational database

• To design a relational database, we start from an E-R model and design a number of relational schemas, for example:

• emps(ename, salary, dept, dname, dept#, mgr)

• supp_info(sname, iname, saddr, price)

• items(iname, item#, dept)

• customers(cname, addr, balance, o#, date)

• includes(o#, iname, quantity)

Ex. continues - bad design causes problems

• Here we have chosen to combine the schemas SUPPLIERS and SUPPLIES and replace them with a single schema that contains all information about suppliers:

SUPPLIERS(SNAME, SADDR)
SUPPLIES(SNAME, INAME, PRICE)
SUPP_INFO(SNAME, INAME, SADDR, PRICE)

• Because of this decision we run into several problems:

– Unclear semantics

– Redundancy

– Modification anomalies

  • Update anomalies

  • Insertion anomalies

  • Deletion anomalies

Redundancy

SUPP_INFO(SNAME, INAME, SADDR, PRICE)

• Redundancy:

– When a supplier supplies several products, the address is repeated for each product.

– Storing natural joins of base relations leads to update anomalies.

Modification anomalies

SUPP_INFO(SNAME, INAME, SADDR, PRICE)

• Update anomaly - inconsistency

– With redundant data there is a risk, for example when updating an address in the table, of missing some occurrences of the redundant value (here, the several occurrences of the address of the same supplier).

– The result is inconsistencies among the data in the database.

• Insertion anomaly

– We cannot insert the address of a supplier that does not currently supply any product.

  • One can of course assign null values to INAME and PRICE, but then one must remember to remove these null values later when additional information exists. Hence, not a tractable solution!

– Furthermore, since INAME is part of the key for this relation, this is not a good solution. It is probably not even allowed to assign null values to key attributes.

• Deletion anomaly

– If all tuples that correspond to the items a specific supplier provides are removed, then all information about that supplier disappears.
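The update and deletion anomalies can be reproduced in a few lines. This is only a minimal sketch: the supplier names, addresses and prices are invented for illustration, and the helper `addresses` is hypothetical.

```python
# SUPP_INFO(SNAME, INAME, SADDR, PRICE) as a list of tuples.
# All sample values are invented for illustration.
supp_info = [
    ("Acme", "banana", "Uppsala", 10),
    ("Acme", "apple", "Uppsala", 12),
    ("Bolt", "banana", "Lund", 11),
]

def addresses(rows, sname):
    """Return the set of addresses stored for one supplier."""
    return {saddr for (s, _, saddr, _) in rows if s == sname}

# Update anomaly: only one of Acme's two rows gets the new address.
updated = [("Acme", "banana", "Stockholm", 10)] + supp_info[1:]
inconsistent = addresses(updated, "Acme")  # two different addresses remain

# Deletion anomaly: removing Bolt's only item also removes Bolt's address.
after_delete = [row for row in supp_info
                if not (row[0] == "Bolt" and row[1] == "banana")]
bolt_address = addresses(after_delete, "Bolt")  # nothing left about Bolt
```

After the partial update the table holds two addresses for the same supplier, and after the deletion no address for Bolt can be found at all.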

How to avoid these problems?

• To avoid all these problems, we divide the relation (table) in the following manner:

SUPPLIERS(SNAME, SADDR)
SUPPLIES(SNAME, INAME, PRICE)

• Disadvantages:

– To retrieve the addresses of suppliers that provide bananas, we must perform an expensive join operation, SUPPLIERS * SUPPLIES, which we could avoid if we instead had the relation SUPP_INFO(SNAME, INAME, SADDR, PRICE).

– In the latter case the operations SELECT and PROJECT, which are much cheaper, do the job.
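The trade-off can be sketched with plain Python sets. The rows are invented for illustration; the point is only that the decomposed design needs a join on SNAME, while the single-table design needs a selection and a projection.

```python
# Sample rows are invented for illustration.
suppliers = [("Acme", "Uppsala"), ("Bolt", "Lund")]       # (SNAME, SADDR)
supplies = [("Acme", "banana", 10), ("Acme", "apple", 12),
            ("Bolt", "banana", 11)]                         # (SNAME, INAME, PRICE)
supp_info = [("Acme", "banana", "Uppsala", 10),
             ("Acme", "apple", "Uppsala", 12),
             ("Bolt", "banana", "Lund", 11)]                # (SNAME, INAME, SADDR, PRICE)

# Decomposed design: natural join on SNAME, then select and project.
via_join = {saddr
            for (sname, saddr) in suppliers
            for (sname2, iname, _) in supplies
            if sname == sname2 and iname == "banana"}

# Single-table design: select and project only.
via_select = {saddr for (_, iname, saddr, _) in supp_info if iname == "banana"}
```

Both queries return the same addresses; the nested loop in `via_join` is what stands in for the more expensive join.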

Guideline 2

• Design base relation schemas so that no update anomalies are present in the relations

• If any anomalies are present:

– Note them clearly

– Make sure that the programs that update the database will operate correctly

NULL Values in Tuples

• Relations with many attributes can end up with many NULLs in a tuple

• Problems with NULLs

– Wasted storage space

– Problems understanding meaning

Guideline 3

• Avoid placing attributes in a base relation whose values may frequently be NULL

• If NULLs are unavoidable:

– Make sure that they apply in exceptional cases only, not to a majority of tuples

Spurious Tuples

EMP_PROJ

Ssn        Pnumber  Hours  Ename           Pname   Plocation
123456789  1        32.5   Smith, John     SSPI    Uppsala
123456789  2        7.5    Smith, John     Vortex  Luleå
453453453  1        20.0   English, Joyce  SSPI    Uppsala

EMP_LOCS

Ename           Plocation
Smith, John     Uppsala
Smith, John     Luleå
English, Joyce  Uppsala

EMP_PROJ1

Ssn        Pnumber  Hours  Pname   Plocation
123456789  1        32.5   SSPI    Uppsala
123456789  2        7.5    Vortex  Luleå
453453453  1        20.0   SSPI    Uppsala
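To see where the spurious tuples come from, EMP_PROJ can be projected onto EMP_LOCS and EMP_PROJ1 and then joined back on Plocation, the only attribute the two relations share. A minimal sketch using the three rows of the example:

```python
# Rows taken from the EMP_PROJ example above:
# (Ssn, Pnumber, Hours, Ename, Pname, Plocation)
emp_proj = {
    ("123456789", 1, 32.5, "Smith, John", "SSPI", "Uppsala"),
    ("123456789", 2, 7.5, "Smith, John", "Vortex", "Luleå"),
    ("453453453", 1, 20.0, "English, Joyce", "SSPI", "Uppsala"),
}

# Project onto EMP_LOCS(Ename, Plocation)
# and EMP_PROJ1(Ssn, Pnumber, Hours, Pname, Plocation).
emp_locs = {(ename, ploc) for (_, _, _, ename, _, ploc) in emp_proj}
emp_proj1 = {(ssn, pno, hrs, pname, ploc)
             for (ssn, pno, hrs, _, pname, ploc) in emp_proj}

# Natural join on the shared attribute Plocation.
joined = {
    (ssn, pno, hrs, ename, pname, ploc)
    for (ssn, pno, hrs, pname, ploc) in emp_proj1
    for (ename, ploc2) in emp_locs
    if ploc == ploc2
}

# Tuples produced by the join that were never in EMP_PROJ.
spurious = joined - emp_proj
```

The join returns all original tuples plus extra ones that pair an employee with a project only because both are located in Uppsala. Plocation is neither a key of EMP_LOCS nor of EMP_PROJ1, which is why the join is not reliable.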

Guideline 4

• Design relation schemas so that they can be joined with a natural join on attributes that are appropriately related

– This guarantees that no spurious tuples are generated

• Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations

Normalization

• What principles do we have to follow to systematically end up with a good schema design?

– In 1972, Ted Codd defined a set of conditions that put various requirements on a relation.

– To fulfill these conditions, the relation schema is divided, in several steps, into smaller schemas.

• This process is called normalization.

• We need to study the following concepts to perform the normalization:

– Functional dependencies

– Normal forms for relations

Functional dependency (FD)

• Let R be a relation schema with attributes A1, ..., An and let X and Y be subsets of {A1, ..., An}.

• Let r(R) denote a relation (instance) of the schema R.

• We say that X functionally determines Y, and we write

X ⇒ Y

if, for every relation r(R) and every pair of tuples t1, t2 ∈ r(R), the following holds:

If t1[X] = t2[X], then t1[Y] = t2[Y]

• The attribute set Y is said to be dependent on the attribute set X.

• There is a functional dependency between X and Y.
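The definition translates directly into a check on a single relation instance. This is a minimal sketch: the function name `fd_holds` and the sample rows are assumptions made for illustration. Since the definition quantifies over all instances r(R), a check on one instance can refute a dependency but never prove it.

```python
def fd_holds(rows, x, y):
    """Check whether X ⇒ Y holds in one relation instance.

    rows: list of dicts mapping attribute name to value
    x, y: lists of attribute names
    """
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in x)
        y_val = tuple(t[a] for a in y)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Sample instance, invented for illustration.
rows = [
    {"SNAME": "Acme", "INAME": "banana", "SADDR": "Uppsala", "PRICE": 10},
    {"SNAME": "Acme", "INAME": "apple", "SADDR": "Uppsala", "PRICE": 12},
    {"SNAME": "Bolt", "INAME": "banana", "SADDR": "Lund", "PRICE": 11},
]

name_to_addr = fd_holds(rows, ["SNAME"], ["SADDR"])   # no counterexample
name_to_price = fd_holds(rows, ["SNAME"], ["PRICE"])  # refuted by Acme's rows
```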

Functional dependency - properties

• If X is a candidate key for R we have:

– X ⇒ Y for all Y ⊆ {A1, ..., An}.

– X ∩ Y does not need to be the empty set.

• Assume that R represents an N:1 relationship between the entity types E1 and E2.

– Assume further that X ⊆ {A1, ..., An} constitutes a key for E1 and Y constitutes a key for E2.

– Then we have the dependency X ⇒ Y but not Y ⇒ X.

– If R were a 1:1 relationship, then Y ⇒ X would also hold.

• A functional dependency is a property of the schema R and not of any particular relation r(R). The semantics of the participating attributes decide whether an FD exists or not.

Armstrong’s axioms (inference rules)

• 1. If X ⊇ Y, then X → Y (reflexive rule)

• 2. If X → Y, then XZ → YZ (augmentation rule)

• 3. If X → Y and Y → Z, then X → Z (transitive rule)

Additional rules:

• 4. If X → YZ, then X → Y (decomposition, or projection, rule)

• 5. If X → Y and X → Z, then X → YZ (union, or additive, rule)

• 6. If X → Y and WY → Z, then WX → Z (pseudotransitive rule)
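In practice the rules are usually applied mechanically by computing the closure X+ of an attribute set, that is, all attributes that X determines. The slides do not present this algorithm; the following is a sketch of the standard textbook procedure, with the function name `closure` and the FD representation chosen here for illustration.

```python
def closure(attrs, fds):
    """Compute the closure X+ of an attribute set under a list of FDs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)          # reflexive rule: X determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If lhs is already determined, rhs is too (transitive rule).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FDs of the supplier example.
fds = [({"SNAME"}, {"SADDR"}), ({"SNAME", "INAME"}, {"PRICE"})]

sname_closure = closure({"SNAME"}, fds)
key_closure = closure({"SNAME", "INAME"}, fds)
```

X → Y can be derived from a set of dependencies exactly when Y is a subset of X+.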

Example

• The simplest form of FD is between the key attributes and the other attributes in a schema:

{SNAME} ⇒ {SADDR}               SUPPLIERS
{SNAME, INAME} ⇒ {PRICE}        SUPPLIES
{CNAME} ⇒ {CADDR, BALANCE}      CUSTOMERS
{SNAME} ⇒ {SNAME}

• A slightly more complex FD:

{SNAME, INAME} ⇒ {SADDR, PRICE}

– This dependency is the combination of the first two above. We are able to combine these two because we understand the concepts supplier, item, address and price and the relationships among them.

Full functional dependency - FFD

• A functional dependency X ⇒ Y is termed FFD if there is no attribute A ∈ X such that (X - {A}) ⇒ Y holds.

• In other words, an FFD is a dependency that does not contain any unnecessary attributes in its determinant (i.e. the left-hand side of the dependency).

SUPP_INFO(SNAME, INAME, SADDR, PRICE)

{SNAME, INAME} ⇒ {PRICE}    FFD
{SNAME, INAME} ⇒ {SADDR}    Not FFD
{SNAME} ⇒ {SADDR}           FFD

compare:

f(x,y) = 3x + 2    Not FFD
f(x) = 3x + 2      FFD
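The FFD test can be sketched by removing one attribute at a time from the determinant and checking whether the dependency still follows. The helper names `closure` and `is_full_fd` are hypothetical, and the attribute-closure routine is the standard textbook one rather than something given in the slides.

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_fd(x, y, fds):
    """True if X ⇒ Y follows from fds and no X - {A} determines Y."""
    x, y = set(x), set(y)
    if not y <= closure(x, fds):
        return False             # X ⇒ Y does not hold at all
    return all(not y <= closure(x - {a}, fds) for a in x)

# FDs of SUPP_INFO(SNAME, INAME, SADDR, PRICE).
fds = [({"SNAME"}, {"SADDR"}), ({"SNAME", "INAME"}, {"PRICE"})]

price_ffd = is_full_fd({"SNAME", "INAME"}, {"PRICE"}, fds)
addr_ffd = is_full_fd({"SNAME", "INAME"}, {"SADDR"}, fds)
name_ffd = is_full_fd({"SNAME"}, {"SADDR"}, fds)
```

The three results match the FFD / Not FFD / FFD labels listed for SUPP_INFO.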

Prime attribute

• Definition: an attribute that is a member of any of the candidate keys is called a prime attribute of R.

• For example:

– In the schema R(CITY,STREET,ZIPCODE) all attributes are prime attributes, since we have the dependencies CITY,STREET ⇒ ZIPCODE and ZIPCODE ⇒ CITY, and both CITY,STREET and STREET,ZIPCODE are keys.

– In the schema R(A,B,C,D) with the dependencies AB ⇒ C, B ⇒ D and BC ⇒ A, both AB and BC are candidate keys. Therefore A, B and C are prime attributes, but D is not.

• Attributes that are not part of any candidate key are called non-prime or non-key attributes.
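For small schemas the candidate keys, and from them the prime attributes, can be found by brute force: try attribute sets in order of increasing size and keep the minimal ones whose closure is the whole schema. This is a sketch under that assumption; the function names are hypothetical and the approach is exponential in the number of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Enumerate the minimal attribute sets that determine all of attrs."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue         # contains a smaller key, so not minimal
            if closure(cand, fds) == attrs:
                keys.append(cand)
    return keys

# R(A,B,C,D) with AB ⇒ C, B ⇒ D and BC ⇒ A.
attrs = {"A", "B", "C", "D"}
fds = [({"A", "B"}, {"C"}), ({"B"}, {"D"}), ({"B", "C"}, {"A"})]

keys = candidate_keys(attrs, fds)   # the sets {A, B} and {B, C}
prime = set().union(*keys)          # attributes appearing in some key
non_prime = attrs - prime
```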

First normal form - 1NF

• Only atomic values are allowed as attribute values in the relational model.

– The relations that fulfill this condition are said to be in 1NF

Ssn   Ename  Pnumber  Hours
1234  Smith  1        12
4534  Wong   3        40

Second normal form - 2NF

• A relation schema R is in 2NF if:

– It is in 1NF

– Every non-key attribute A in R is fully functionally dependent (FFD) on each candidate key in R.

Third normal form - 3NF

• A relation schema R is in 3NF if:

– It is in 2NF

– No non-key attribute A in R is fully functionally dependent (FFD) on any other non-key attribute.

Example

Boyce-Codd normal form - BCNF

• A relation schema R is in BCNF if:

– It is in 1NF

– Every determinant X is a candidate key.

• The difference between BCNF and 3NF is that in BCNF a prime attribute cannot be FFD on a non-key (or non-prime) attribute or on a prime attribute (i.e. a partial key). BCNF is therefore a stricter condition.
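The BCNF condition can be checked by listing the dependencies whose determinant does not determine the whole schema. This sketch uses the common superkey formulation, which agrees with "every determinant is a candidate key" when only full dependencies are listed; that equivalence and the function names are assumptions of the example, not statements from the slides.

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Return the nontrivial FDs whose determinant is not a superkey."""
    attrs = set(attrs)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attrs]

# R(CITY, STREET, ZIPCODE): all attributes are prime, yet ZIPCODE is not a key.
address_fds = [({"CITY", "STREET"}, {"ZIPCODE"}), ({"ZIPCODE"}, {"CITY"})]
address_violations = bcnf_violations({"CITY", "STREET", "ZIPCODE"}, address_fds)

# SUPPLIERS(SNAME, SADDR): the only determinant is the key.
supplier_violations = bcnf_violations({"SNAME", "SADDR"},
                                      [({"SNAME"}, {"SADDR"})])
```

The address schema is the usual illustration of a relation where every attribute is prime but the dependency ZIPCODE ⇒ CITY still breaks BCNF.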

Example_BCNF

Additional normal forms

• There are a number of other normal forms available

• These explore additional data dependencies such as:

– multi-valued dependencies (4NF)

– join dependencies (5NF)

Multi-valued dependencies

• Consider the relation classes(course, teacher, book)

course            teacher  book
database          Tore     Elmasri
database          Tore     Ullman
database          Martin   Elmasri
database          Martin   Ullman
operating system  Martin   Silberschatz
operating system  Martin   Shaw

• The book attribute specifies required literature independent of teacher.

• The relation fulfills BCNF, but still every teacher must be supplied two times.
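Because teachers and books depend on the course independently of each other, the classes relation can be stored as two smaller relations and rebuilt by a join on course. The slides only name 4NF; splitting along the multi-valued dependency is the usual remedy, and the relation names `teaches` and `reads` below are hypothetical.

```python
# classes(course, teacher, book), rows as in the example above.
classes = {
    ("database", "Tore", "Elmasri"),
    ("database", "Tore", "Ullman"),
    ("database", "Martin", "Elmasri"),
    ("database", "Martin", "Ullman"),
    ("operating system", "Martin", "Silberschatz"),
    ("operating system", "Martin", "Shaw"),
}

# Project onto (course, teacher) and (course, book).
teaches = {(course, teacher) for (course, teacher, _) in classes}
reads = {(course, book) for (course, _, book) in classes}

# Natural join on course rebuilds the original relation.
rejoined = {
    (course, teacher, book)
    for (course, teacher) in teaches
    for (course2, book) in reads
    if course == course2
}
```

In the decomposed form each teacher is stored once per course and each book once per course, so the repetition seen in classes disappears.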

Lossless join

• We decompose a schema R with functional dependencies D into schemas R1 and R2

• R1 and R2 are called a lossless-join decomposition (in relation to D) if R1 * R2 contains the same information as R.

– no spurious tuples are generated when we perform the join between the decomposed relations.

– no information is lost when the relation is decomposed.
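For a decomposition into two schemas there is a standard test that is not stated in the slides: the decomposition is lossless if the common attributes R1 ∩ R2 functionally determine all of R1 or all of R2. The sketch below applies it with an attribute-closure helper. The dependencies assumed for EMP_PROJ (Ssn ⇒ Ename, Pnumber ⇒ Pname, Plocation and Ssn, Pnumber ⇒ Hours) follow the usual textbook reading of that schema and are an assumption of this example.

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) must determine R1 or R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

# SUPP_INFO split into SUPPLIERS and SUPPLIES.
supp_fds = [({"SNAME"}, {"SADDR"}), ({"SNAME", "INAME"}, {"PRICE"})]
supp_ok = is_lossless({"SNAME", "SADDR"}, {"SNAME", "INAME", "PRICE"}, supp_fds)

# EMP_PROJ split into EMP_LOCS and EMP_PROJ1 (FDs assumed, see lead-in).
emp_fds = [({"Ssn"}, {"Ename"}),
           ({"Pnumber"}, {"Pname", "Plocation"}),
           ({"Ssn", "Pnumber"}, {"Hours"})]
emp_ok = is_lossless({"Ename", "Plocation"},
                     {"Ssn", "Pnumber", "Hours", "Pname", "Plocation"},
                     emp_fds)
```

The supplier decomposition passes because SNAME determines all of SUPPLIERS, while the EMP_LOCS / EMP_PROJ1 split fails because Plocation alone determines neither relation.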

3NF/BCNF

• BCNF is a strong condition. It is not always possible to transform (through decomposition) a schema to BCNF and keep the dependencies.

• 3NF has most of BCNF's advantages and can still be fulfilled without giving up dependencies or the lossless-join property for the relation schema.

• Observe that 3NF admits some redundancy that BCNF does not allow. BCNF requires that one eliminates all redundancy caused by functional dependencies.

Normal forms and normalization

• We assume we have an enterprise that buys products from different supplying companies, and we would like to keep track of our data by means of a database.

• We would like to keep track of what kind of products (e.g. cars) we buy, and from which supplier (e.g. Volvo) we buy them. We can buy each product from several suppliers.

• Further, we want to know the price of the products. The price of a product is naturally dependent on which supplier we bought it from.

• We would also like to store the address of each supplier, i.e. the city where the supplier is located. Here, we assume that each supplier is only located at one place.

• Perhaps we suddenly need to order a large number of cars from Volvo. It can then be good to know how many people live in the city where Volvo is located, so that we know whether Volvo can employ enough people to produce the cars. Hence we also store the population of each city.
