2005.10.11 - SLIDE 1 IS 202 – FALL 2005 Prof. Ray Larson UC Berkeley SIMS SIMS 202: Information Organization and Retrieval Normalization & The Relational Model

2005.10.11 - SLIDE 1 IS 202 – FALL 2005

SIMS 202: Information Organization and Retrieval

Normalization & The Relational Model

Prof. Ray Larson, UC Berkeley SIMS

2005.10.11 - SLIDE 2 IS 202 – FALL 2005

Lecture Overview

• Review
  – Databases and Database Design
  – Database Life Cycle
  – ER Diagrams
  – Database Design
• Relational Operations
• Normalization
• Discussion Questions

2005.10.11 - SLIDE 4 IS 202 – FALL 2005

Models (1)

[Figure: Conceptual requirements from Applications 1–4, shown with the Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1–4]

2005.10.11 - SLIDE 5 IS 202 – FALL 2005

Database System Life Cycle

1. Design
2. Physical Creation
3. Conversion
4. Integration
5. Operations
6. Growth, Change, & Maintenance

2005.10.11 - SLIDE 6 IS 202 – FALL 2005

Another View of the Life Cycle

1. Design
2. Physical Creation
3. Conversion
4. Integration
5. Operations
6. Growth, Change

2005.10.11 - SLIDE 7 IS 202 – FALL 2005

Database Design Process

[Figure: Conceptual requirements from Applications 1–4, shown with the Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1–4]

2005.10.11 - SLIDE 8 IS 202 – FALL 2005

Entity

• An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information
  – Persons (e.g.: customers in a business, employees, authors)
  – Things (e.g.: purchase orders, meetings, parts, companies)

[Figure: entity labeled "Employee"]

2005.10.11 - SLIDE 9 IS 202 – FALL 2005

Attributes

• Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (This is the Metadata for the entities)

[Figure: entity "Employee" with attributes Name (Last, Middle, First), SSN, Age, Birthdate, Projects]

2005.10.11 - SLIDE 10 IS 202 – FALL 2005

Relationships

• Relationships are the associations between entities
• They can involve one or more entities and belong to particular relationship types
  – One to One
  – One to Many
  – Many to Many

2005.10.11 - SLIDE 11 IS 202 – FALL 2005

Relationships

[Figure: two relationship diagrams. "Attends" links Student and Class; "Supplies project parts" links Supplier, Part, and Project]

2005.10.11 - SLIDE 12 IS 202 – FALL 2005

Types of Relationships

• Concerned only with cardinality of relationship

[Figure, Chen ER notation: three "Assigned" relationships. Employee and Truck (1:1); Employee and Project (1:n); Employee and Project (n:m)]

2005.10.11 - SLIDE 13 IS 202 – FALL 2005

Database Design Process

[Figure: Conceptual requirements from Applications 1–4, shown with the Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1–4]

2005.10.11 - SLIDE 15 IS 202 – FALL 2005

Requirements Analysis

• Conceptual Requirements
  – Systems Analysis Process
    • Examine all of the information sources used in existing applications
    • Identify the characteristics of each data element
      – Numeric
      – Text
      – Date/time
      – Etc.
    • Examine the tasks carried out using the information
    • Examine results or reports created using the information

2005.10.11 - SLIDE 16 IS 202 – FALL 2005

Database Design Process

[Figure: Conceptual requirements from Applications 1–4, shown with the Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1–4]

2005.10.11 - SLIDE 17 IS 202 – FALL 2005

Conceptual Design

• Conceptual Model
  – Merge the collective needs of all applications
  – Determine what Entities are being used
    • Some object about which information is to be maintained
  – What are the Attributes of those entities?
    • Properties or characteristics of the entity
    • What attributes uniquely identify the entity?
  – What are the Relationships between entities?
    • How do the entities interact with each other?

2005.10.11 - SLIDE 18 IS 202 – FALL 2005

Developing a Conceptual Model

• Overall view of the database that integrates all the needed information discovered during the requirements analysis

• Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database systems or implementation details

• Can also be represented using other modeling tools (such as UML)

2005.10.11 - SLIDE 19 IS 202 – FALL 2005

Database Design Process

[Figure: Conceptual requirements from Applications 1–4, shown with the Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1–4]

2005.10.11 - SLIDE 20 IS 202 – FALL 2005

Logical Design

• Logical Model
  – How is each entity and relationship represented in the Data Model of the DBMS?
    • Hierarchic?
    • Network?
    • Relational?
    • Object-Oriented?

2005.10.11 - SLIDE 21 IS 202 – FALL 2005

Database Design Process

[Figure: Conceptual requirements from Applications 1–4, shown with the Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1–4]

2005.10.11 - SLIDE 22 IS 202 – FALL 2005

Physical Design

• Internal Model
  – Choices of index file structure
  – Choices of data storage formats
  – Choices of disk layout

2005.10.11 - SLIDE 23 IS 202 – FALL 2005

Database Design Process

[Figure: Conceptual requirements from Applications 1–4, shown with the Conceptual Model, Logical Model, Internal Model, and an External Model for each of Applications 1–4]

2005.10.11 - SLIDE 24 IS 202 – FALL 2005

Database Application Design

• External Model
  – User views of the integrated database
  – Making the old (or updated) applications work with the new database design

2005.10.11 - SLIDE 26 IS 202 – FALL 2005

Relational Algebra Operations

• Restrict

• Project

• Product

• Union

• Intersect

• Difference

• Join

• Divide

2005.10.11 - SLIDE 27 IS 202 – FALL 2005

Restrict

• Extracts specified tuples (rows) from a specified relation (table)
  – Restrict is AKA “Select”
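
A minimal Python sketch of Restrict, assuming a relation is modeled as a list of dicts. The helper name `restrict` and the lowercase attribute names are illustrative assumptions; the sample rows are a few entries from the Parts table shown later in the lecture.

```python
# A relation modeled as a list of dicts, one dict per tuple (row).
parts = [
    {"part": 1, "name": "Big blue widget", "price": 3.76},
    {"part": 3, "name": "Tiny red widget", "price": 5.25},
    {"part": 4, "name": "large red widget", "price": 157.23},
]

def restrict(relation, predicate):
    """Return only the tuples (rows) for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]

# Keep the rows whose price is under 10; columns are left untouched.
cheap = restrict(parts, lambda row: row["price"] < 10)
print([row["name"] for row in cheap])
```

Restrict changes the number of rows only; every surviving row keeps all of its attributes.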

2005.10.11 - SLIDE 28 IS 202 – FALL 2005

Project

• Extracts specified attributes (columns) from a specified relation.
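
A minimal Python sketch of Project under the same list-of-dicts assumption. The helper name `project` is hypothetical, and the sketch drops duplicate rows because a relation is a set of tuples; the sample rows come from the Person table used later in the lecture.

```python
def project(relation, attributes):
    """Keep only the named attributes, dropping duplicate rows (set semantics)."""
    result = []
    for row in relation:
        projected = {attr: row[attr] for attr in attributes}
        if projected not in result:
            result.append(projected)
    return result

people = [
    {"person": 1111, "name": "John White", "type": "Student"},
    {"person": 1234, "name": "Mary Jones", "type": "Auditor"},
    {"person": 2345, "name": "Charles Brown", "type": "Student"},
]

# Projecting onto "type" leaves two distinct rows out of three.
types = project(people, ["type"])
print(types)
```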

2005.10.11 - SLIDE 29 IS 202 – FALL 2005

Join

• Builds a relation from two specified relations consisting of all possible concatenated pairs, one from each of the two relations, such that in each pair the two tuples satisfy some condition. (E.g., equal values in a given col.)

First relation:
A1 | B1
A2 | B1
A3 | B2

Second relation:
B1 | C1
B2 | C2
B3 | C3

(Natural or Inner) Join:
A1 | B1 | C1
A2 | B1 | C1
A3 | B2 | C2
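
The A/B/C example on this slide can be reproduced with a small Python sketch. The function name `natural_join` and the dict keys `A`, `B`, `C` are assumptions made for illustration; the join condition is equality on every attribute name the two relations share.

```python
def natural_join(left, right):
    """Pair rows that agree on every attribute name the two relations share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined

ab = [{"A": "A1", "B": "B1"}, {"A": "A2", "B": "B1"}, {"A": "A3", "B": "B2"}]
bc = [{"B": "B1", "C": "C1"}, {"B": "B2", "C": "C2"}, {"B": "B3", "C": "C3"}]

# B3 has no partner in the first relation, so it does not appear in the result.
for row in natural_join(ab, bc):
    print(row["A"], row["B"], row["C"])
```

This nested-loop form is the simplest way to state the operation, not how a DBMS would necessarily execute it.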

2005.10.11 - SLIDE 30 IS 202 – FALL 2005

ER Diagram: Acme Widget Co.

[Figure: ER diagram. Entity labels: Customer, Invoice, Sales-Rep, Line-Item, Part, Employee, Hourly. Relationship labels: Orders, Writes, Contains (twice), ISA. Attribute labels: Cust#, Invoice#, Sales Rep#, Part#, Quantity, Count, Price, Emp#, Wage]

2005.10.11 - SLIDE 31 IS 202 – FALL 2005

Join Items for Relational DB

Parts
Part # | Name | Price | Count
1 | Big blue widget | 3.76 | 2
2 | Small blue Widget | 7.35 | 4
3 | Tiny red widget | 5.25 | 7
4 | large red widget | 157.23 | 23
5 | double widget rack | 10.44 | 12
6 | Small green Widget | 30.45 | 58
7 | Big yellow widget | 7.96 | 1
8 | Tiny orange widget | 81.75 | 42
9 | Big purple widget | 55.99 | 9

Line_item
Invoice# | Part# | Quantity
93774 | 3 | 10
84747 | 23 | 1
88647 | 75 | 2
88367 | 4 | 3
776879 | 22 | 5
65689 | 76 | 12
93774 | 23 | 10
88367 | 34 | 2

Invoice
Invoice # | Cust # | Rep #
93774 | 3 | 1
84747 | 4 | 1
88367 | 5 | 2
88647 | 9 | 1
776879 | 2 | 2
65689 | 6 | 2

Customer
Cust # | COMPANY | STREET1 | STREET2 | CITY | STATE | ZIPCODE
1 | Integrated Standards Ltd. | 35 Broadway | Floor 12 | New York | NY | 02111
2 | MegaInt Inc. | 34 Bureaucracy Plaza | Floors 1-172 | Phildelphia | PA | 03756
3 | Cyber Associates | 3 Control Elevation Place | Cyber Assicates Center | Cyberoid | NY | 08645
4 | General Consolidated | 35 Libra Plaza | | Nashua | NH | 09242
5 | Consolidated MultiCorp | 1 Broadway | | Middletown | IN | 32467
6 | Internet Behometh Ltd. | 88 Oligopoly Place | | Sagrado | TX | 78798
7 | Consolidated Brands, Inc. | 3 Independence Parkway | | Rivendell | CA | 93456
8 | Little Mighty Micro | 34 Last One Drive | | Orinda | CA | 94563
9 | SportLine Ltd. | 38 Champion Place | Suite 882 | Compton | CA | 95328

2005.10.11 - SLIDE 32 IS 202 – FALL 2005

Relational Operations

• What is the name of the customer who ordered Large Red Widgets?
  – Restrict “large red widget” row from Part as temp1
  – Join temp1 with Line-item on Part # as temp2
  – Join temp2 with Invoice on Invoice # as temp3
  – Join temp3 with Customer on cust # as temp4
  – Project Company from temp4 as answer
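
The five steps above can be traced in a short Python sketch. It assumes relations are lists of dicts, uses a hypothetical `natural_join` helper, renames the columns to lowercase keys, and loads only a hand-picked subset of the rows from the preceding tables.

```python
def natural_join(left, right):
    """Pair rows that agree on every attribute name the two relations share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined

# Subset of the lecture's tables (illustrative selection of rows and columns).
parts = [{"part": 3, "name": "Tiny red widget"},
         {"part": 4, "name": "large red widget"}]
line_item = [{"invoice": 93774, "part": 3, "quantity": 10},
             {"invoice": 88367, "part": 4, "quantity": 3}]
invoice = [{"invoice": 93774, "cust": 3, "rep": 1},
           {"invoice": 88367, "cust": 5, "rep": 2}]
customer = [{"cust": 3, "company": "Cyber Associates"},
            {"cust": 5, "company": "Consolidated MultiCorp"}]

temp1 = [row for row in parts if row["name"] == "large red widget"]  # Restrict
temp2 = natural_join(temp1, line_item)       # Join on part
temp3 = natural_join(temp2, invoice)         # Join on invoice
temp4 = natural_join(temp3, customer)        # Join on cust
answer = [row["company"] for row in temp4]   # Project Company
print(answer)
```

Restricting first keeps every intermediate relation small, which is why the plan starts with the Part row rather than with a join.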

2005.10.11 - SLIDE 33 IS 202 – FALL 2005

SQL

• Database Definition and Querying
  – Can be used as an interactive query language
  – Can be embedded in programs
• Relational Calculus combines Restrict, Project and Join operations in a single command: SELECT

2005.10.11 - SLIDE 34 IS 202 – FALL 2005

SELECT

• Syntax:

SELECT [DISTINCT] attr1, attr2,…, attr3
FROM rel1 r1, rel2 r2,… rel3 r3
WHERE condition1 {AND | OR} condition2
ORDER BY attr1 [DESC], attr3 [DESC]
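
To see the optional clauses in action, here is a small sketch using Python's built-in `sqlite3` module. The table layout, the column names `part_no`, `name`, `price`, and the choice of SQLite are assumptions for illustration; the rows are taken from the Parts table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parts (part_no INTEGER, name TEXT, price REAL)")
conn.executemany("INSERT INTO Parts VALUES (?, ?, ?)", [
    (1, "Big blue widget", 3.76),
    (2, "Small blue Widget", 7.35),
    (7, "Big yellow widget", 7.96),
    (8, "Tiny orange widget", 81.75),
    (9, "Big purple widget", 55.99),
])

# DISTINCT, a table alias, two conditions joined by OR, and a descending sort.
rows = conn.execute(
    "SELECT DISTINCT p.name, p.price FROM Parts p "
    "WHERE p.price < 10 OR p.name = 'Big purple widget' "
    "ORDER BY p.price DESC"
).fetchall()
for name, price in rows:
    print(name, price)
```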

2005.10.11 - SLIDE 35 IS 202 – FALL 2005

SQL SELECT

-- Column names containing # are double-quoted so the query also runs on engines that reject # in bare identifiers
SELECT c.COMPANY
FROM Customer c, Parts p, Invoice i, Line_item z
WHERE c."Cust#" = i."Cust#"
  AND i."Invoice#" = z."Invoice#"
  AND z."Part#" = p."Part#"
  AND p.Name = 'large red widget';
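
A runnable sketch of this query, assuming SQLite through Python's built-in `sqlite3` module. The CREATE TABLE statements and column types are assumptions, only a few rows of each table are loaded, and the `#` column names are double-quoted because SQLite does not accept `#` in a bare identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Parts     ("Part#" INTEGER PRIMARY KEY, Name TEXT, Price REAL, "Count" INTEGER);
CREATE TABLE Customer  ("Cust#" INTEGER PRIMARY KEY, COMPANY TEXT);
CREATE TABLE Invoice   ("Invoice#" INTEGER PRIMARY KEY, "Cust#" INTEGER, "Rep#" INTEGER);
CREATE TABLE Line_item ("Invoice#" INTEGER, "Part#" INTEGER, Quantity INTEGER);
INSERT INTO Parts VALUES (3, 'Tiny red widget', 5.25, 7), (4, 'large red widget', 157.23, 23);
INSERT INTO Customer VALUES (3, 'Cyber Associates'), (5, 'Consolidated MultiCorp');
INSERT INTO Invoice VALUES (93774, 3, 1), (88367, 5, 2);
INSERT INTO Line_item VALUES (93774, 3, 10), (88367, 4, 3);
""")

# The three equality conditions are the joins; the last condition is the restrict.
query = """
SELECT c.COMPANY
FROM Customer c, Parts p, Invoice i, Line_item z
WHERE c."Cust#" = i."Cust#"
  AND i."Invoice#" = z."Invoice#"
  AND z."Part#" = p."Part#"
  AND p.Name = 'large red widget'
"""
rows = conn.execute(query).fetchall()
print(rows)
```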

2005.10.11 - SLIDE 37 IS 202 – FALL 2005

Normalization

• Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data

• Normalization is a multi-step process beginning with an “unnormalized” relation

2005.10.11 - SLIDE 38 IS 202 – FALL 2005

Normal Forms

• First Normal Form (1NF)

• Second Normal Form (2NF)

• Third Normal Form (3NF)

• Boyce-Codd Normal Form (BCNF)

• Fourth Normal Form (4NF)

• Fifth Normal Form (5NF)

2005.10.11 - SLIDE 39 IS 202 – FALL 2005

Normalization

• First Normal Form: functional dependency of nonkey attributes on the primary key; atomic values only
• Second Normal Form: full functional dependency of nonkey attributes on the primary key
• Third Normal Form: no transitive dependency between nonkey attributes
• Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency

2005.10.11 - SLIDE 40 IS 202 – FALL 2005

Unnormalized Relations

• First step in normalization is to convert the data into a two-dimensional table

• In unnormalized relations data can repeat within a column

• (The following is a highly contrived example that has only a very vague resemblance to the implementation of the Phone/Photo project database from IS202 in 2004 …)

2005.10.11 - SLIDE 41 IS 202 – FALL 2005

Unnormalized Relations

Person # | People # | Picture date | Person Name | Person Type | Location | People | Activity | Objects | Object_Feat
1111 | 145; 311 | Oct 1, 2003; Nov 12, 2003 | John White | Student | San Francisco; Berkeley | Beth Little; Michael Diamond | Shopping; Eating | Book bag; Pasta | Blue; none
1234 | 243; 467 | Sep 25, 2003; Oct 10, 2003 | Mary Jones | Auditor | 202 South Hall; Oakland | Charles Field; Patricia Gold | Reading; Drinking | Textbook; Teacup | None; Chinese
2345 | 189 | Sep 27, 2003 | Charles Brown | Student | Sather Gate | David Rosen | Singing | none | none
4876 | 145 | Nov 5, 2003 | Hal Kane | Student | Northside | Beth Little | Shopping | Book bag | Blue
5123 | 145 | Oct 10, 2003 | Paul Kosher | Student | South Hall | Beth Little | Reading | none | none
6845 | 243 | Oct 5, 2003; Dec 15, 2003 | Ann Hood | Student | Oakland; Oakland | Charles Field; Charles Field | Eating; Shopping | Burrito; none | vegetarian; none

2005.10.11 - SLIDE 42 IS 202 – FALL 2005

First Normal Form

• To move to First Normal Form a relation must contain only atomic values at each row and column
  – No repeating groups
  – A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation
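
As a rough illustration of removing repeating groups, the sketch below splits one unnormalized row into atomic rows. The function name `to_first_normal_form`, the use of Python lists for the repeating values, and the lowercase attribute names are all assumptions; the data is John White's row from the unnormalized table.

```python
def to_first_normal_form(row, repeating):
    """Split a row whose repeating attributes hold parallel lists into atomic rows."""
    counts = {len(row[attr]) for attr in repeating}
    if len(counts) != 1:
        raise ValueError("repeating attributes must have the same number of values")
    flat = []
    for i in range(counts.pop()):
        atomic = dict(row)              # copy the non-repeating attributes
        for attr in repeating:
            atomic[attr] = row[attr][i]  # take the i-th value of each group
        flat.append(atomic)
    return flat

unnormalized = {
    "person": 1111,
    "person_name": "John White",
    "people": [145, 311],
    "picture_date": ["Oct 1, 2003", "Nov 12, 2003"],
    "location": ["San Francisco", "Berkeley"],
}
rows = to_first_normal_form(unnormalized, ["people", "picture_date", "location"])
for r in rows:
    print(r)
```

Each output row repeats the person's number and name, which is the redundancy the later normal forms remove.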

2005.10.11 - SLIDE 43 IS 202 – FALL 2005

First Normal Form

Person # | People # | Picture Date | Person Name | Person Type | Location | People | Activity | Objects | Object_feat
1111 | 145 | Oct 1, 2003 | John White | Student | San Francisco | Beth Little | Shopping | Book bag | Blue
1111 | 311 | Nov 12, 2003 | John White | Student | Berkeley | Michael Diamond | Eating | Pasta | none
1234 | 243 | Sep 25, 2003 | Mary Jones | Auditor | 202 South Hall | Charles Field | Reading | Textbook | none
1234 | 467 | Oct 10, 2003 | Mary Jones | Auditor | Oakland | Patricia Gold | Drinking | Teacup | Chinese
2345 | 189 | Sep 27, 2003 | Charles Brown | Student | Sather Gate | David Rosen | Singing | none | none
4876 | 145 | Nov 5, 2003 | Hal Kane | Student | Northside | Beth Little | Shopping | Book bag | Blue
5123 | 145 | Oct 10, 2003 | Paul Kosher | Student | South Hall | Beth Little | Reading | none | none
6845 | 243 | Oct 5, 2003 | Ann Hood | Student | Oakland | Charles Field | Eating | Burrito | Vegetarian
6845 | 243 | Dec 15, 2003 | Ann Hood | Student | Oakland | Charles Field | Shopping | none | none

2005.10.11 - SLIDE 44 IS 202 – FALL 2005

1NF Storage Anomalies

• Insertion: A new person has not yet taken a picture, hence no Picture #. Since Picture # is part of the key, we can’t insert
• Insertion: If a Person is known and likely to be photographed, but hasn’t been yet, there is no way to include that person in the database
• Update: If a Person changes status (e.g. Mary Jones becomes a Student) we have to change multiple rows in the database
• Deletion (type 1): Deleting a Person record may also delete all info about People in the pictures
• Deletion (type 2): When there are functional dependencies (like Object and Object_features) changing one item eliminates other information
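
The update anomaly can be made concrete with a few rows from the 1NF table. This is only a sketch: the list-of-dicts representation and the lowercase attribute names are assumptions.

```python
# Three rows from the 1NF table; Mary Jones appears once per picture.
rows = [
    {"person": 1234, "people": 243, "person_name": "Mary Jones", "person_type": "Auditor"},
    {"person": 1234, "people": 467, "person_name": "Mary Jones", "person_type": "Auditor"},
    {"person": 2345, "people": 189, "person_name": "Charles Brown", "person_type": "Student"},
]

# Changing one fact (Mary Jones becomes a Student) touches every row she appears in.
changed = 0
for row in rows:
    if row["person"] == 1234:
        row["person_type"] = "Student"
        changed += 1
print(changed)  # 2
```

If any one of those rows were missed, the table would hold two different statuses for the same person.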

2005.10.11 - SLIDE 45 IS 202 – FALL 2005

Second Normal Form

• A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key
  – That is, every nonkey attribute needs the full primary key for unique identification
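
One way to spot a partial dependency is to test whether part of the key already determines a nonkey attribute. The sketch below uses a hypothetical helper, `determines`, over a few rows of the 1NF table. Note the caveat: sample data can refute a functional dependency but cannot prove that it holds in general.

```python
def determines(rows, lhs, rhs):
    """True if, in this data, equal values of lhs always come with equal values of rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True

rows = [
    {"person": 1111, "people": 145, "person_name": "John White", "location": "San Francisco"},
    {"person": 1111, "people": 311, "person_name": "John White", "location": "Berkeley"},
    {"person": 1234, "people": 243, "person_name": "Mary Jones", "location": "202 South Hall"},
]

# person_name is fixed by person alone: a partial dependency on the composite key.
name_partial = determines(rows, ["person"], ["person_name"])
# location is not fixed by person alone; it needs more of the key.
location_partial = determines(rows, ["person"], ["location"])
print(name_partial, location_partial)
```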

2005.10.11 - SLIDE 46 IS 202 – FALL 2005

Second Normal Form

Person Table
Person # | Person Name | Person Type
1111 | John White | Student
1234 | Mary Jones | Auditor
2345 | Charles Brown | Student
4876 | Hal Kane | Student
5123 | Paul Kosher | Student
6845 | Ann Hood | Student

2005.10.11 - SLIDE 47 IS 202 – FALL 2005

Second Normal Form

People Table
People # | People
145 | Beth Little
189 | David Rosen
243 | Charles Field
311 | Michael Diamond
467 | Patricia Gold

2005.10.11 - SLIDE 48 IS 202 – FALL 2005

Second Normal Form

Picture Table
Person # | People # | Picture Date | Location | Activity | Objects | Object_Feat
1111 | 145 | 01-Oct-03 | San Francisco | Shopping | Book bag | Blue
1111 | 311 | 12-Nov-03 | Berkeley | Eating | Pasta | none
1234 | 243 | 25-Sep-03 | 202 South Hall | Reading | Textbook | none
1234 | 467 | 10-Oct-03 | Oakland | Drinking | Teacup | Chinese
2345 | 189 | 27-Sep-03 | Sather Gate | Singing | none | none
4876 | 145 | 05-Nov-03 | Northside | Shopping | Book bag | Blue
5123 | 145 | 10-Oct-03 | South Hall | Reading | none | none
6845 | 243 | 05-Oct-03 | Oakland | Eating | Burrito | vegetarian
6845 | 243 | 15-Dec-03 | Oakland | Shopping | none | none
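
The split into Person, People, and Picture tables amounts to three projections of the 1NF relation. A minimal sketch, assuming list-of-dicts relations, a hypothetical `project` helper, and a three-row sample with shortened attribute lists:

```python
def project(relation, attributes):
    """Keep only the named attributes, dropping duplicate rows."""
    result = []
    for row in relation:
        projected = {attr: row[attr] for attr in attributes}
        if projected not in result:
            result.append(projected)
    return result

first_nf = [
    {"person": 1111, "people": 145, "person_name": "John White", "person_type": "Student",
     "people_name": "Beth Little", "activity": "Shopping"},
    {"person": 1111, "people": 311, "person_name": "John White", "person_type": "Student",
     "people_name": "Michael Diamond", "activity": "Eating"},
    {"person": 4876, "people": 145, "person_name": "Hal Kane", "person_type": "Student",
     "people_name": "Beth Little", "activity": "Shopping"},
]

# Attributes that depend on only part of the key move to their own tables.
person = project(first_nf, ["person", "person_name", "person_type"])
people = project(first_nf, ["people", "people_name"])
# What remains depends on the full key.
picture = project(first_nf, ["person", "people", "activity"])
print(len(person), len(people), len(picture))
```

John White and Beth Little are each stored once after the split, so a change to either needs a single update.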

2005.10.11 - SLIDE 49 IS 202 – FALL 2005

1NF Storage Anomalies Removed

• Insertion: Can now enter new Persons who haven’t yet taken pictures

• Insertion: Can now enter People who haven’t been photographed

• Deletion (type 1): If Charles Brown withdraws his photos the corresponding tuples from Person and Picture tables can be deleted without losing information on David Rosen

• Update: If John White takes a third picture, and has changed status (e.g., graduate), we only need to change the Person table in one place

2005.10.11 - SLIDE 50 IS 202 – FALL 2005

2NF Storage Anomalies

• Insertion: Cannot enter the fact that a particular object has a particular feature unless it is associated with a particular picture
• Deletion: If John White describes some other object that Beth Little has while shopping, we lose the fact that the bookbag is blue
• Update: If the features of an object change we have to update multiple occurrences of object features

2005.10.11 - SLIDE 51 IS 202 – FALL 2005

Third Normal Form

• A relation is said to be in Third Normal Form if there are no transitive functional dependencies between nonkey attributes
  – When one nonkey attribute can be determined with one or more nonkey attributes there is said to be a transitive functional dependency
• The Object_Feature column in the Picture table is determined by the Object
  – Object_Feature is transitively functionally dependent on Object so Picture is not 3NF

2005.10.11 - SLIDE 52 IS 202 – FALL 2005

Third Normal Form

Picture Table
Person # | People # | Picture Date | Location | Activity | Objects
1111 | 145 | 01-Oct-03 | San Francisco | Shopping | Book bag
1111 | 311 | 12-Nov-03 | Berkeley | Eating | Pasta
1234 | 243 | 25-Sep-03 | 202 South Hall | Reading | Textbook
1234 | 467 | 10-Oct-03 | Oakland | Drinking | Teacup
2345 | 189 | 27-Sep-03 | Sather Gate | Singing | none
4876 | 145 | 05-Nov-03 | Northside | Shopping | Book bag
5123 | 145 | 10-Oct-03 | South Hall | Reading | none
6845 | 243 | 05-Oct-03 | Oakland | Eating | Burrito
6845 | 243 | 15-Dec-03 | Oakland | Shopping | none

2005.10.11 - SLIDE 53 IS 202 – FALL 2005

Third Normal Form

Object Table
Objects | Object_Feat
Book bag | Blue
Pasta | none
Textbook | none
Teacup | Chinese
Burrito | Vegetarian
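
Moving the transitive dependency into its own table can be sketched in a few lines of Python. The dict-based Object table and the lowercase attribute names are assumptions, and only three Picture rows are used.

```python
picture = [
    {"person": 1111, "people": 145, "object": "Book bag", "object_feat": "Blue"},
    {"person": 1111, "people": 311, "object": "Pasta", "object_feat": "none"},
    {"person": 4876, "people": 145, "object": "Book bag", "object_feat": "Blue"},
]

# Object table: each object's feature is stored exactly once, keyed by the object.
object_table = {row["object"]: row["object_feat"] for row in picture}

# Picture table keeps the object reference but no longer repeats its feature.
picture_3nf = [{k: v for k, v in row.items() if k != "object_feat"} for row in picture]

# A feature change is now a single update, however many pictures show the object.
object_table["Book bag"] = "Red"
print(object_table)
print(picture_3nf)
```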

2005.10.11 - SLIDE 54 IS 202 – FALL 2005

2NF Storage Anomalies Removed

• Insertion: We can now enter the fact that an object has a particular feature

• Deletion: If John White describes some other object that Beth Little has while shopping, we don’t lose the fact that the bookbag is blue

• Update: The features for each object appear only once

2005.10.11 - SLIDE 55 IS 202 – FALL 2005

Boyce-Codd Normal Form

• Most 3NF relations are also BCNF relations
• A 3NF relation is NOT in BCNF if:
  – Candidate keys in the relation are composite keys (they are not single attributes)
  – There is more than one candidate key in the relation, and
  – The keys are not disjoint, that is, some attributes in the keys are common

2005.10.11 - SLIDE 56 IS 202 – FALL 2005

Most 3NF Relations Are Also BCNF – Is This One?

Person # | Person Name | Person Type
1111 | John White | Student
1234 | Mary Jones | Auditor
2345 | Charles Brown | Student
4876 | Hal Kane | Student
5123 | Paul Kosher | Student
6845 | Ann Hood | Student

2005.10.11 - SLIDE 57 IS 202 – FALL 2005

BCNF Relations

Person # | Person Name
1111 | John White
1234 | Mary Jones
2345 | Charles Brown
4876 | Hal Kane
5123 | Paul Kosher
6845 | Ann Hood

Person # | Person Type
1111 | Student
1234 | Auditor
2345 | Student
4876 | Student
5123 | Student
6845 | Student

2005.10.11 - SLIDE 58 IS 202 – FALL 2005

Additional Issues

• Why separate Person and People?
  – They are really all People/Persons in different roles
• Shouldn’t a picture have a unique ID regardless of Who is in it?
• Can’t we have multiple people in the same picture, multiple objects, etc.?
• Can’t objects have multiple characteristics?

2005.10.11 - SLIDE 59 IS 202 – FALL 2005

BCNF Relations

Picture # | Person # | Picture Date
1 | 1111 | 01-Oct-03
2 | 1111 | 12-Nov-03
3 | 1234 | 25-Sep-03
4 | 1234 | 10-Oct-03
5 | 2345 | 27-Sep-03
6 | 4876 | 05-Nov-03
7 | 5123 | 10-Oct-03
8 | 6845 | 05-Oct-03
9 | 6845 | 15-Dec-03

loc # | Location
1 | San Francisco
2 | Berkeley
3 | 202 South Hall
4 | Oakland
5 | Sather Gate
6 | Northside
7 | South Hall

Picture # | loc #
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
6 | 6
7 | 7
8 | 4
9 | 4

Act # | Activity
1 | Shopping
2 | Eating
3 | Reading
4 | Drinking
5 | Singing

Picture # | Act #
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
6 | 1
7 | 3
8 | 2
9 | 1

Picture # | Obj #
1 | 1
2 | 2
3 | 3
4 | 4
6 | 1
8 | 5

Obj # | Objects
1 | Book bag
2 | Pasta
3 | Textbook
4 | Teacup
5 | Burrito

Picture # | People #
1 | 145
2 | 311
3 | 243
4 | 467
5 | 189
6 | 145
7 | 145
8 | 243
9 | 243

2005.10.11 - SLIDE 60 IS 202 – FALL 2005

BCNF Added Capabilities

• Can now have a picture with no (identified) people in it

• Can have multiple objects, activities, and people associated with each picture

2005.10.11 - SLIDE 61 IS 202 – FALL 2005

Fourth Normal Form

• Any relation is in Fourth Normal Form if it is BCNF and any multivalued dependencies are trivial

• Eliminate non-trivial multivalued dependencies by projecting into simpler tables

2005.10.11 - SLIDE 62 IS 202 – FALL 2005

Fifth Normal Form

• A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation

• Implies that relations that have been decomposed in previous NF can be recombined via natural joins to recreate the original relation
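
The recombination claim can be checked on a small sample: project a relation into two pieces, natural-join them, and compare with the original. This sketch assumes list-of-dicts relations and the hypothetical helpers `project`, `natural_join`, and `as_set`; the four rows pair picture numbers with their locations and activities from the lecture's tables. It also shows a split on a non-key attribute, which does not recombine cleanly.

```python
def project(relation, attributes):
    """Keep only the named attributes, dropping duplicate rows."""
    result = []
    for row in relation:
        projected = {attr: row[attr] for attr in attributes}
        if projected not in result:
            result.append(projected)
    return result

def natural_join(left, right):
    """Pair rows that agree on every attribute name the two relations share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined

def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}

original = [
    {"picture": 1, "location": "San Francisco", "activity": "Shopping"},
    {"picture": 2, "location": "Berkeley", "activity": "Eating"},
    {"picture": 8, "location": "Oakland", "activity": "Eating"},
    {"picture": 9, "location": "Oakland", "activity": "Shopping"},
]

# Splitting on the key (picture) and rejoining gives back the original rows.
rejoined = natural_join(project(original, ["picture", "location"]),
                        project(original, ["picture", "activity"]))
lossless = as_set(rejoined) == as_set(original)

# Splitting on location instead produces extra, spurious rows for Oakland.
spurious = natural_join(project(original, ["picture", "location"]),
                        project(original, ["location", "activity"]))
print(lossless, len(spurious))
```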

2005.10.11 - SLIDE 63 IS 202 – FALL 2005

Fifth Normal Form Relations

Picture # | Person # | Picture Date
1 | 1111 | 01-Oct-03
2 | 1111 | 12-Nov-03
3 | 1234 | 25-Sep-03
4 | 1234 | 10-Oct-03
5 | 2345 | 27-Sep-03
6 | 4876 | 05-Nov-03
7 | 5123 | 10-Oct-03
8 | 6845 | 05-Oct-03
9 | 6845 | 15-Dec-03

loc # | Location
1 | San Francisco
2 | Berkeley
3 | 202 South Hall
4 | Oakland
5 | Sather Gate
6 | Northside
7 | South Hall

Picture # | loc #
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
6 | 6
7 | 7
8 | 4
9 | 4

Act # | Activity
1 | Shopping
2 | Eating
3 | Reading
4 | Drinking
5 | Singing

Picture # | Act #
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
6 | 1
7 | 3
8 | 2
9 | 1

Picture # | Obj #
1 | 1
2 | 2
3 | 3
4 | 4
6 | 1
8 | 5

Obj # | Objects
1 | Book bag
2 | Pasta
3 | Textbook
4 | Teacup
5 | Burrito

Picture # | People #
1 | 145
2 | 311
3 | 243
4 | 467
5 | 189
6 | 145
7 | 145
8 | 243
9 | 243

People Table

2005.10.11 - SLIDE 64 IS 202 – FALL 2005

Normalizing to Death

• Normalization splits database information across multiple tables

• To retrieve complete information from a normalized database, the JOIN operation must be used

• JOIN tends to be expensive in terms of processing time, and very large joins are very expensive
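
To give a feel for how many joins a fully normalized design needs, the sketch below rebuilds a readable description of each picture from five of the decomposed tables. It assumes SQLite via Python's built-in `sqlite3`; the table and column names are invented for the example, and only two pictures are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE picture     (picture_no INTEGER PRIMARY KEY, person_no INTEGER, picture_date TEXT);
CREATE TABLE location    (loc_no INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE picture_loc (picture_no INTEGER, loc_no INTEGER);
CREATE TABLE activity    (act_no INTEGER PRIMARY KEY, activity TEXT);
CREATE TABLE picture_act (picture_no INTEGER, act_no INTEGER);
INSERT INTO picture VALUES (1, 1111, '01-Oct-03'), (2, 1111, '12-Nov-03');
INSERT INTO location VALUES (1, 'San Francisco'), (2, 'Berkeley');
INSERT INTO picture_loc VALUES (1, 1), (2, 2);
INSERT INTO activity VALUES (1, 'Shopping'), (2, 'Eating');
INSERT INTO picture_act VALUES (1, 1), (2, 2);
""")

# Four joins are needed just to show where each picture was taken and what it shows.
rows = conn.execute("""
SELECT p.picture_no, p.picture_date, l.location, a.activity
FROM picture p
JOIN picture_loc pl ON pl.picture_no = p.picture_no
JOIN location l     ON l.loc_no = pl.loc_no
JOIN picture_act pa ON pa.picture_no = p.picture_no
JOIN activity a     ON a.act_no = pa.act_no
ORDER BY p.picture_no
""").fetchall()
for row in rows:
    print(row)
```

Adding objects and people to the description would take four more joins, which is the trade-off this slide points to.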

2005.10.11 - SLIDE 65 IS 202 – FALL 2005

Lecture Overview

• Review
  – Databases and Database Design
  – Database Life Cycle
  – ER Diagrams
  – Database Design
• Relational Operations
• Normalization
• Discussion