2005.10.11 - SLIDE 1 - IS 202 – FALL 2005
Prof. Ray Larson
UC Berkeley SIMS
SIMS 202: Information Organization and Retrieval
Normalization & The Relational Model
Lecture Overview
• Review
  – Databases and Database Design
  – Database Life Cycle
  – ER Diagrams
  – Database Design
• Relational Operations
• Normalization
• Discussion Questions
Models (1)
[Diagram labels: Conceptual requirements for Applications 1–4; Conceptual Model; Logical Model; Internal Model; External Model for each of Applications 1–4]
Database System Life Cycle
1. Design
2. Physical Creation
3. Conversion
4. Integration
5. Operations
6. Growth, Change, & Maintenance
Another View of the Life Cycle
1. Design
2. Physical Creation
3. Conversion
4. Integration
5. Operations
6. Growth, Change
Database Design Process
[Diagram labels: Conceptual requirements for Applications 1–4; Conceptual Model; Logical Model; Internal Model; External Model for each of Applications 1–4]
Entity
• An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information
  – Persons (e.g., customers in a business, employees, authors)
  – Things (e.g., purchase orders, meetings, parts, companies)
[Diagram: entity box labeled Employee]
Attributes
• Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (this is the metadata for the entities)
[Diagram: entity Employee with attributes Name (Last, Middle, First), SSN, Age, Birthdate, Projects]
Relationships
• Relationships are the associations between entities
• They can involve one or more entities and belong to particular relationship types
  – One to One
  – One to Many
  – Many to Many
Relationships
[Diagram: Class and Student linked by the relationship Attends; Part, Supplier, and Project linked by the relationship "Supplies project parts"]
Types of Relationships
• Concerned only with cardinality of relationship
[Diagram, Chen ER notation: Truck and Employee linked by Assigned (one-to-one); Project and Employee linked by Assigned (one-to-many); Project and Employee linked by Assigned (many-to-many)]
Requirements Analysis
• Conceptual Requirements
  – Systems Analysis Process
    • Examine all of the information sources used in existing applications
    • Identify the characteristics of each data element
      – Numeric
      – Text
      – Date/time
      – Etc.
    • Examine the tasks carried out using the information
    • Examine results or reports created using the information
Conceptual Design
• Conceptual Model
  – Merge the collective needs of all applications
  – Determine what Entities are being used
    • Some object about which information is to be maintained
  – What are the Attributes of those entities?
    • Properties or characteristics of the entity
    • What attributes uniquely identify the entity?
  – What are the Relationships between entities?
    • How do the entities interact with each other?
Developing a Conceptual Model
• Overall view of the database that integrates all the needed information discovered during the requirements analysis
• Elements of the Conceptual Model are represented by diagrams (Entity-Relationship, or ER, Diagrams) that show the meanings and relationships of those elements independent of any particular database system or implementation details
• Can also be represented using other modeling tools (such as UML)
Logical Design
• Logical Model
  – How is each entity and relationship represented in the Data Model of the DBMS?
    • Hierarchic?
    • Network?
    • Relational?
    • Object-Oriented?
Physical Design
• Internal Model
  – Choices of index file structure
  – Choices of data storage formats
  – Choices of disk layout
Database Application Design
• External Model
  – User views of the integrated database
  – Making the old (or updated) applications work with the new database design
Relational Algebra Operations
• Restrict
• Project
• Product
• Union
• Intersect
• Difference
• Join
• Divide
Restrict
• Extracts specified tuples (rows) from a specified relation (table)
  – Restrict is AKA “Select”
Project
• Extracts specified attributes (columns) from a specified relation.
Join
• Builds a relation from two specified relations consisting of all possible concatenated pairs, one from each of the two relations, such that in each pair the two tuples satisfy some condition (e.g., equal values in a given column).
(Natural or Inner) Join example
First relation:
A1 B1
A2 B1
A3 B2
Second relation:
B1 C1
B2 C2
B3 C3
Result of the join:
A1 B1 C1
A2 B1 C1
A3 B2 C2
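The three operations above can be sketched in a few lines of Python. This is only an illustration, not how a DBMS implements them: relations are modeled as lists of dicts, and the function names restrict, project, and natural_join are chosen here for the example. The data is the A/B/C example from the Join slide.

```python
# Relations are modeled as lists of dicts: one dict per tuple (row).

def restrict(relation, predicate):
    """Restrict (a.k.a. Select): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Project: keep only the named columns, dropping duplicate rows."""
    result = []
    for row in relation:
        projected = {name: row[name] for name in attributes}
        if projected not in result:
            result.append(projected)
    return result

def natural_join(left, right):
    """Natural join: pair rows that agree on every shared column name."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[name] == r_row[name] for name in shared):
                joined.append({**l_row, **r_row})
    return joined

ab = [{"A": "A1", "B": "B1"}, {"A": "A2", "B": "B1"}, {"A": "A3", "B": "B2"}]
bc = [{"B": "B1", "C": "C1"}, {"B": "B2", "C": "C2"}, {"B": "B3", "C": "C3"}]

abc = natural_join(ab, bc)
for row in abc:
    print(row["A"], row["B"], row["C"])
```

The nested loop compares every row of one relation with every row of the other, which is the simplest way to express a join; real systems use indexes or hashing to avoid that cost.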
ER Diagram: Acme Widget Co.
[Diagram labels: entities Customer, Invoice, Line-Item, Part, Sales-Rep, Hourly, Employee; relationships Orders, Writes, Contains (twice), ISA; attributes Cust#, Invoice#, Part#, Rep#, Emp#, Wage, Sales, Quantity, Price, Count]
Join Items for Relational DB
Parts
Part # | Name | Price | Count
1 | Big blue widget | 3.76 | 2
2 | Small blue Widget | 7.35 | 4
3 | Tiny red widget | 5.25 | 7
4 | large red widget | 157.23 | 23
5 | double widget rack | 10.44 | 12
6 | Small green Widget | 30.45 | 58
7 | Big yellow widget | 7.96 | 1
8 | Tiny orange widget | 81.75 | 42
9 | Big purple widget | 55.99 | 9
Line_item
Invoice# | Part# | Quantity
93774 | 3 | 10
84747 | 23 | 1
88647 | 75 | 2
88367 | 4 | 3
776879 | 22 | 5
65689 | 76 | 12
93774 | 23 | 10
88367 | 34 | 2
Invoice
Invoice # | Cust # | Rep #
93774 | 3 | 1
84747 | 4 | 1
88367 | 5 | 2
88647 | 9 | 1
776879 | 2 | 2
65689 | 6 | 2
Customer
Cust # | COMPANY | STREET1 | STREET2 | CITY | STATE | ZIPCODE
1 | Integrated Standards Ltd. | 35 Broadway | Floor 12 | New York | NY | 02111
2 | MegaInt Inc. | 34 Bureaucracy Plaza | Floors 1-172 | Phildelphia | PA | 03756
3 | Cyber Associates | 3 Control Elevation Place | Cyber Assicates Center | Cyberoid | NY | 08645
4 | General Consolidated | 35 Libra Plaza | | Nashua | NH | 09242
5 | Consolidated MultiCorp | 1 Broadway | | Middletown | IN | 32467
6 | Internet Behometh Ltd. | 88 Oligopoly Place | | Sagrado | TX | 78798
7 | Consolidated Brands, Inc. | 3 Independence Parkway | | Rivendell | CA | 93456
8 | Little Mighty Micro | 34 Last One Drive | | Orinda | CA | 94563
9 | SportLine Ltd. | 38 Champion Place | Suite 882 | Compton | CA | 95328
Relational Operations
• What is the name of the customer who ordered Large Red Widgets?
  – Restrict “large red widget” row from Part as temp1
  – Join temp1 with Line-item on Part # as temp2
  – Join temp2 with Invoice on Invoice # as temp3
  – Join temp3 with Customer on cust # as temp4
  – Project Company from temp4 as answer
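The restrict/join/project steps above can be traced by hand in Python. This is a sketch under simplifying assumptions: each table is a list of tuples holding only the columns the later steps need, and only a subset of the sample rows from the Acme Widget tables is included.

```python
# Sample rows taken from the Acme Widget tables (subset, needed columns only).
parts = [(1, "Big blue widget"), (2, "Small blue Widget"), (3, "Tiny red widget"),
         (4, "large red widget"), (5, "double widget rack")]        # (Part#, Name)
line_items = [(93774, 3), (84747, 23), (88647, 75), (88367, 4)]     # (Invoice#, Part#)
invoices = [(93774, 3), (84747, 4), (88367, 5), (88647, 9)]         # (Invoice#, Cust#)
customers = [(3, "Cyber Associates"), (4, "General Consolidated"),
             (5, "Consolidated MultiCorp"), (9, "SportLine Ltd.")]  # (Cust#, Company)

# Restrict: keep the "large red widget" row of Part.
temp1 = [(part_no, name) for part_no, name in parts if name == "large red widget"]

# Join temp1 with Line-item on Part #.
temp2 = [(part_no, inv_no) for part_no, _ in temp1
         for inv_no, li_part in line_items if li_part == part_no]

# Join temp2 with Invoice on Invoice #.
temp3 = [(inv_no, cust_no) for _, inv_no in temp2
         for i_inv, cust_no in invoices if i_inv == inv_no]

# Join temp3 with Customer on Cust #.
temp4 = [(cust_no, company) for _, cust_no in temp3
         for c_no, company in customers if c_no == cust_no]

# Project Company from temp4.
answer = [company for _, company in temp4]
print(answer)
```

With the sample rows, part 4 appears on invoice 88367, which belongs to customer 5, so the answer is Consolidated MultiCorp.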
SQL
• Database Definition and Querying
  – Can be used as an interactive query language
  – Can be embedded in programs
• Relational Calculus combines Restrict, Project and Join operations in a single command: SELECT
SELECT
• Syntax:
SELECT [DISTINCT] attr1, attr2,…, attr3
FROM rel1 r1, rel2 r2,… rel3 r3
WHERE condition1 {AND | OR} condition2
ORDER BY attr1 [DESC], attr3 [DESC]
SQL SELECT
SELECT c.COMPANY
FROM Customer c, Parts p, Invoice i, Line_item z
WHERE c.Cust# = i.Cust#
AND i.Invoice# = z.Invoice#
AND z.Part# = p.Part#
AND p.Name = 'large red widget';
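To try the query above, one option is Python's built-in sqlite3 module with an in-memory database. The sketch below makes a few assumptions that are not in the slides: the column types are guesses, only a handful of sample rows are loaded, and the column names containing "#" are wrapped in double quotes because SQLite does not accept "#" in an unquoted identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Parts     ("Part#" INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Customer  ("Cust#" INTEGER PRIMARY KEY, COMPANY TEXT);
CREATE TABLE Invoice   ("Invoice#" INTEGER PRIMARY KEY, "Cust#" INTEGER, "Rep#" INTEGER);
CREATE TABLE Line_item ("Invoice#" INTEGER, "Part#" INTEGER, Quantity INTEGER);
INSERT INTO Parts VALUES (1, 'Big blue widget'), (4, 'large red widget');
INSERT INTO Customer VALUES (3, 'Cyber Associates'), (5, 'Consolidated MultiCorp');
INSERT INTO Invoice VALUES (93774, 3, 1), (88367, 5, 2);
INSERT INTO Line_item VALUES (93774, 3, 10), (88367, 4, 3);
""")

# Same query as on the slide, with quoted identifiers and a single-quoted string.
query = """
SELECT c.COMPANY
FROM Customer c, Parts p, Invoice i, Line_item z
WHERE c."Cust#" = i."Cust#"
  AND i."Invoice#" = z."Invoice#"
  AND z."Part#" = p."Part#"
  AND p.Name = 'large red widget';
"""
companies = [row[0] for row in conn.execute(query)]
print(companies)
```

The WHERE clause carries the three join conditions and the restriction, and the SELECT list does the projection, so one statement covers the whole sequence of relational operations.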
Normalization
• Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data
• Normalization is a multi-step process beginning with an “unnormalized” relation
Normal Forms
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
Normalization
• First Normal Form: functional dependency of nonkey attributes on the primary key; atomic values only
• Second Normal Form: full functional dependency of nonkey attributes on the primary key
• Third Normal Form: no transitive dependency between nonkey attributes
• Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency
Unnormalized Relations
• First step in normalization is to convert the data into a two-dimensional table
• In unnormalized relations data can repeat within a column
• (The following is a highly contrived example that has only a very vague resemblance to the implementation of the Phone/Photo project database from IS202 in 2004 …)
Unnormalized Relations
Person # | People # | Picture Date | Person Name | Person Type | Location | People | Activity | Objects | Object_Feat
1111 | 145; 311 | Oct 1, 2003; Nov 12, 2003 | John White | Student | San Francisco; Berkeley | Beth Little; Michael Diamond | Shopping; Eating | Book bag; Pasta | Blue; none
1234 | 243; 467 | Sep 25, 2003; Oct 10, 2003 | Mary Jones | Auditor | 202 South Hall; Oakland | Charles Field; Patricia Gold | Reading; Drinking | Textbook; Teacup | None; Chinese
2345 | 189 | Sep 27, 2003 | Charles Brown | Student | Sather Gate | David Rosen | Singing | none | none
4876 | 145 | Nov 5, 2003 | Hal Kane | Student | Northside | Beth Little | Shopping | Book bag | Blue
5123 | 145 | Oct 10, 2003 | Paul Kosher | Student | South Hall | Beth Little | Reading | none | none
6845 | 243 | Oct 5, 2003; Dec 15, 2003 | Ann Hood | Student | Oakland; Oakland | Charles Field; Charles Field | Eating; Shopping | Burrito; none | vegetarian; none
First Normal Form
• To move to First Normal Form a relation must contain only atomic values at each row and column
  – No repeating groups
  – A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation
First Normal Form
Person # | People # | Picture Date | Person Name | Person Type | Location | People | Activity | Objects | Object_Feat
1111 | 145 | Oct 1, 2003 | John White | Student | San Francisco | Beth Little | Shopping | Book bag | Blue
1111 | 311 | Nov 12, 2003 | John White | Student | Berkeley | Michael Diamond | Eating | Pasta | none
1234 | 243 | Sep 25, 2003 | Mary Jones | Auditor | 202 South Hall | Charles Field | Reading | Textbook | none
1234 | 467 | Oct 10, 2003 | Mary Jones | Auditor | Oakland | Patricia Gold | Drinking | Teacup | Chinese
2345 | 189 | Sep 27, 2003 | Charles Brown | Student | Sather Gate | David Rosen | Singing | none | none
4876 | 145 | Nov 5, 2003 | Hal Kane | Student | Northside | Beth Little | Shopping | Book bag | Blue
5123 | 145 | Oct 10, 2003 | Paul Kosher | Student | South Hall | Beth Little | Reading | none | none
6845 | 243 | Oct 5, 2003 | Ann Hood | Student | Oakland | Charles Field | Eating | Burrito | Vegetarian
6845 | 243 | Dec 15, 2003 | Ann Hood | Student | Oakland | Charles Field | Shopping | none | none
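The step from the unnormalized relation to First Normal Form amounts to splitting each repeating group into separate rows. The sketch below is illustrative only: it assumes multivalued cells are stored as strings separated by ";" and that every multivalued cell in a row holds the same number of values. The helper name to_first_normal_form is made up for this example.

```python
# John White's row from the unnormalized relation, with ";" marking repeats.
unnormalized = {
    "Person #": "1111",
    "People #": "145; 311",
    "Picture Date": "Oct 1, 2003; Nov 12, 2003",
    "Person Name": "John White",
    "Person Type": "Student",
    "Location": "San Francisco; Berkeley",
    "People": "Beth Little; Michael Diamond",
    "Activity": "Shopping; Eating",
    "Objects": "Book bag; Pasta",
    "Object_Feat": "Blue; none",
}

def to_first_normal_form(row, separator=";"):
    """Split every multivalued cell so each output row holds atomic values."""
    cells = {col: [v.strip() for v in text.split(separator)]
             for col, text in row.items()}
    group_size = max(len(values) for values in cells.values())
    flat_rows = []
    for i in range(group_size):
        # Single-valued cells (e.g. Person Name) are repeated on every row.
        flat_rows.append({
            col: values[i] if len(values) > 1 else values[0]
            for col, values in cells.items()
        })
    return flat_rows

first_nf = to_first_normal_form(unnormalized)
for flat in first_nf:
    print(flat["Person #"], flat["People #"], flat["Picture Date"], flat["Location"])
```

The repeated Person Name and Person Type values in the output are exactly the redundancy that the later normal forms remove.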
1NF Storage Anomalies
• Insertion: A new person has not yet taken a picture, hence no Picture #. Since Picture # is part of the key, we can’t insert
• Insertion: If a person is known and likely to be photographed, but hasn’t been yet, there is no way to include that person in the database
• Update: If a Person changes status (e.g., Mary Jones becomes a Student) we have to change multiple rows in the database
• Deletion (type 1): Deleting a Person record may also delete all info about People in the pictures
• Deletion (type 2): When there are functional dependencies (like Object and Object_features) changing one item eliminates other information
Second Normal Form
• A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key
  – That is, every nonkey attribute needs the full primary key for unique identification
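A quick way to see a partial dependency is to test whether part of the key already determines a nonkey column. The sketch below assumes the key of the 1NF relation is (Person #, People #, Picture Date), which the slides do not state explicitly, and uses a few of its rows. The function name determines is made up here. A check like this only shows that the sample data is consistent with a dependency; whether the dependency really holds is a design decision.

```python
def determines(rows, lhs, rhs):
    """Return True if equal values in the lhs columns always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

columns = ("Person #", "People #", "Picture Date", "Person Name", "Location")
rows = [dict(zip(columns, values)) for values in [
    (1111, 145, "Oct 1, 2003", "John White", "San Francisco"),
    (1111, 311, "Nov 12, 2003", "John White", "Berkeley"),
    (1234, 243, "Sep 25, 2003", "Mary Jones", "202 South Hall"),
    (1234, 467, "Oct 10, 2003", "Mary Jones", "Oakland"),
    (6845, 243, "Oct 5, 2003", "Ann Hood", "Oakland"),
    (6845, 243, "Dec 15, 2003", "Ann Hood", "Oakland"),
]]

assumed_key = ["Person #", "People #", "Picture Date"]

# Person Name depends on Person # alone: a partial dependency, so not 2NF.
name_from_person_only = determines(rows, ["Person #"], ["Person Name"])
# Location is not fixed by Person # alone; it needs the rest of the key.
location_from_person_only = determines(rows, ["Person #"], ["Location"])
location_from_full_key = determines(rows, assumed_key, ["Location"])

print(name_from_person_only, location_from_person_only, location_from_full_key)
```

Because Person Name needs only Person #, it belongs in a separate table keyed by Person #, which is what the Second Normal Form decomposition does.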
Second Normal Form
Person Table
Person # | Person Name | Person Type
1111 | John White | Student
1234 | Mary Jones | Auditor
2345 | Charles Brown | Student
4876 | Hal Kane | Student
5123 | Paul Kosher | Student
6845 | Ann Hood | Student
People Table
People # | People
145 | Beth Little
189 | David Rosen
243 | Charles Field
311 | Michael Diamond
467 | Patricia Gold
Picture Table
Person # | People # | Picture Date | Location | Activity | Objects | Object_Feat
1111 | 145 | 01-Oct-03 | San Francisco | Shopping | Book bag | Blue
1111 | 311 | 12-Nov-03 | Berkeley | Eating | Pasta | none
1234 | 243 | 25-Sep-03 | 202 South Hall | Reading | Textbook | none
1234 | 467 | 10-Oct-03 | Oakland | Drinking | Teacup | Chinese
2345 | 189 | 27-Sep-03 | Sather Gate | Singing | none | none
4876 | 145 | 05-Nov-03 | Northside | Shopping | Book bag | Blue
5123 | 145 | 10-Oct-03 | South Hall | Reading | none | none
6845 | 243 | 05-Oct-03 | Oakland | Eating | Burrito | vegetarian
6845 | 243 | 15-Dec-03 | Oakland | Shopping | none | none
1NF Storage Anomalies Removed
• Insertion: Can now enter new Persons who haven’t yet taken pictures
• Insertion: Can now enter People who haven’t been photographed
• Deletion (type 1): If Charles Brown withdraws his photos the corresponding tuples from Person and Picture tables can be deleted without losing information on David Rosen
• Update: If John White takes a third picture, and has changed status (e.g., graduate), we only need to change the Person table in one place
2NF Storage Anomalies
• Insertion: Cannot enter the fact that a particular object has a particular feature unless it is associated with a particular picture
• Deletion: If John White describes some other object that Beth Little has while shopping, we lose the fact that the book bag is blue
• Update: If the features of an object change we have to update multiple occurrences of object features
Third Normal Form
• A relation is said to be in Third Normal Form if there are no transitive functional dependencies between nonkey attributes
  – When one nonkey attribute can be determined by one or more other nonkey attributes, there is said to be a transitive functional dependency
• The Object_Feature column in the Picture table is determined by the Object
  – Object_Feature is transitively functionally dependent on Object so Picture is not 3NF
Third Normal Form
Person # People # Picture Date Location Activity Objects
1111 145 01-Oct-03 San Francisco Shopping Book bag
1111 311 12-Nov-03 Berkeley Eating Pasta
1234 243 25-Sep-03 202 South Hall Reading Textbook
1234 467 10-Oct-03 Oakland Drinking Teacup
2345 189 27-Sep-03 Sather Gate Singing none
4876 145 05-Nov-03 Northside Shopping Book bag
5123 145 10-Oct-03 South Hall Reading none
6845 243 05-Oct-03 Oakland Eating Burrito
6845 243 15-Dec-03 Oakland Shopping none
Picture Table
Third Normal Form
Objects Object_Feat
Book bag Blue
Pasta none
Textbook none
Teacup Chinese
Burrito Vegetarian
Object Table
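The move to Third Normal Form is a pair of projections, and joining them back on the object name should give the starting table again. The sketch below checks this on a subset of the Picture rows (those that have an object). Rows are plain tuples, and the variable names are chosen for the example.

```python
# (Person #, People #, Picture Date, Objects, Object_Feat): subset of the 2NF Picture table.
picture_2nf = [
    (1111, 145, "01-Oct-03", "Book bag", "Blue"),
    (1111, 311, "12-Nov-03", "Pasta", "none"),
    (1234, 243, "25-Sep-03", "Textbook", "none"),
    (1234, 467, "10-Oct-03", "Teacup", "Chinese"),
    (4876, 145, "05-Nov-03", "Book bag", "Blue"),
    (6845, 243, "05-Oct-03", "Burrito", "vegetarian"),
]

# Project away Object_Feat to get the 3NF Picture table.
picture_3nf = sorted({row[:4] for row in picture_2nf})

# Project (Objects, Object_Feat) into its own Object table; the set removes repeats.
object_table = sorted({row[3:] for row in picture_2nf})

# Natural join on the object name rebuilds the original rows.
rejoined = sorted(
    pic + (feature,)
    for pic in picture_3nf
    for obj, feature in object_table
    if pic[3] == obj
)

print(len(object_table), rejoined == sorted(picture_2nf))
```

The Object table holds "Book bag is Blue" once instead of once per picture, which is why the update anomaly disappears.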
2NF Storage Anomalies Removed
• Insertion: We can now enter the fact that an object has a particular feature
• Deletion: If John White describes some other object that Beth Little has while shopping, we don’t lose the fact that the bookbag is blue
• Update: The features for each object appear only once
Boyce-Codd Normal Form
• Most 3NF relations are also BCNF relations
• A 3NF relation can fail to be in BCNF only if all of the following hold:
  – Candidate keys in the relation are composite keys (they are not single attributes)
  – There is more than one candidate key in the relation, and
  – The keys are not disjoint, that is, some attributes in the keys are common
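The BCNF rule that all determinants are candidate keys can be checked mechanically from a list of functional dependencies by computing attribute closures. The sketch below is a minimal version of that check. The Student/Course/Instructor relation is a hypothetical textbook-style example, not from these slides, chosen because its candidate keys (Student, Course) and (Student, Instructor) are composite and overlapping. The Person table dependency is the one implied by the Person Table slide.

```python
def closure(attributes, fds):
    """All attributes determined by the given set under the dependencies in fds."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attributes, fds):
    """Nontrivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attributes)
    ]

# Hypothetical relation: each instructor teaches one course.
teaching_attrs = ("Student", "Course", "Instructor")
teaching_fds = [
    (("Student", "Course"), ("Instructor",)),
    (("Instructor",), ("Course",)),
]
teaching_violations = bcnf_violations(teaching_attrs, teaching_fds)

# Person table: Person # determines everything else.
person_attrs = ("Person #", "Person Name", "Person Type")
person_fds = [(("Person #",), ("Person Name", "Person Type"))]
person_violations = bcnf_violations(person_attrs, person_fds)

print(teaching_violations, person_violations)
```

In the hypothetical relation, Instructor determines Course but is not a key, so that dependency is reported. The Person table has a single-attribute key that determines every column, so nothing is reported for it.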
Most 3NF Relations Are Also BCNF – Is This One?
Person # | Person Name | Person Type
1111 | John White | Student
1234 | Mary Jones | Auditor
2345 | Charles Brown | Student
4876 | Hal Kane | Student
5123 | Paul Kosher | Student
6845 | Ann Hood | Student
BCNF Relations
Person # | Person Name
1111 | John White
1234 | Mary Jones
2345 | Charles Brown
4876 | Hal Kane
5123 | Paul Kosher
6845 | Ann Hood
Person # | Person Type
1111 | Student
1234 | Auditor
2345 | Student
4876 | Student
5123 | Student
6845 | Student
Additional Issues
• Why separate Person and People?
  – They are really all People/Persons in different roles
• Shouldn’t a picture have a unique ID regardless of who is in it?
• Can’t we have multiple people in the same picture, multiple objects, etc.?
• Can’t objects have multiple characteristics?
BCNF Relations
Picture # Person # Picture Date
1 1111 01-Oct-03
2 1111 12-Nov-03
3 1234 25-Sep-03
4 1234 10-Oct-03
5 2345 27-Sep-03
6 4876 05-Nov-03
7 5123 10-Oct-03
8 6845 05-Oct-03
9 6845 15-Dec-03
loc # Location
1 San Francisco
2 Berkeley
3 202 South Hall
4 Oakland
5 Sather Gate
6 Northside
7 South Hall
Picture # loc #
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 4
9 4
Act # Activity
1 Shopping
2 Eating
3 Reading
4 Drinking
5 Singing
Picture # Act #
1 1
2 2
3 3
4 4
5 5
6 1
7 3
8 2
9 1
Picture # Obj #
1 1
2 2
3 3
4 4
6 1
8 5
Obj # Objects
1 Book bag
2 Pasta
3 Textbook
4 Teacup
5 Burrito
Picture # People #
1 145
2 311
3 243
4 467
5 189
6 145
7 145
8 243
9 243
BCNF Added Capabilities
• Can now have a picture with no (identified) people in it
• Can have multiple objects, activities, and people associated with each picture
Fourth Normal Form
• Any relation is in Fourth Normal Form if it is BCNF and any multivalued dependencies are trivial
• Eliminate non-trivial multivalued dependencies by projecting into simpler tables
Fifth Normal Form
• A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation
• Implies that relations that have been decomposed in previous NF can be recombined via natural joins to recreate the original relation
Fifth Normal Form Relations
Picture # Person # Picture Date
1 1111 01-Oct-03
2 1111 12-Nov-03
3 1234 25-Sep-03
4 1234 10-Oct-03
5 2345 27-Sep-03
6 4876 05-Nov-03
7 5123 10-Oct-03
8 6845 05-Oct-03
9 6845 15-Dec-03
loc # Location
1 San Francisco
2 Berkeley
3 202 South Hall
4 Oakland
5 Sather Gate
6 Northside
7 South Hall
Picture # loc #
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 4
9 4
Act # Activity
1 Shopping
2 Eating
3 Reading
4 Drinking
5 Singing
Picture # Act #
1 1
2 2
3 3
4 4
5 5
6 1
7 3
8 2
9 1
Picture # Obj #
1 1
2 2
3 3
4 4
6 1
8 5
Obj # Objects
1 Book bag
2 Pasta
3 Textbook
4 Teacup
5 Burrito
Picture # People #
1 145
2 311
3 243
4 467
5 189
6 145
7 145
8 243
9 243
People Table
Normalizing to Death
• Normalization splits database information across multiple tables
• To retrieve complete information from a normalized database, the JOIN operation must be used
• JOIN tends to be expensive in terms of processing time, and very large joins are very expensive
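To make the cost concrete, here is a sketch of what it takes to get the date, location, and activity of one picture back out of the fully decomposed tables. It uses Python's built-in sqlite3 module. The table and column names (picture, picture_location, loc_no, and so on) are hypothetical, since the slides do not name these tables, and only two pictures are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE picture (picture_no INTEGER PRIMARY KEY, person_no INTEGER, picture_date TEXT);
CREATE TABLE location (loc_no INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE picture_location (picture_no INTEGER, loc_no INTEGER);
CREATE TABLE activity (act_no INTEGER PRIMARY KEY, activity TEXT);
CREATE TABLE picture_activity (picture_no INTEGER, act_no INTEGER);
INSERT INTO picture VALUES (1, 1111, '01-Oct-03'), (8, 6845, '05-Oct-03');
INSERT INTO location VALUES (1, 'San Francisco'), (4, 'Oakland');
INSERT INTO picture_location VALUES (1, 1), (8, 4);
INSERT INTO activity VALUES (1, 'Shopping'), (2, 'Eating');
INSERT INTO picture_activity VALUES (1, 1), (8, 2);
""")

# Four joins are needed to reassemble what was one row in the 1NF table.
query = """
SELECT p.picture_date, l.location, a.activity
FROM picture p
JOIN picture_location pl ON pl.picture_no = p.picture_no
JOIN location l ON l.loc_no = pl.loc_no
JOIN picture_activity pa ON pa.picture_no = p.picture_no
JOIN activity a ON a.act_no = pa.act_no
WHERE p.picture_no = ?
"""
picture_8 = conn.execute(query, (8,)).fetchone()
print(picture_8)
```

Adding people and objects to the result would take four more joins, which is the trade-off the slide describes.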