October 17, 2017 Sam Siewert
CS317 File and Database Systems
Lecture 8 – Normalization, Bottom-Up from UNF to BCNF
http://www.google.com/about/datacenters/gallery/index.html#/locations/the-dalles/1
http://nsa.gov1.info/utah-data-center/
Inside a Datacenter – E.g. Green House
1 Rack is 42 Rack Units [U = 1.75”], 6.125 Feet High, Stacked
HDD Storage 2U to 4U, 3.5” HDD
SSD Storage 1U to 2U, 2.5” SSD
Computing 1U to 4U Servers
Standard Rack Depth = 36”, Width = 19” or 23”
Hot Rows [Fan Exhaust], Cold Rows [Front Panels]
Power, Chillers for Air Handling, Optional Liquid Cooling
– E.g. Emerson Liebert
– 120/240VAC Power Conditioning and Distribution – E.g. Eaton and Pulizzi PDU
– DC Telco Rack Alternatives [Higher Efficiency, Less Convenient]
Sam Siewert – Typical Datacenter Rack [E.g. DreamWorks, Microsoft, Green House, NCAR, NOAA, DoE, Xerox, Amazon, …]
Basic Datacenter Figures of Merit [2014]
Power Density – 10 KW Per Rack
– 120/240VAC, 20Amp Circuits, 1-Phase Loads
– E.g. 10KW Per Rack, 2 x 20Amp 240VAC PDU
– Dual Circuit for Dual Power Supply Computing and Storage
– Hot Swap Power Units in Servers and Storage Enclosures
Storage Density – TB/U, 1PB Rack is 24 to 48 TB/U
– 60+ 3.5” 3TB HDDs, 180TB in 4U, 45TB/U
– RAID10 is Striped and Mirrored Storage (Typical for DBMS)
Compute Density – 2 or 4 Socket 1U/2U Servers
– E.g. 8 Cores/CPU, 4 CPU Sockets, HT3 or QPI, 32 per U
Network Port Density and Bandwidth
– GigE, 10GE, 40G Infiniband or Bonded 10GE, 100G CEE
– Ports Per Server, Copper [Twinax, TP, IB] or Multi-mode Optical [LC/SC SFP/SFP+ connectors], Typically 2 to 4 or More
– SFP Transceivers – Copper or Optical
– LC/SC Connectors for Optical
– Switch Port Density
E.g. LC SFP
http://en.wikipedia.org/wiki/Small_form-factor_pluggable_transceiver
Significance of Normalization
Structured (Normalized), Indexed, Searchable, High Veracity Data
– E.g. 2TB HDD Could Hold 6.8 Kilobytes of Data on Every Person in US
– 1 Petabyte [RAID 10 42U Rack] Stores 3 Megabytes for Every US Citizen
– E.g. All Documented Life Events [Legal, Travel, Residency]
– 1000 Racks, 1 Exabyte Structured + Unstructured Data [E-mail and Audio]
– NSA Utah Estimated to Store up to 3 to 12 Exabytes by Forbes
– 180 Petabytes for 24 hour Audio on 300 million People
– 300 Petabytes for 1 Year of Phone Conversations on all US Citizens [Forbes]
Unstructured BLOB Files [Documents, Images, Audio, Video]
– Audio with Compression [10 to 200 hours of MP3 per 640MB CD]
– Easily 24 hours on an MP3 CD of Intelligible Conversation
– Images - JPEG Lossy [10:1 to 20:1], PNG Lossless [4:1 to 10:1]
– MPEG Compression is 20:1 to 100:1 [Lossy] for I-frame and MVQ B/P-frames
– 24 Hours of SD Video is about 50 Gigabytes of Data [10 SD DVDs]
– 14,305 Racks of RAID-10 Disk for a Day of 30Hz SD Video of All US Citizens
Structured Data – Financial & Legal Transactions, Records
Unstructured Documents, E-mail, Audio, Snapshots, [Some Video]
– Not Only Capture and Store, but Search!
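The capacity figures on this slide can be roughly reproduced with a few lines of arithmetic. The sketch below is only a back-of-envelope check: it assumes a population of 300 million (the figure used in the audio bullet), decimal units for the disk and rack capacities, and binary units (MiB, GiB, PiB) for the audio and video estimates, which is the combination that appears to match the slide's numbers. The function and variable names are illustrative.

```python
# Back-of-envelope check of the per-person storage figures.
# Assumption: 300 million people, as in the "24 hour Audio" bullet.
US_POPULATION = 300_000_000

MIB = 2**20
GIB = 2**30
PIB = 2**50


def bytes_per_person(capacity_bytes, people=US_POPULATION):
    """Bytes available per person if capacity is shared evenly."""
    return capacity_bytes / people


def total_in_pib(bytes_per_person_value, people=US_POPULATION):
    """Total storage in PiB when every person needs the given number of bytes."""
    return bytes_per_person_value * people / PIB


per_person_2tb = bytes_per_person(2 * 10**12)   # 2 TB HDD, decimal units
per_person_1pb = bytes_per_person(10**15)       # 1 PB rack, decimal units
audio_pib = total_in_pib(640 * MIB)             # one 640MB MP3 CD per person
video_racks = total_in_pib(50 * GIB)            # one 1 PiB rack per PiB of SD video

print(f"2 TB HDD : {per_person_2tb / 1000:.1f} KB per person")
print(f"1 PB rack: {per_person_1pb / 10**6:.1f} MB per person")
print(f"24 h audio for everyone: {audio_pib:.0f} PiB")
print(f"24 h SD video for everyone: {int(video_racks)} racks")
```

Under these assumptions the results land close to the slide's 6.8 KB, 3 MB, 180 PB, and 14,305 racks; small differences come from the population and unit conventions chosen.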
Reminders
Ex #4 Posted
Assignment #3 Returned Next Week
Assignment #5, Physical DB Design and Project!
Assignment #6, Complete DBMS Project – FINAL
– Design Schema for DBMS project in a small team
Logical design focus
Normalization
Physical is MySQL on PRClab
– Combine Network Applications with DBMS in C/C++, JDBC, or Python - http://www.mysql.com/products/connector/
– Add Stored Programs and Triggers
– Add Views
– Create Transactions where needed
Normalization
Concern is Duplication of Data in DBMS and Hazards
– Wastes Space – Duplicate Data
– Insert Hazard - New Staff Row Also Assigned to B007, Second Insert of bAddress, must match that Already Existing for SA9
– Delete Hazard – SA9 Quits, Row Delete, Lose B007 bAddress
– Modification Hazard – bAddress Change for B005 or B003
– Foreign Keys are Exception (Expected Redundancy for Relational Model)
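The three hazards can be reproduced on a tiny table. The following is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The table name StaffBranch, the staff numbers other than SA9, and every address string are hypothetical placeholders; only SA9 and the branch numbers B003, B005, and B007 come from the slide.

```python
import sqlite3

# Placeholder rows: the branch address is repeated on every staff row.
SEED = [
    ("SA9", "B007", "B007 street address"),
    ("SX1", "B005", "B005 street address"),
    ("SX2", "B005", "B005 street address"),
    ("SX3", "B003", "B003 street address"),
]


def fresh_db():
    """Return a new in-memory database holding the unnormalized rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StaffBranch "
        "(staffNo TEXT PRIMARY KEY, branchNo TEXT, bAddress TEXT)"
    )
    conn.executemany("INSERT INTO StaffBranch VALUES (?, ?, ?)", SEED)
    return conn


def branch_addresses(conn, branch_no):
    """List the distinct addresses currently recorded for one branch."""
    cur = conn.execute(
        "SELECT DISTINCT bAddress FROM StaffBranch "
        "WHERE branchNo = ? ORDER BY bAddress",
        (branch_no,),
    )
    return [row[0] for row in cur]


# Insert hazard: a new B007 staff row carries a mistyped copy of the address.
db = fresh_db()
db.execute("INSERT INTO StaffBranch VALUES ('SX4', 'B007', 'B007 stret address')")
insert_result = branch_addresses(db, "B007")   # two conflicting addresses

# Delete hazard: SA9 is the only B007 row, so deleting it loses the address.
db = fresh_db()
db.execute("DELETE FROM StaffBranch WHERE staffNo = 'SA9'")
delete_result = branch_addresses(db, "B007")   # nothing left

# Modification hazard: changing the address on only one of the B005 rows.
db = fresh_db()
db.execute("UPDATE StaffBranch SET bAddress = 'B005 new address' WHERE staffNo = 'SX1'")
update_result = branch_addresses(db, "B005")   # old and new address coexist

print(insert_result, delete_result, update_result)
```

Moving bAddress into a separate branch table keyed by branchNo would store each address once, so none of the three statements could leave the data inconsistent.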
Redundant Attribute Data
14.3 – UNF [1NF]
The Process of Normalization
[Follow Rules for Relational Table Design and Hints coming from ER/EER Information Model]
UNF – Paper, Spreadsheet
The Process of Normalization
[Bottom-Up Tables]
UNF -> 3NF
Minimizes Update Anomalies [Insert, Update, Delete], Page 420 to 426
One Client Renting Multiple Properties – Typical of Spreadsheet, Paper
cNo   cName          pNo   pAddr              start    finish   Rent  oNo   oName
CR76  John Kay       PG4   6 Lawrence Street  7/1/12   8/31/13  350   CO40  Tina Murphy
                     PG16  5 Novar Drive      9/1/13   9/1/14   450   CO93  Tony Shaw
CR56  Aline Stewart  PG4   6 Lawrence Street  9/1/11   6/10/12  350   CO40  Tina Murphy
                     PG36  2 Manor Road       10/1/12  12/1/13  375   CO93  Tony Shaw
                     PG16  5 Novar Drive      11/1/14  8/10/15  450   CO93  Tony Shaw
14.10 – UNF
[Diagram: UNF -> 2NF (Client, Rental, PropertyOwner) -> 3NF (PropertyForRent, Owner)]
UNF -> 1NF
UNF – Table with ONE or MORE Repeating Groups [Tuple Sub-set]
1NF – Relation where Intersection of Each Row and Column has ONE Value
cNo   cName          pNo   pAddr              start    finish   Rent  oNo   oName
CR76  John Kay       PG4   6 Lawrence Street  7/1/12   8/31/13  350   CO40  Tina Murphy
                     PG16  5 Novar Drive      9/1/13   9/1/14   450   CO93  Tony Shaw
CR56  Aline Stewart  PG4   6 Lawrence Street  9/1/11   6/10/12  350   CO40  Tina Murphy
                     PG36  2 Manor Road       10/1/12  12/1/13  375   CO93  Tony Shaw
                     PG16  5 Novar Drive      11/1/14  8/10/15  450   CO93  Tony Shaw
14.10 – UNF
cNo cName pNo pAddr start finish Rent oNo oName
CR76 John Kay PG4 6 Lawrence Street 7/1/12 8/31/13 350 CO40 Tina Murphy
CR76 John Kay PG16 5 Novar Drive 9/1/13 9/1/14 450 CO93 Tony Shaw
CR56 Aline Stewart PG4 6 Lawrence Street 9/1/11 6/10/12 350 CO40 Tina Murphy
CR56 Aline Stewart PG36 2 Manor Road 10/1/12 12/1/13 375 CO93 Tony Shaw
CR56 Aline Stewart PG16 5 Novar Drive 11/1/14 8/10/15 450 CO93 Tony Shaw
14.11 – 1NF (Still Suffers all 3 Hazards)
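The UNF to 1NF step amounts to flattening each repeating group so that every row and column intersection holds one value. The sketch below models the UNF table as a client with a nested list of rentals; the nested-tuple layout and the function name to_1nf are illustrative choices, and the data is the rental example, with the PG16 rent taken as 450 as in the property table of the 2NF figure.

```python
# UNF: one entry per client, with a repeating group of rented properties.
unf = [
    ("CR76", "John Kay", [
        ("PG4", "6 Lawrence Street", "7/1/12", "8/31/13", 350, "CO40", "Tina Murphy"),
        ("PG16", "5 Novar Drive", "9/1/13", "9/1/14", 450, "CO93", "Tony Shaw"),
    ]),
    ("CR56", "Aline Stewart", [
        ("PG4", "6 Lawrence Street", "9/1/11", "6/10/12", 350, "CO40", "Tina Murphy"),
        ("PG36", "2 Manor Road", "10/1/12", "12/1/13", 375, "CO93", "Tony Shaw"),
        ("PG16", "5 Novar Drive", "11/1/14", "8/10/15", 450, "CO93", "Tony Shaw"),
    ]),
]


def to_1nf(unf_rows):
    """Repeat the client columns on every member of the repeating group."""
    return [
        (c_no, c_name, *rental)
        for c_no, c_name, rentals in unf_rows
        for rental in rentals
    ]


one_nf = to_1nf(unf)
for row in one_nf:
    print(row)
```

The result has one row per (cNo, pNo) pair, which is why the client name and the owner data now appear redundantly and the table still suffers all three hazards.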
1NF -> 2NF
1NF – Relation where Intersection of Row and Column Has ONE Value
2NF – 1NF Relation where Every Non-Primary Key Attribute is Fully Functionally Dependent on the PK [ER 1..1 to 1..1 Relations]
cNo cName pNo pAddr start finish Rent oNo oName
CR76 John Kay PG4 6 Lawrence Street 7/1/12 8/31/13 350 CO40 Tina Murphy
CR76 John Kay PG16 5 Novar Drive 9/1/13 9/1/14 450 CO93 Tony Shaw
CR56 Aline Stewart PG4 6 Lawrence Street 9/1/11 6/10/12 350 CO40 Tina Murphy
CR56 Aline Stewart PG36 2 Manor Road 10/1/12 12/1/13 375 CO93 Tony Shaw
CR56 Aline Stewart PG16 5 Novar Drive 11/1/14 8/10/15 450 CO93 Tony Shaw
14.11 – 1NF (Suffers all 3 Hazards)
cNo [PK] cName
CR76 John Kay
CR56 Aline Stewart
14.14 – 2NF (Still Suffers Update Hazard Due to Transitive Dependency pNo -> oNo -> oName)
cNo pNo start finish
CR76 PG4 7/1/12 8/31/13
CR76 PG16 9/1/13 9/1/14
CR56 PG4 9/1/11 6/10/12
CR56 PG36 10/1/12 12/1/13
CR56 PG16 11/1/14 8/10/15
pNo pAddress rent oNo oName
PG4 6 Lawrence Street 350 CO40 Tina Murphy
PG16 5 Novar Drive 450 CO93 Tony Shaw
PG36 2 Manor Road 375 CO93 Tony Shaw
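One way to see why the 1NF table splits into these three relations is to test which functional dependencies hold on the sample rows and then project onto the matching attribute sets. This is a sketch only: the helper names fd_holds and project are illustrative, the PG16 rent is taken as 450 as in the property table above, and a dependency that holds on five sample rows is evidence, not proof, that it holds for the relation in general.

```python
COLUMNS = ("cNo", "cName", "pNo", "pAddr", "start", "finish", "rent", "oNo", "oName")
ROWS = [
    ("CR76", "John Kay", "PG4", "6 Lawrence Street", "7/1/12", "8/31/13", 350, "CO40", "Tina Murphy"),
    ("CR76", "John Kay", "PG16", "5 Novar Drive", "9/1/13", "9/1/14", 450, "CO93", "Tony Shaw"),
    ("CR56", "Aline Stewart", "PG4", "6 Lawrence Street", "9/1/11", "6/10/12", 350, "CO40", "Tina Murphy"),
    ("CR56", "Aline Stewart", "PG36", "2 Manor Road", "10/1/12", "12/1/13", 375, "CO93", "Tony Shaw"),
    ("CR56", "Aline Stewart", "PG16", "5 Novar Drive", "11/1/14", "8/10/15", 450, "CO93", "Tony Shaw"),
]


def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs (lhs -> rhs)."""
    seen = {}
    for row in rows:
        rec = dict(zip(COLUMNS, row))
        key = tuple(rec[a] for a in lhs)
        val = tuple(rec[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate tuples."""
    out = []
    for row in rows:
        rec = dict(zip(COLUMNS, row))
        t = tuple(rec[a] for a in attrs)
        if t not in out:
            out.append(t)
    return out


# With PK (cNo, pNo), these are partial dependencies on part of the key.
print(fd_holds(ROWS, ("cNo",), ("cName",)))
print(fd_holds(ROWS, ("pNo",), ("pAddr", "rent", "oNo", "oName")))
# cNo alone does not determine pNo, so the full key is needed for start/finish.
print(fd_holds(ROWS, ("cNo",), ("pNo",)))

client = project(ROWS, ("cNo", "cName"))
rental = project(ROWS, ("cNo", "pNo", "start", "finish"))
property_owner = project(ROWS, ("pNo", "pAddr", "rent", "oNo", "oName"))
```

Each partial dependency becomes its own relation (client and property_owner), and what remains (rental) depends on the whole key.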
Figure 14.13 Alternate 1NF
cNo pNo pAddr start finish Rent oNo oName
CR76 PG4 6 Lawrence Street 7/1/12 8/31/13 350 CO40 Tina Murphy
CR76 PG16 5 Novar Drive 9/1/13 9/1/14 450 CO93 Tony Shaw
CR56 PG4 6 Lawrence Street 9/1/11 6/10/12 350 CO40 Tina Murphy
CR56 PG36 2 Manor Road 10/1/12 12/1/13 375 CO93 Tony Shaw
CR56 PG16 5 Novar Drive 11/1/14 8/10/15 450 CO93 Tony Shaw
cNo [PK] cName
CR76 John Kay
CR56 Aline Stewart
2NF -> 3NF [Also BCNF]
2NF – 1NF Relation where Every Non-Primary Key Attribute is Fully Functionally Dependent on the PK [ER 1..1 to 1..1 Relations]
3NF – 2NF Relation where no Non-PK Attribute is Transitively Dependent on a PK
cNo [PK] cName
CR76 John Kay
CR56 Aline Stewart
14.14 – 2NF (Still Suffers Update Hazard Due to Transitive Dependency pNo -> oNo -> oName)
cNo pNo start finish
CR76 PG4 7/1/12 8/31/13
CR76 PG16 9/1/13 9/1/14
CR56 PG4 9/1/11 6/10/12
CR56 PG36 10/1/12 12/1/13
CR56 PG16 11/1/14 8/10/15
pNo pAddress rent oNo oName
PG4 6 Lawrence Street 350 CO40 Tina Murphy
PG16 5 Novar Drive 450 CO93 Tony Shaw
PG36 2 Manor Road 375 CO93 Tony Shaw
oNo [PK] oName
CO40 Tina Murphy
CO93 Tony Shaw
pNo [PK] pAddress rent oNo [FK]
PG4 6 Lawrence Street 350 CO40
PG16 5 Novar Drive 450 CO93
PG36 2 Manor Road 375 CO93
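The effect of removing the transitive dependency pNo -> oNo -> oName can be checked by counting how many rows an owner rename has to touch before and after the split. The sketch below uses Python's sqlite3 module in place of MySQL; the relation names PropertyOwner, PropertyForRent, and Owner are assumed labels for the tables shown, and the new owner name 'T. Shaw' is a made-up value for the test update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 2NF relation: oName depends on oNo, which depends on pNo.
conn.execute(
    "CREATE TABLE PropertyOwner "
    "(pNo TEXT PRIMARY KEY, pAddress TEXT, rent INTEGER, oNo TEXT, oName TEXT)"
)
conn.executemany(
    "INSERT INTO PropertyOwner VALUES (?, ?, ?, ?, ?)",
    [
        ("PG4", "6 Lawrence Street", 350, "CO40", "Tina Murphy"),
        ("PG16", "5 Novar Drive", 450, "CO93", "Tony Shaw"),
        ("PG36", "2 Manor Road", 375, "CO93", "Tony Shaw"),
    ],
)

# 3NF: project the owner attributes out, keeping oNo as a foreign key.
conn.execute("CREATE TABLE Owner (oNo TEXT PRIMARY KEY, oName TEXT)")
conn.execute("INSERT INTO Owner SELECT DISTINCT oNo, oName FROM PropertyOwner")
conn.execute(
    "CREATE TABLE PropertyForRent "
    "(pNo TEXT PRIMARY KEY, pAddress TEXT, rent INTEGER, "
    "oNo TEXT REFERENCES Owner(oNo))"
)
conn.execute(
    "INSERT INTO PropertyForRent SELECT pNo, pAddress, rent, oNo FROM PropertyOwner"
)

# Renaming owner CO93: rows touched in the 2NF table versus the 3NF table.
rows_2nf = conn.execute(
    "UPDATE PropertyOwner SET oName = 'T. Shaw' WHERE oNo = 'CO93'"
).rowcount
rows_3nf = conn.execute(
    "UPDATE Owner SET oName = 'T. Shaw' WHERE oNo = 'CO93'"
).rowcount

print(rows_2nf, rows_3nf)
```

In the 2NF table every property owned by CO93 must be updated consistently, which is the update hazard; in the 3NF design the name is stored once.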
Lossless-Join Property of 3NF
Fundamental Point – P. 425 – Lossless-Join Reversibility
3NF is a Process to Apply Relational Algebra Projections
– Creates a Lossless-Join Decomposition (Reducing or Eliminating Insert, Delete, Update Hazards)
– Using Natural Join (in a View) we Can Reverse
– View [Stored Query] Can Easily Regenerate 1NF Version
– UNF Could Be Re-created Via Application Report Generation
1. Elimination of Repeating Groups -> 1NF
2. 1NF -> Every Non-PK [CK?] Attribute Fully Functionally Dependent on PK [any CK?] -> 2NF (No Partial Dependencies Allowed)
3. 2NF -> No Non-PK [CK?] Attribute is Transitively Dependent on the PK [any CK?] -> 3NF
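The reversibility claim can be checked directly: load the four 3NF relations, define a view as their natural join, and compare the view with the original 1NF rows. The sketch below uses Python's sqlite3 module rather than MySQL. The view name ClientRental1NF is made up, the relation names are assumed labels, the columns start and finish are renamed rentStart and rentFinish purely as a precaution against keyword clashes, and the PG16 rent is taken as 450 as in the property tables.

```python
import sqlite3

# The 1NF rows that the decomposition started from.
ONE_NF = [
    ("CR76", "John Kay", "PG4", "6 Lawrence Street", "7/1/12", "8/31/13", 350, "CO40", "Tina Murphy"),
    ("CR76", "John Kay", "PG16", "5 Novar Drive", "9/1/13", "9/1/14", 450, "CO93", "Tony Shaw"),
    ("CR56", "Aline Stewart", "PG4", "6 Lawrence Street", "9/1/11", "6/10/12", 350, "CO40", "Tina Murphy"),
    ("CR56", "Aline Stewart", "PG36", "2 Manor Road", "10/1/12", "12/1/13", 375, "CO93", "Tony Shaw"),
    ("CR56", "Aline Stewart", "PG16", "5 Novar Drive", "11/1/14", "8/10/15", 450, "CO93", "Tony Shaw"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client (cNo TEXT PRIMARY KEY, cName TEXT);
CREATE TABLE Owner (oNo TEXT PRIMARY KEY, oName TEXT);
CREATE TABLE PropertyForRent (
    pNo TEXT PRIMARY KEY, pAddress TEXT, rent INTEGER,
    oNo TEXT REFERENCES Owner(oNo));
CREATE TABLE Rental (
    cNo TEXT REFERENCES Client(cNo),
    pNo TEXT REFERENCES PropertyForRent(pNo),
    rentStart TEXT, rentFinish TEXT,
    PRIMARY KEY (cNo, pNo));
-- Stored query that reverses the projections with natural joins.
CREATE VIEW ClientRental1NF AS
    SELECT cNo, cName, pNo, pAddress, rentStart, rentFinish, rent, oNo, oName
    FROM Client NATURAL JOIN Rental
         NATURAL JOIN PropertyForRent
         NATURAL JOIN Owner;
""")

# Populate the 3NF relations by projecting each 1NF row; OR IGNORE drops
# the duplicate tuples that projection would eliminate.
for c_no, c_name, p_no, p_addr, start, finish, rent, o_no, o_name in ONE_NF:
    conn.execute("INSERT OR IGNORE INTO Client VALUES (?, ?)", (c_no, c_name))
    conn.execute("INSERT OR IGNORE INTO Owner VALUES (?, ?)", (o_no, o_name))
    conn.execute(
        "INSERT OR IGNORE INTO PropertyForRent VALUES (?, ?, ?, ?)",
        (p_no, p_addr, rent, o_no),
    )
    conn.execute("INSERT INTO Rental VALUES (?, ?, ?, ?)", (c_no, p_no, start, finish))

rejoined = set(conn.execute("SELECT * FROM ClientRental1NF"))
print(rejoined == set(ONE_NF))
```

The join neither loses rows nor creates spurious ones because each join attribute (cNo, pNo, oNo) is a key of one of the relations being joined.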
Reminder – SK, CK, PK, AK, FK
SK – Attribute or Set of Attributes that UNIQUELY Identifies Tuple in Relation
CK – An SK, s.t. no Proper Subset is also an SK [Minimal]
– UNIQUENESS – CK Uniquely Identifies all Tuples in Relation
– IRREDUCIBILITY – No Proper Subset of CK has UNIQUENESS
PK – CK Selected to ID Tuples UNIQUELY in Relation
AK – CK Not Selected to be PK
FK – An Attribute or Set of Attributes in R1 that Matches CK in R1 or R2..N
Is 3NF Good Enough?
Recall that PK Selection is From Set of CKs in Relation
Dependencies on Remaining CKs not Used as PK?
Strengthen 2NF and 3NF Definitions to Include ANY CK
1. 1NF -> Every Non-CK Attribute Fully Functionally Dependent on ANY CK -> 2NF (No Partial Dependencies Allowed)
2. 2NF -> No Non-CK Attribute is Transitively Dependent on ANY CK -> 3NF
Even With STRONG 2NF & 3NF, Dependencies Can Still Cause Redundancy
BCNF Considers Common Cases
BCNF
BCNF – Relation is BCNF If-and-only-if Every Determinant is a CK
Determinant – Attribute or Group of Attributes on Which some OTHER Attribute is Fully Functionally Dependent
3NF allows A -> B if B is a PK Attribute and A is not a CK
BCNF REQUIRES A to be CK [Further Constrains]
– Issues Arise when Relation contains 2+ Composite CKs
– CKs Overlap [Common Attribute]
Stopping at 3NF Preferred [Sometimes] to Avoid Loss of Dependencies
BCNF Example
cNo intDate intTime staffNo roomNo
CR76 5/13/14 10:30 SG5 G101
CR56 5/13/14 12:00 SG5 G101
CR74 5/13/14 12:00 SG37 G102
CR56 7/1/14 10:30 SG5 G102
ClientInterview 3 Candidate Keys
1. (cNo, intDate) - PK
2. (staffNo, intDate, intTime) - CK
3. (roomNo, intDate, intTime) – CK
intDate is Overlap between 3 CKs (creates Hazard)
(staffNo, intDate) determinant is not a CK for ClientInterview
ClientInterview has following functional dependencies
1. (cNo, intDate) -> intTime, staffNo, roomNo
2. (staffNo, intDate, intTime) -> cNo
3. (roomNo, intDate, intTime) -> staffNo, cNo
4. (staffNo, intDate) -> roomNo
ClientInterview 3NF Relation
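The candidate keys and the BCNF violation can be derived mechanically from the four functional dependencies using attribute closure. The sketch below is a small brute-force version, adequate for five attributes; the function names closure, candidate_keys, and bcnf_violations are illustrative, and the enumeration over all attribute subsets would not scale to large schemas.

```python
from itertools import combinations

ATTRS = frozenset({"cNo", "intDate", "intTime", "staffNo", "roomNo"})

# The four functional dependencies of ClientInterview as (determinant, dependents).
FDS = [
    ({"cNo", "intDate"}, {"intTime", "staffNo", "roomNo"}),
    ({"staffNo", "intDate", "intTime"}, {"cNo"}),
    ({"roomNo", "intDate", "intTime"}, {"staffNo", "cNo"}),
    ({"staffNo", "intDate"}, {"roomNo"}),
]


def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            is_superkey = closure(candidate, fds) == attrs
            is_minimal = not any(key <= candidate for key in keys)
            if is_superkey and is_minimal:
                keys.append(candidate)
    return keys


def bcnf_violations(attrs, fds):
    """Dependencies whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != attrs]


keys = candidate_keys(ATTRS, FDS)
violations = bcnf_violations(ATTRS, FDS)

for key in keys:
    print("CK:", sorted(key))
for lhs, rhs in violations:
    print("BCNF violation:", sorted(lhs), "->", sorted(rhs))
```

The search finds the same three candidate keys listed above and flags only the fourth dependency, whose determinant (staffNo, intDate) is not a candidate key.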
BCNF Example
cNo intDate intTime staffNo
CR76 5/13/14 10:30 SG5
CR56 5/13/14 12:00 SG5
CR74 5/13/14 12:00 SG37
CR56 7/1/14 10:30 SG5
Interview Relation
staffNo intDate roomNo
SG5 5/13/14 G101
SG37 5/13/14 G102
SG5 7/1/14 G102
Interview Relation
cNo intDate intTime staffNo roomNo
CR76 5/13/14 10:30 SG5 G101
CR56 5/13/14 12:00 SG5 G101
CR74 5/13/14 12:00 SG37 G102
CR56 7/1/14 10:30 SG5 G102
ClientInterview 3NF Relation
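The BCNF decomposition shown on this slide can be checked for the lossless-join property by projecting the ClientInterview rows onto the two smaller schemas and joining them back. The sketch below uses plain Python sets; the helper names project and natural_join and the variable name staff_room for the (staffNo, intDate, roomNo) relation are assumptions made for illustration.

```python
COLS = ("cNo", "intDate", "intTime", "staffNo", "roomNo")
ROWS = [
    ("CR76", "5/13/14", "10:30", "SG5", "G101"),
    ("CR56", "5/13/14", "12:00", "SG5", "G101"),
    ("CR74", "5/13/14", "12:00", "SG37", "G102"),
    ("CR56", "7/1/14", "10:30", "SG5", "G102"),
]


def project(rows, cols, keep):
    """Projection onto the attributes in keep, as a set of tuples."""
    idx = [cols.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1, cols1, r2, cols2):
    """Natural join on the attributes the two schemas share."""
    common = [c for c in cols1 if c in cols2]
    extra = [c for c in cols2 if c not in cols1]
    joined = set()
    for a in r1:
        rec_a = dict(zip(cols1, a))
        for b in r2:
            rec_b = dict(zip(cols2, b))
            if all(rec_a[c] == rec_b[c] for c in common):
                joined.add(a + tuple(rec_b[c] for c in extra))
    return joined, tuple(cols1) + tuple(extra)


interview_cols = ("cNo", "intDate", "intTime", "staffNo")
staff_room_cols = ("staffNo", "intDate", "roomNo")

interview = project(ROWS, COLS, interview_cols)
staff_room = project(ROWS, COLS, staff_room_cols)

rejoined, rejoined_cols = natural_join(interview, interview_cols, staff_room, staff_room_cols)
print(rejoined_cols == COLS, rejoined == set(ROWS))
```

The join is lossless because (staffNo, intDate) is a key of the second relation. The price is that the dependency (roomNo, intDate, intTime) -> staffNo, cNo now spans both relations and can no longer be enforced within a single table, which is the loss of dependencies that can make stopping at 3NF preferable.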