Posted on 18-Jan-2016
44271: Database Design & Implementation
Logical Data Modelling (Avoiding Database Anomalies)
Ian Perry, Room: C49, Tel Ext.: 7287
E-mail: I.P.Perry@hull.ac.uk
http://itsy.co.uk/ac/0405/sem3/44271_DDI/
Ian Perry, Slide 2, 44271: Database Design & Implementation: Logical Data Modelling
What is a Logical Data Model?
A ‘robust’ representation of the initial decisions made when building our Conceptual Data Model, which was composed of:
• Entities
• Attributes
• Relationships
When I say ‘robust’, I mean that this model MUST ‘perform’ well with respect to a specific style/type of software.
Database Theories & Software
Database Theories are hardware independent; the only concern is the match to a ‘type’ of software, e.g.:
• Hierarchical DBMS
• Relational DBMS
• Object-based DBMS
Each Database Theory addresses:
• Data Structure
• Data Integrity
• Data Manipulation
Database Theory = Relational Model
First proposed by Dr. E. F. Codd in June 1970.
Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, Vol. 13, No. 6, pp. 377–387.
Codd's model is now accepted as the definitive model for relational database management systems (RDBMS).
Structured English QUEry Language ("SEQUEL") was developed by IBM to use Codd's model.
SEQUEL later became SQL. In 1979, Relational Software, Inc. (now Oracle Corporation) introduced the first commercial implementation of SQL.
SQL is the most widely used RDBMS manipulation language.
Relations look like Entities, but …
Entity Staff(SCode, Name, Address, DoB, …)
We may discover a requirement for ‘extra’ Attributes, and we also need to ‘complete’ our list of Attributes for each Relation.
Relation Staff(SCode, Name, Address, DoB, DoE)
Entity Contract(CCode, Site, Begin, End, …)
We can’t draw relationship lines, so we need to ‘add’ extra Attributes to the Relations at the ‘M’ end of any ‘1:M’ relationship; e.g. 1 Staff “takes part in” M Contracts.
Relation Contract(CCode, Site, Begin, End, SCode)
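As a minimal sketch of the Staff and Contract Relations above, assuming SQLite through Python's standard sqlite3 module (neither appears in the original slides), the Foreign Key SCode sits in Contract, the ‘M’ end of the 1:M relationship. The column types are assumptions; Begin and End are quoted because they are SQL keywords.

```python
import sqlite3

# In-memory database (an assumption for illustration); SQLite only
# enforces foreign keys when this pragma is switched on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# The '1' end: Staff keeps only its own attributes.
conn.execute("""
    CREATE TABLE Staff (
        SCode   INTEGER NOT NULL PRIMARY KEY,
        Name    TEXT,
        Address TEXT,
        DoB     TEXT,
        DoE     TEXT
    )""")

# The 'M' end: Contract gains the extra attribute SCode, which stands in
# for the relationship line of the ER diagram.
conn.execute("""
    CREATE TABLE Contract (
        CCode   INTEGER NOT NULL PRIMARY KEY,
        Site    TEXT,
        "Begin" TEXT,
        "End"   TEXT,
        SCode   INTEGER REFERENCES Staff(SCode)
    )""")

# Ask SQLite which foreign keys it recorded for Contract.
# Each row is: id, seq, table, from, to, on_update, on_delete, match.
fks = conn.execute("PRAGMA foreign_key_list(Contract)").fetchall()
for fk in fks:
    print(f"Contract.{fk[3]} references {fk[2]}.{fk[4]}")
```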
Use Tables to ‘flesh-out’ your Logical Model
Staff(SCode, Name, Address, DoB, DoE)
SCode  Name   Address    DoB       DoE
9491   Smith  6 Shaw St  13/02/65  03/10/98
7416   Day    2 Sale St  14/01/57  22/11/02
8912   Jones  15 Ayr Av  28/12/76  01/03/04

Contract(CCode, Site, Begin, End, SCode)
CCode  Site  Begin     End       SCode
279    Hull  27/02/05  03/03/05  9491
665    York  14/09/04  02/12/04  7416
183    York  04/03/05  16/06/05  9491
NB. Tables ARE NOT Relations!
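One way to illustrate why a table is not a Relation, assuming Codd's view of a Relation as a set of Tuples: a Python frozenset of tuples has no order and cannot hold duplicates, whereas an SQL table with no key stores whatever rows it is given. The rows come from the Staff table above. The repeated row and the table name StaffTable are hypothetical, added purely for illustration.

```python
import sqlite3

rows = [
    (9491, "Smith", "6 Shaw St", "13/02/65", "03/10/98"),
    (7416, "Day", "2 Sale St", "14/01/57", "22/11/02"),
    (8912, "Jones", "15 Ayr Av", "28/12/76", "01/03/04"),
    (9491, "Smith", "6 Shaw St", "13/02/65", "03/10/98"),  # hypothetical duplicate
]

# A Relation is a *set* of Tuples: the duplicate collapses and order is irrelevant.
relation = frozenset(rows)

# A table with no primary key is a bag (multiset) of rows: the duplicate is kept.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StaffTable (SCode, Name, Address, DoB, DoE)")
conn.executemany("INSERT INTO StaffTable VALUES (?, ?, ?, ?, ?)", rows)
table_count = conn.execute("SELECT COUNT(*) FROM StaffTable").fetchone()[0]

print(len(relation), table_count)  # 3 4
```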
Primary & Foreign Keys
The most important Attributes in a Relation are known as ‘Keys’, of which there are two types:
• Primary Key: one, or more, Attribute(s) that identify a unique occurrence of the ‘Entity’ that this ‘Relation’ represents.
• Foreign Key: Attributes used (i.e. instead of the lines of an ER Diagram) to represent the presence of relationships.
Together, these are often referred to as the Primary/Foreign Key Mechanism.
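A sketch of the Primary/Foreign Key Mechanism in action, assuming SQLite via Python's sqlite3 and using the sample rows from the Staff and Contract tables (cut down to the columns needed): matching Contract.SCode against Staff.SCode recovers the relationship that an ER diagram would draw as a line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Staff (SCode INTEGER NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute("""CREATE TABLE Contract (
    CCode INTEGER NOT NULL PRIMARY KEY,
    Site  TEXT,
    SCode INTEGER REFERENCES Staff(SCode))""")

conn.executemany("INSERT INTO Staff VALUES (?, ?)",
                 [(9491, "Smith"), (7416, "Day"), (8912, "Jones")])
conn.executemany("INSERT INTO Contract VALUES (?, ?, ?)",
                 [(279, "Hull", 9491), (665, "York", 7416), (183, "York", 9491)])

# Matching the Foreign Key (Contract.SCode) to the Primary Key (Staff.SCode)
# recovers "Staff takes part in Contract"; Jones has no contracts, so the
# inner join leaves him out.
pairs = conn.execute("""
    SELECT s.Name, c.CCode
    FROM Staff AS s JOIN Contract AS c ON c.SCode = s.SCode
    ORDER BY s.Name, c.CCode""").fetchall()
print(pairs)  # [('Day', 665), ('Smith', 183), ('Smith', 279)]
```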
Attributes, Domains & Relationships
Attribute Values should be atomic (i.e. simple/single values only); e.g. ‘address’ should be separated into ‘street’, ‘town’ & ‘postcode’.
The set of eligible Attribute Values is known as an Attribute’s Domain; e.g. if we only have 100 members of staff, then the Domain of the ‘SCode’ Attribute could be “whole numbers between 1 & 100”.
The Relational Model is weak at explicitly modelling relationships: Attributes in different Relations MUST HAVE the same Attribute Domain for a relationship to be possible.
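A minimal sketch, assuming SQLite via Python's sqlite3, of an Attribute Domain expressed as a CHECK constraint, with the address held as three atomic attributes. The 1 to 100 range is the slide's own hypothetical; the names, streets, towns and postcodes are made-up sample values. Any Foreign Key column referring to SCode would need the same CHECK so that both share one Domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical domain from the slide: SCode is a whole number from 1 to 100.
# The address is split into three atomic attributes instead of one.
conn.execute("""
    CREATE TABLE Staff (
        SCode    INT NOT NULL PRIMARY KEY CHECK (SCode BETWEEN 1 AND 100),
        Name     TEXT,
        Street   TEXT,
        Town     TEXT,
        PostCode TEXT
    )""")

# Made-up sample values.
conn.execute("INSERT INTO Staff VALUES (42, 'Smith', '6 Shaw St', 'Hull', 'AB1 2CD')")

try:
    conn.execute("INSERT INTO Staff VALUES (150, 'Day', '2 Sale St', 'York', 'AB3 4EF')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # 150 lies outside the SCode domain

print(rejected)  # True
```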
Codd’s Rules
Each Tuple (i.e. row) MUST BE unique, i.e. we need a way to discriminate between Tuples.
Therefore, each Relation MUST HAVE a Primary Key.
There may be many Candidates for the job of Primary Key, so select on the basis of uniqueness AND minimality.
Keys with more than one Attribute are known as composite keys.
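The two tests for a Candidate Key can be sketched in Python with hypothetical helpers is_unique and is_candidate_key, run here against the rows of the course/room table used on the anomaly slides. Sample data can only rule a key out: in these four rows EnLimit also happens to be unique, but only the business rules can confirm that an Attribute really identifies a Tuple.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, attrs):
    """Unique, and minimal: no proper, non-empty subset of attrs is also unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Rows of the course/room table from the anomaly slides.
rows = [
    {"CoNo": 353, "Tutor": "Smith", "Room": "A532", "RSize": 45, "EnLimit": 40},
    {"CoNo": 351, "Tutor": "Smith", "Room": "C320", "RSize": 100, "EnLimit": 60},
    {"CoNo": 355, "Tutor": "Clark", "Room": "H940", "RSize": 400, "EnLimit": 300},
    {"CoNo": 456, "Tutor": "Turner", "Room": "H940", "RSize": 400, "EnLimit": 45},
]

print(is_candidate_key(rows, ("CoNo",)))          # True
print(is_candidate_key(rows, ("CoNo", "Tutor")))  # False: unique but not minimal
print(is_candidate_key(rows, ("Room",)))          # False: H940 repeats
```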
Rules for Integrity
No Attribute that is part of the Primary Key can assume a ‘null’ value; otherwise, how could we discriminate between Tuples?
Foreign Key Attributes must take values that are either ‘null’, or from the same Domain as the Primary Key Attribute to which they are logically linked; otherwise, we lose the possibility of making relationships.
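A sketch of both integrity rules, assuming SQLite via Python's sqlite3 (try_insert is a hypothetical helper). Note that SQLite enforces the stricter, referential form of the Foreign Key rule: a non-null Foreign Key value must match an existing Primary Key value, not merely come from the same Domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default
# INT (not INTEGER) keeps SCode/CCode ordinary key columns rather than SQLite's
# auto-assigning rowid alias. NOT NULL is spelled out because, for historical
# reasons, SQLite otherwise allows NULL in such PRIMARY KEY columns.
conn.execute("CREATE TABLE Staff (SCode INT NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute("""CREATE TABLE Contract (
    CCode INT NOT NULL PRIMARY KEY,
    SCode INT REFERENCES Staff(SCode))""")
conn.execute("INSERT INTO Staff VALUES (9491, 'Smith')")


def try_insert(sql):
    """Return True if the statement is accepted, False if an integrity rule blocks it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("INSERT INTO Contract VALUES (279, 9491)"),   # FK matches an existing PK
    try_insert("INSERT INTO Contract VALUES (665, NULL)"),   # null FK is allowed
    try_insert("INSERT INTO Contract VALUES (183, 1234)"),   # no Staff 1234: rejected
    try_insert("INSERT INTO Contract VALUES (NULL, 9491)"),  # null PK: rejected
]
print(results)  # [True, True, False, False]
```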
Avoiding Database Anomalies
Most Database books have a section describing a mathematically-based technique called Normalisation; I will show you a much easier way of achieving the same result.
What we want to achieve is a ‘robust’ Logical Data Model, by:
• Transforming a Conceptual Data Model into a set of Relations.
• Checking these Relations for any Anomalies.
• Documenting them as a Database Schema.
What is an Anomaly?
Anything we try to do with a database that may lead to unexpected and/or unpredictable results.
There are three types of Anomaly, i.e.:
• insert
• delete
• update
We need to check our database design carefully: the only good database is an anomaly-free database.
Insert Anomaly
When we want to enter a value into a data cell, but the attempt is prevented because the primary key value is not known.
e.g. We have built a new Room (e.g. B123), but it has not yet been timetabled for any courses (so we don’t have a CoNo value).
CoNo  Tutor   Room  RSize  EnLimit
353   Smith   A532  45     40
351   Smith   C320  100    60
355   Clark   H940  400    300
456   Turner  H940  400    45
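The insert anomaly can be reproduced with a sketch like the following, assuming SQLite via Python's sqlite3. The table name CourseRoom and the RSize of 50 for Room B123 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INT (not INTEGER) keeps CoNo an ordinary key column rather than SQLite's
# auto-assigning rowid alias; NOT NULL makes a missing key an error.
conn.execute("""CREATE TABLE CourseRoom (
    CoNo INT NOT NULL PRIMARY KEY, Tutor TEXT, Room TEXT, RSize INT, EnLimit INT)""")
conn.executemany("INSERT INTO CourseRoom VALUES (?, ?, ?, ?, ?)", [
    (353, "Smith", "A532", 45, 40),
    (351, "Smith", "C320", 100, 60),
    (355, "Clark", "H940", 400, 300),
    (456, "Turner", "H940", 400, 45),
])

# New Room B123 (hypothetical size 50) has no course yet, so there is no CoNo.
try:
    conn.execute("INSERT INTO CourseRoom (Room, RSize) VALUES ('B123', 50)")
    inserted = True
except sqlite3.IntegrityError:
    inserted = False

print(inserted)  # False: the room cannot be recorded
```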
Delete Anomaly
When deleting a value we no longer want also deletes values we wish to keep.
CoNo  Tutor   Room  RSize  EnLimit
353   Smith   A532  45     40
351   Smith   C320  100    60
355   Clark   H940  400    300
456   Turner  H940  400    45
e.g. CoNo 351 has ended, but Room C320 will be used elsewhere.
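A similar sketch for the delete anomaly, under the same assumptions (SQLite via Python's sqlite3, hypothetical table name CourseRoom):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CourseRoom (
    CoNo INT NOT NULL PRIMARY KEY, Tutor TEXT, Room TEXT, RSize INT, EnLimit INT)""")
conn.executemany("INSERT INTO CourseRoom VALUES (?, ?, ?, ?, ?)", [
    (353, "Smith", "A532", 45, 40),
    (351, "Smith", "C320", 100, 60),
    (355, "Clark", "H940", 400, 300),
    (456, "Turner", "H940", 400, 45),
])

# Course 351 has ended, so its row is deleted...
conn.execute("DELETE FROM CourseRoom WHERE CoNo = 351")

# ...taking with it the only record that Room C320 exists and holds 100.
c320 = conn.execute("SELECT Room, RSize FROM CourseRoom WHERE Room = 'C320'").fetchall()
print(c320)  # []
```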
Update Anomaly
When we want to change a single data item value, but must update multiple entries.
CoNo  Tutor   Room  RSize  EnLimit
353   Smith   A532  45     40
351   Smith   C320  100    60
355   Clark   H940  400    300
456   Turner  H940  400    45
e.g. Room H940 has been improved; it now has RSize = 500.
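And for the update anomaly, again assuming SQLite via Python's sqlite3 and a hypothetical CourseRoom table: the size of H940 is stored once per course using the room, so updating only one of those rows leaves the room with two different sizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CourseRoom (
    CoNo INT NOT NULL PRIMARY KEY, Tutor TEXT, Room TEXT, RSize INT, EnLimit INT)""")
conn.executemany("INSERT INTO CourseRoom VALUES (?, ?, ?, ?, ?)", [
    (353, "Smith", "A532", 45, 40),
    (351, "Smith", "C320", 100, 60),
    (355, "Clark", "H940", 400, 300),
    (456, "Turner", "H940", 400, 45),
])

# One fact (the size of H940) is repeated in every row for a course using it.
h940_rows = conn.execute(
    "SELECT COUNT(*) FROM CourseRoom WHERE Room = 'H940'").fetchone()[0]

# An update that touches only one of those rows leaves the table contradicting itself.
conn.execute("UPDATE CourseRoom SET RSize = 500 WHERE CoNo = 355")
sizes = conn.execute(
    "SELECT DISTINCT RSize FROM CourseRoom WHERE Room = 'H940' ORDER BY RSize").fetchall()
print(h940_rows, sizes)  # 2 [(400,), (500,)]
```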
Conceptual Model & Translation Process
Conceptual Model:
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB, ...)
Student(Enrol-No, Name, Address, OLevelPoints, ...)
Course(CourseCode, Name, Duration, ...)
Staff M:M Course; Staff 1:M Student
Translation Process:
• Entities become Relations
• Attributes become Attributes(?)
• Key Attribute(s) become Primary Key(s)
• Relationships are represented by additional Foreign Key Attributes, for those Relations that are at the ‘M’ end of each 1:M Relationship.
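The translation rules can be sketched as a small Python function. The function name translate, the dictionary layout, and the convention that an entity's first attribute is its key are all assumptions made for illustration. A 1:M relationship posts the ‘1’ end's key into the ‘M’ end under a chosen name (here Tutor); an M:M relationship instead produces an ‘artificial’ linking Relation (here Team), as the following slides describe.

```python
def translate(entities, relationships):
    """Apply the translation rules to a conceptual model (hypothetical helper).

    entities:      {entity_name: [key_attribute, other_attribute, ...]}
                   (the first attribute is assumed to be the key)
    relationships: [(entity_a, entity_b, kind, name)] where kind is "1:M"
                   (entity_a is the '1' end; name is the Foreign Key added to
                   entity_b) or "M:M" (name is the linking Relation).
    Returns {relation_name: {"attrs": [...], "pk": [...], "fks": {attr: "Rel.attr"}}}.
    """
    # Entities become Relations; Key Attributes become Primary Keys.
    relations = {
        name: {"attrs": list(attrs), "pk": [attrs[0]], "fks": {}}
        for name, attrs in entities.items()
    }
    for a, b, kind, name in relationships:
        key_a, key_b = entities[a][0], entities[b][0]
        if kind == "1:M":
            # Post the '1' end's key into the 'M' end as a Foreign Key.
            relations[b]["attrs"].append(name)
            relations[b]["fks"][name] = f"{a}.{key_a}"
        elif kind == "M:M":
            # Both ends are 'M': create a linking Relation whose composite
            # Primary Key is made of two Foreign Keys.
            relations[name] = {
                "attrs": [key_a, key_b],
                "pk": [key_a, key_b],
                "fks": {key_a: f"{a}.{key_a}", key_b: f"{b}.{key_b}"},
            }
        else:
            raise ValueError(f"unsupported relationship kind: {kind}")
    return relations


entities = {
    "Staff": ["Staff-ID", "Name", "ScalePoint", "RateOfPay", "DOB"],
    "Student": ["Enrol-No", "Name", "Address", "OLevelPoints"],
    "Course": ["CourseCode", "Name", "Duration"],
}
relationships = [
    ("Staff", "Student", "1:M", "Tutor"),
    ("Staff", "Course", "M:M", "Team"),
]
schema = translate(entities, relationships)
for rel_name, rel in schema.items():
    print(f"{rel_name}({', '.join(rel['attrs'])})", rel["fks"])
```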
The ‘Staff’ & ‘Student’ Relations
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB, ...)
becomes:
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)
Student(Enrol-No, Name, Address, OLevelPoints, ...)
becomes:
Student(Enrol-No, Name, Address, OLevelPoints, Tutor)
NB. Foreign Key Tutor references Staff.Staff-ID
The ‘Staff’ & ‘Course’ Relations
Course(CourseCode, Name, Duration, ...)
becomes:
Course(CourseCode, Name, Duration)
NB. We can’t ‘simply’ add extra Attributes to act as Foreign Keys, as BOTH Relations are at an ‘M’ end; I warned you about leaving M:M relationships in your Conceptual Data Model.
We MUST create an ‘artificial’ linking Relation.
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)
‘Staff’, ‘Course’ & ‘Team’ Relations
NB. In the ‘artificial’ Team Relation:
• Primary Key is a ‘composite’ of CourseCode & Staff-ID
• Foreign Key CourseCode references Course.CourseCode
• Foreign Key Staff-ID references Staff.Staff-ID
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)
Course(CourseCode, Name, Duration)
Team(Staff-ID, CourseCode)
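A sketch of the Team linking Relation, assuming SQLite via Python's sqlite3; the staff names, course names and identifier values are hypothetical sample data. The composite Primary Key stops the same Staff/Course pairing being stored twice, while the two Foreign Keys let the M:M relationship be read in either direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Hyphenated names such as Staff-ID must be double-quoted in SQL.
conn.executescript("""
    CREATE TABLE Staff  ("Staff-ID" INT NOT NULL PRIMARY KEY, Name TEXT);
    CREATE TABLE Course (CourseCode INT NOT NULL PRIMARY KEY, Name TEXT);
    -- The 'artificial' linking relation: composite primary key, two foreign keys.
    CREATE TABLE Team (
        "Staff-ID" INT NOT NULL REFERENCES Staff("Staff-ID"),
        CourseCode INT NOT NULL REFERENCES Course(CourseCode),
        PRIMARY KEY ("Staff-ID", CourseCode)
    );
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Staff VALUES (?, ?)", [(1001, "Smith"), (1002, "Clark")])
conn.executemany("INSERT INTO Course VALUES (?, ?)", [(101, "Databases"), (102, "Networks")])
conn.executemany("INSERT INTO Team VALUES (?, ?)", [(1001, 101), (1001, 102), (1002, 101)])

# One member of staff, many courses...
smith_courses = conn.execute(
    'SELECT CourseCode FROM Team WHERE "Staff-ID" = 1001 ORDER BY CourseCode').fetchall()

# ...and one course, many members of staff.
course_101_staff = conn.execute("""
    SELECT s.Name
    FROM Team AS t JOIN Staff AS s ON s."Staff-ID" = t."Staff-ID"
    WHERE t.CourseCode = 101
    ORDER BY s.Name
""").fetchall()

# The composite key rejects a second copy of the same pairing.
try:
    conn.execute("INSERT INTO Team VALUES (1001, 101)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(smith_courses, course_101_staff, duplicate_allowed)
```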
4 Relations from 3 Entities?
OK, BUT are they anomaly free?
• Is every Tuple unique?
  • i.e. is there a Primary Key?
• Are the Attributes Atomic?
  • i.e. do they store only ONE item of data?
• Does every Attribute within each Relation ‘depend’ upon the Primary Key?
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)
Course(CourseCode, Name, Duration)
Team(Staff-ID, CourseCode)
Student(Enrol-No, Name, Address, OLevelPoints, Tutor)
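The dependency check can be approximated in Python with a hypothetical helper holds_fd, which tests whether one set of Attributes determines another within a given set of rows. The Staff rows and pay rates are invented for illustration; as with keys, sample data can only disprove a dependency, and whether RateOfPay really depends on ScalePoint is a business rule.

```python
def holds_fd(rows, lhs, rhs):
    """True if, in these rows, equal values of lhs always come with equal values of rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs seen for this lhs; any later mismatch breaks it.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical Staff rows (values invented for illustration).
staff = [
    {"Staff-ID": 1001, "Name": "Smith", "ScalePoint": 5, "RateOfPay": 28000},
    {"Staff-ID": 1002, "Name": "Clark", "ScalePoint": 5, "RateOfPay": 28000},
    {"Staff-ID": 1003, "Name": "Turner", "ScalePoint": 7, "RateOfPay": 31000},
]

print(holds_fd(staff, ["Staff-ID"], ["RateOfPay"]))    # True, as it must for a key
print(holds_fd(staff, ["ScalePoint"], ["RateOfPay"]))  # True: a non-key determinant
print(holds_fd(staff, ["ScalePoint"], ["Name"]))       # False
```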
What if the checks fail?
If any Relation fails the ‘checks’ (especially those checking dependency), we MUST split that Relation into multiple Relations until they pass the tests.
But we MUST remember to leave behind a Foreign Key, to ‘point’ forwards to the Primary Key of the ‘new’ split-off Relation.
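The split can be sketched in Python with two hypothetical helpers: split moves the dependent Attributes into a new Relation keyed on their determinant and leaves the determinant behind as a Foreign Key, and rejoin follows that Foreign Key back to confirm that no information was lost. The Staff rows and pay values are invented.

```python
def split(rows, determinant, dependents):
    """Split `dependents` off into a new relation keyed on `determinant`.

    The determinant attributes stay behind in the original rows, where they
    now act as a Foreign Key to the new relation's Primary Key.
    """
    kept = [{a: v for a, v in row.items() if a not in dependents} for row in rows]
    split_off = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        new = {a: row[a] for a in determinant + dependents}
        # If the dependency does not hold, splitting would lose information.
        if split_off.setdefault(key, new) != new:
            raise ValueError(f"{determinant} does not determine {dependents}")
    return kept, list(split_off.values())


def rejoin(kept, split_off, determinant):
    """Follow the Foreign Key back to rebuild the original rows."""
    index = {tuple(r[a] for a in determinant): r for r in split_off}
    return [{**row, **index[tuple(row[a] for a in determinant)]} for row in kept]


# Hypothetical Staff rows (values invented for illustration).
staff = [
    {"Staff-ID": 1001, "Name": "Smith", "ScalePoint": 5, "RateOfPay": 28000},
    {"Staff-ID": 1002, "Name": "Clark", "ScalePoint": 5, "RateOfPay": 28000},
    {"Staff-ID": 1003, "Name": "Turner", "ScalePoint": 7, "RateOfPay": 31000},
]

staff_kept, pay = split(staff, ["ScalePoint"], ["RateOfPay"])
print(pay)  # [{'ScalePoint': 5, 'RateOfPay': 28000}, {'ScalePoint': 7, 'RateOfPay': 31000}]
```

Because the determinant becomes the Primary Key of the split-off Relation, each kept row matches exactly one row of pay, which is what makes the rejoin lossless.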
Are they Anomaly Free?
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)
Course(CourseCode, Name, Duration)
Team(Staff-ID, CourseCode)
Student(Enrol-No, Name, Address, OLevelPoints, Tutor)
RateOfPay: NOT Dependent upon Staff-ID; requires a slightly more complex ‘solution’.
Address: NOT very Atomic; could easily be split into ‘Street’, ‘Town’ & ‘PostCode’.
‘Fixing’ the Dependency ‘Problem’
The Attribute ‘RateOfPay’ depends upon ‘ScalePoint’, NOT ‘Staff-ID’. So, we MUST remove ‘RateOfPay’ from the ‘Staff’ Relation, like this:
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)
becomes:
Staff(Staff-ID, Name, ScalePoint, DOB)
Pay(ScalePoint, RateOfPay)
NB. In the ‘Staff’ Relation: Foreign Key ScalePoint references Pay.ScalePoint
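A sketch of the resulting Staff and Pay Relations, assuming SQLite via Python's sqlite3, with invented pay rates, staff and dates of birth: a pay rise for one ScalePoint is now a single-row update that every member of staff on that point sees through the Foreign Key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Pay (ScalePoint INT NOT NULL PRIMARY KEY, RateOfPay INT);
    CREATE TABLE Staff (
        "Staff-ID" INT NOT NULL PRIMARY KEY,
        Name       TEXT,
        ScalePoint INT REFERENCES Pay(ScalePoint),
        DOB        TEXT
    );
""")
# Hypothetical pay scale and staff, for illustration only.
conn.executemany("INSERT INTO Pay VALUES (?, ?)", [(5, 28000), (7, 31000)])
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?)", [
    (1001, "Smith", 5, "1965-02-13"),
    (1002, "Clark", 5, "1957-01-14"),
    (1003, "Turner", 7, "1976-12-28"),
])

# The pay rise is stored once, in Pay...
changed = conn.execute("UPDATE Pay SET RateOfPay = 29000 WHERE ScalePoint = 5").rowcount

# ...and both staff on scale point 5 see it via the Foreign Key.
rates = conn.execute("""
    SELECT s.Name, p.RateOfPay
    FROM Staff AS s JOIN Pay AS p USING (ScalePoint)
    ORDER BY s."Staff-ID"
""").fetchall()
print(changed, rates)  # 1 [('Smith', 29000), ('Clark', 29000), ('Turner', 31000)]
```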
5 Relations from 3 Entities
Now all we need to do is document our ‘Anomaly Free’ Relations as a Database Schema.
Staff(Staff-ID, Name, ScalePoint, DOB)
Course(CourseCode, Name, Duration)
Team(Staff-ID, CourseCode)
Student(Enrol-No, Name, Street, Town, PostCode, OLevelPoints, Tutor)
Pay(ScalePoint, RateOfPay)
Document Relations as a Database Schema
A Database Schema defines all Relations, lists all Attributes (with their Domains), and identifies all Primary & Foreign Keys.
We may (and should) have ‘discovered’ a number of constraints during our analysis of the Business situation, e.g.:
• the College only delivers 10 Courses.
• there are only 12 Points on the Pay Scale.
• Staff MUST be at least 21 Years Old.
These constraints can (and should) be expressed as the ‘Domains’ of the Database Schema.
Logical Schema 1 - Domains
Schema College
Domains
StudentIdentifiers = 1 - 9999;
StaffIdentifiers = 1001 - 1199;
GeneralNames = TextString (15 Characters);
Addresses = TextString (20 Characters);
PostCodes = TextString (7 or 8 Characters);
CourseIdentifiers = 101 - 110;
OLevelPoints = 0 - 100;
ScalePoints = 1 - 12;
StaffBirthDates = Date (dd/mm/yyyy), >21 Years before Today;
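The College Domains could be encoded as validation functions, sketched here in Python. The names make_domains and years_before are hypothetical, and three readings are assumptions: ‘TextString (15 Characters)’ as a maximum length, ‘7 or 8 Characters’ as an exact length, and ‘>21 Years before Today’ as ‘at least 21 years old’.

```python
from datetime import date


def years_before(today, years):
    """The same calendar day `years` years earlier (29 Feb falls back to 28 Feb)."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


def make_domains(today):
    """Hypothetical encoding of the College Domains as validation functions."""
    def int_range(lo, hi):
        return lambda v: isinstance(v, int) and lo <= v <= hi

    def text_max(n):  # assumption: 'n Characters' means at most n
        return lambda v: isinstance(v, str) and len(v) <= n

    return {
        "StudentIdentifiers": int_range(1, 9999),
        "StaffIdentifiers": int_range(1001, 1199),
        "GeneralNames": text_max(15),
        "Addresses": text_max(20),
        # Assumption: exactly 7 or 8 characters.
        "PostCodes": lambda v: isinstance(v, str) and len(v) in (7, 8),
        "CourseIdentifiers": int_range(101, 110),
        "OLevelPoints": int_range(0, 100),
        "ScalePoints": int_range(1, 12),
        # Assumption: born on or before the date 21 years ago, i.e. at least 21.
        "StaffBirthDates": lambda v: isinstance(v, date) and v <= years_before(today, 21),
    }


domains = make_domains(date.today())
print(domains["CourseIdentifiers"](105), domains["ScalePoints"](13))  # True False
```

Passing `today` in explicitly, rather than calling date.today() inside the check, keeps the age rule testable against a fixed date.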
Logical Schema 2 - Relations
Relation Student
Enrol-No: StudentIdentifiers;
Name: GeneralNames;
Street: Addresses;
Town: Addresses;
PostCode: PostCodes;
OLevelPoints: OLevelPoints;
Tutor: StaffIdentifiers;
Primary Key: Enrol-No
Foreign Key Tutor references Staff.Staff-ID
Logical Schema 3 - Relations
Relation Staff
Staff-ID: StaffIdentifiers;
Name: GeneralNames;
ScalePoint: ScalePoints;
DOB: StaffBirthDates;
Primary Key: Staff-ID
Foreign Key ScalePoint references Pay.ScalePoint
Logical Schema ...
Relation Course
CourseCode: CourseIdentifiers;
Name: GeneralNames;
… etc.
Continue to define each of the Relations in a similar manner.
NB. Make sure that you define ALL of the Relations, including:
• ‘artificial’ ones (e.g. Team)
• ‘split-off’ ones (e.g. Pay)
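As a sketch of how the documented schema might be turned into DDL, assuming SQLite via Python's sqlite3: Domains become CHECK constraints, and the Primary and Foreign Keys follow the schema slides. Column types are assumptions; RateOfPay and Duration have no Domain in the slides, so they are left unconstrained. The 21-years-old rule on DOB is left to application code, because SQLite treats date('now') as non-deterministic and does not allow it in a CHECK constraint. accepts is a hypothetical helper.

```python
import sqlite3

# Hypothetical SQLite rendering of the College schema.
SCHEMA = """
CREATE TABLE Pay (
    ScalePoint INT NOT NULL PRIMARY KEY CHECK (ScalePoint BETWEEN 1 AND 12),
    RateOfPay  INT
);
CREATE TABLE Staff (
    "Staff-ID" INT  NOT NULL PRIMARY KEY CHECK ("Staff-ID" BETWEEN 1001 AND 1199),
    Name       TEXT CHECK (length(Name) <= 15),
    ScalePoint INT  REFERENCES Pay(ScalePoint),
    DOB        TEXT  -- age rule checked in application code
);
CREATE TABLE Course (
    CourseCode INT  NOT NULL PRIMARY KEY CHECK (CourseCode BETWEEN 101 AND 110),
    Name       TEXT CHECK (length(Name) <= 15),
    Duration   INT
);
CREATE TABLE Team (
    "Staff-ID" INT NOT NULL REFERENCES Staff("Staff-ID"),
    CourseCode INT NOT NULL REFERENCES Course(CourseCode),
    PRIMARY KEY ("Staff-ID", CourseCode)
);
CREATE TABLE Student (
    "Enrol-No"   INT  NOT NULL PRIMARY KEY CHECK ("Enrol-No" BETWEEN 1 AND 9999),
    Name         TEXT CHECK (length(Name) <= 15),
    Street       TEXT CHECK (length(Street) <= 20),
    Town         TEXT CHECK (length(Town) <= 20),
    PostCode     TEXT CHECK (length(PostCode) IN (7, 8)),
    OLevelPoints INT  CHECK (OLevelPoints BETWEEN 0 AND 100),
    Tutor        INT  REFERENCES Staff("Staff-ID")
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)


def accepts(sql):
    """Return True if the statement satisfies every key and domain constraint."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['Course', 'Pay', 'Staff', 'Student', 'Team']
print(accepts("INSERT INTO Course VALUES (111, 'Extra', 1)"))  # False: outside CourseIdentifiers
```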
This Week’s Workshop
The purpose of this week’s Workshop is to practise developing ‘robust’ logical data models that conform to the ‘rules’ of Codd’s Relational Model:
• Exploring the ‘definition’ of Relations.
• Identifying potential anomalies in a Table of data, and ‘solving’ these ‘problems’.
• Documenting a Database Schema (i.e. a Logical Model), in the format required by Part 2 of the Assignment.