Supplement 01 (b) Database Introduction-2
HSQ - DATABASES & SQL
And Franchise Colleges
By MANSHA NAWAZ
The Database Concept (4GL TOOLS & TECHNIQUES)
• A Database
– A database is a collection of non-redundant data shareable between different application systems. [Howe, D.R. 1989]
– This does not imply a computer system.
– Is a well organised filing cabinet a database by this definition?
• A Database Management System (DBMS)
– A sophisticated software package capable of handling a system's database storage needs.
• We are particularly interested in Relational Database Management Systems (RDBMS) in this module.
Application Systems Sharing Data
• The Applications (Application Systems / Programs etc.)
– The Admission System uses patient and medical staff data.
– The Operation Scheduling Report uses operating theatre, patient and medical staff data.
– The Medical Staff Report uses medical staff data.
• The application systems share data.
[Diagram: the Admission System, the Operation Scheduling Report and the Medical Staff Report each draw on one shared Database.]
Data Models
• A Data Model provides a particular way of thinking about data, at least in terms of its structure.
– Data models include data descriptions, data relationships, data semantics and consistency constraints. [Silberschatz et al.]
– A data model comprises three components: data structures, data manipulators and general integrity rules. [Codd (1970)]
• There are many types of data model.
– Hierarchic - Network - Relational - Multidimensional
– Entity-Relationship - Object-Oriented - Multimedia - etc.
• Each database uses a definition language, which imposes restrictions on
– what can be defined
– how entities relate to each other
• In this module we are interested mainly in the Relational and Entity-Relationship data models.
• Why? The principles involved in Entity Relationship Modelling apply to all data models.
The Relational Model
Relational model first proposed in 1970 by Dr E F (Ted) Codd in the paper ‘A relational model of data for large shared data banks’.
– Achieves program/data independence by treating data in a disciplined way.
– Has a mathematical basis (the term “relation” comes from this).
• Applies the rigour of mathematics.
• Uses set theory.
• Determining data structure.
– Data is stored in a structure of relations (tables) defined by a data definition language (DDL).
– The elements of data structure used in relational models are relations (tables), attributes, tuples (rows), and domains. [Rolland p39-44]
• Defining data integrity.
– Data integrity means that data remains stable, secure, and accurate.
– It is maintained by internal constraints known as integrity rules that are invisible to users. [Rolland p45-48]
The Relational Database
• A relational database is made up of relations (tables) in which data are stored.
• A relation (table) is a 2-dimensional structure made up of attributes (columns) and tuples (rows).
Relation
• A relation is a table that obeys the following rules:
– There are no duplicate rows in the table.
– The order of the rows is immaterial.
– The order of the columns is immaterial.
– Each attribute value is atomic, i.e. each cell can contain one and only one data value.
Example of a Table (Relation)
• Relations can be manipulated and changed using a data manipulation language (DML) that employs relational operators.
– These operators are based on the concepts of relational algebra.
• Information is represented as two-dimensional tables, as below.
ANIMAL TABLE
ANAME    AFAMILY   WEIGHT
Candice  Camel     1800
Zona     Zebra     900
Sam      Snake     5
Elmer    Elephant  5000
Leonard  Lion      1200
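As a rough sketch of how a relation such as ANIMAL might be defined with a DDL statement and then populated and queried with DML, the following uses Python's built-in sqlite3 module. The choice of SQLite, the column types and the example query are assumptions made for illustration; the slides do not prescribe a particular RDBMS.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# DDL: define the structure of the relation (column types are assumed).
conn.execute("CREATE TABLE animal (aname TEXT, afamily TEXT, weight INTEGER)")

# DML: populate the table with the rows of the ANIMAL table.
animals = [
    ("Candice", "Camel", 1800),
    ("Zona", "Zebra", 900),
    ("Sam", "Snake", 5),
    ("Elmer", "Elephant", 5000),
    ("Leonard", "Lion", 1200),
]
conn.executemany("INSERT INTO animal VALUES (?, ?, ?)", animals)

# DML: a restriction (WHERE) followed by a projection (the column list).
heavy = conn.execute(
    "SELECT aname FROM animal WHERE weight > 1000 ORDER BY aname"
).fetchall()
print(heavy)  # [('Candice',), ('Elmer',), ('Leonard',)]
```

The WHERE clause and the column list correspond loosely to the restrict and project operators of relational algebra.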
Tables and Keys – for relationships
• A primary key is a unique identifier for each row in a table. It can consist of one or more columns. Each table contains data about one entity.
• A foreign key is a column or columns in one table which reference(s) a primary key column or columns in another table.
• Values in a foreign key must match an existing value in the primary key or be NULL. This is known as the referential integrity rule.
• ANO in the ANIMAL-FOOD table is part of the primary key and also a foreign key.
ANIMAL
ANO ANAME AFAMILY WEIGHT
CA1 Candice Camel 1800
ZE4 Zona Zebra 900
SN1 Sam Snake 5
EL3 Elmer Elephant 5000
LI2 Leonard Lion 1200
ANIMAL-FOOD
ANO FOOD
CA1 Hay
CA1 Buns
ZE4 Brush
SN1 Mice
SN1 People
EL3 Leaves
LI2 People
LI2 Meat
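The referential integrity rule can be tried out with a small sketch, again assuming SQLite through Python's sqlite3 module. The table name animal_food (an underscore replaces the hyphen, which is not valid in an unquoted SQL name) and the unmatched value 'XX9' are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE animal (ano TEXT PRIMARY KEY, aname TEXT)")
conn.execute(
    """CREATE TABLE animal_food (
           ano  TEXT REFERENCES animal(ano),  -- foreign key
           food TEXT,
           PRIMARY KEY (ano, food)            -- composite primary key
       )"""
)
conn.execute("INSERT INTO animal VALUES ('CA1', 'Candice')")
conn.execute("INSERT INTO animal_food VALUES ('CA1', 'Hay')")  # matches CA1

try:
    # 'XX9' is a made-up value with no matching row in animal.
    conn.execute("INSERT INTO animal_food VALUES ('XX9', 'Hay')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

Here ano plays both roles described above: it is part of the primary key of animal_food and a foreign key into animal.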
Relational Database Terminology
• Relation: a table with rows and columns
• Tuple: a row of a relation
• Attribute: a named column of a relation
• Primary key: a unique identifier for each row in a relation
• Domain: the set of allowable values for a column
• Degree: the number of columns in a relation
• Cardinality: the number of rows in a relation
Flight Table Example
Flight:
Flight#  Origin  Destination  Arrival
BA143    NAP     ROM          10.15
BA142    ROM     NAP          10.22
KT222    LHR     JFK          10.34
KT401    JFK     DUL          10.45
KT402    DUL     JFK          10.54
KT111    CCG     MIA          11.06
KT112    MIA     CCG          11.11
DE477    ATH     CDG          11.34
DE478    CDG     ATH          11.56
BA101    EDI     LHR          12.04
BA102    LHR     EDI          12.33
Table type usually written as follows: Flight: (Flight#, Origin, Destination, Arrival)
Relational Database Terminology
[Diagram: the Flight table from the previous slide, annotated with the following labels.]
• Intension or table type
• Column, Attribute, Type or Field
• Tuple, Row or Record
• Attribute Value, Data or Value
• Key Field
• Domain = set of values drawn upon by a particular attribute
• Cardinality = number of rows in a relation
• Degree = number of columns in a relation
Q: Which is most likely to change?
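To make degree and cardinality concrete, here is a minimal Python sketch that holds the Flight table as a tuple of attribute names and a list of tuples. The variable names are chosen for this example only.

```python
# Intension (table type): the attribute names of the Flight relation.
flight_attributes = ("Flight#", "Origin", "Destination", "Arrival")

# Extension (table occurrence): the tuples currently in the relation.
flight_rows = [
    ("BA143", "NAP", "ROM", "10.15"),
    ("BA142", "ROM", "NAP", "10.22"),
    ("KT222", "LHR", "JFK", "10.34"),
    ("KT401", "JFK", "DUL", "10.45"),
    ("KT402", "DUL", "JFK", "10.54"),
    ("KT111", "CCG", "MIA", "11.06"),
    ("KT112", "MIA", "CCG", "11.11"),
    ("DE477", "ATH", "CDG", "11.34"),
    ("DE478", "CDG", "ATH", "11.56"),
    ("BA101", "EDI", "LHR", "12.04"),
    ("BA102", "LHR", "EDI", "12.33"),
]

degree = len(flight_attributes)  # number of columns
cardinality = len(flight_rows)   # number of rows

# Values currently appearing under Origin; the domain is the larger
# pool of allowable values that these are drawn from.
origins_in_use = {row[1] for row in flight_rows}

print(degree, cardinality, len(origins_in_use))  # 4 11 10
```

Adding or deleting a flight changes the cardinality but leaves the degree untouched.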
Your turn – define the following terms!!
Order:
Client#  Part#  Qty-ordered  Date
C1       P1     25           22/08/99
C2       P1     12           24/08/99
C2       P2     8            24/08/99
C2       P3     4            24/08/99
C3       P2     3            25/08/99
C4       P1     20           25/08/99
Terms:
Value            Row                   Column
Domain           Degree of Order       Intension
Attribute type   Cardinality of Order  Record
Attribute Value  Table Type            Field
Tuple            Table Occurrence      Attribute
Extension        Relation
Why Tables?
Shop:
Shop ID  Area-code  Location
1        2          Edinburgh
2        1          London
3        1          London
4        2          Edinburgh
5        3          Birmingham
6        4          Ipswich
7        1          London
Primary Key Data Duplication
Shop:
Shop ID  Area-code
1        2
2        1
3        1
4        2
5        3
6        4
7        1
Area:
Area-code  Location
2          Edinburgh
1          London
3          Birmingham
4          Ipswich
• Normalisation is the process used to make sure tables are non-redundant.
• Duplication of primary key data maintains the link (relationships) between tables (data).
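A short sketch, assuming SQLite through Python's sqlite3 module, suggests how the duplicated Area-code values let the two tables be linked again. The lower-case column names (shop_id, area_code, location) are adaptations for SQL, not names from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE area (area_code INTEGER PRIMARY KEY, location TEXT)")
conn.execute(
    "CREATE TABLE shop (shop_id INTEGER PRIMARY KEY, "
    "area_code INTEGER REFERENCES area(area_code))"
)
conn.executemany(
    "INSERT INTO area VALUES (?, ?)",
    [(2, "Edinburgh"), (1, "London"), (3, "Birmingham"), (4, "Ipswich")],
)
conn.executemany(
    "INSERT INTO shop VALUES (?, ?)",
    [(1, 2), (2, 1), (3, 1), (4, 2), (5, 3), (6, 4), (7, 1)],
)

# Joining on the shared area_code rebuilds the original three-column table.
rows = conn.execute(
    "SELECT s.shop_id, s.area_code, a.location "
    "FROM shop AS s JOIN area AS a ON a.area_code = s.area_code "
    "ORDER BY s.shop_id"
).fetchall()
for row in rows:
    print(row)
```

Each location is now stored once, in the area table, yet every shop row can still be matched to its location through the area code it carries.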
Keys within the Relational Data Model
• The Primary Key is the field which uniquely identifies a record.
• The Primary Key (Unique Identifier) concept:
Example:
student(student#, name, …) (the # character means 'number')
• If student# is the primary key then a particular student#, e.g. 'S4', can only occur once in that column of the table.
student#  name
S4        Ramesh
S2        Peter
S9        Anthony
S11       Priti
• The Primary Key uniquely identifies a student and thus that student can only have one row in the student table.
• Breaking that rule …
student#  name
S4        Ramesh
S2        Peter
S9        Anthony
S11       Priti
S4        Fred
• The new row in the table does not make much sense.
– The first row for 'S4' is sufficient to hold the name and we cannot allow a second row with 'S4' as the primary key value.
• Why?
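One way to see the rule being enforced is the sketch below, which assumes SQLite through Python's sqlite3 module and renames student# to student_no because # is not valid in an unquoted SQL name. Once the column is declared as the primary key, the second 'S4' row is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_no TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("S4", "Ramesh"), ("S2", "Peter"), ("S9", "Anthony"), ("S11", "Priti")],
)

try:
    # A second row with the primary key value 'S4'.
    conn.execute("INSERT INTO student VALUES ('S4', 'Fred')")
    outcome = "accepted"
except sqlite3.IntegrityError as exc:
    outcome = f"rejected: {exc}"

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(outcome)
print(count)  # still 4 rows
```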
• Example
TASK(employee#, project#, role, supervisor, hours-allocated, hours-so-far, hours-required, …)
employee# project# role supervisor hours-allocated hours-so-far hours-required
E2 P9 program E123 120 85 100
E2 P4 design E101 300 250 200
E101 P9 design E101 60 128 56
E22 P11 test E345 40 0 40
• This table has a composite primary key.
– The primary key is composed of the three attributes [employee#, project#, role].
• In each row, the composite of the values for these attributes must be unique.
– For example, the first row has the values ['E2', 'P9', 'program'] for these attributes.
– No other row is allowed to have the same combination.
• Breaking that rule …
employee#  project#  role     supervisor  hours-allocated  hours-so-far  hours-required
E2         P9        program  E123        120              85            100
E2         P4        design   E101        300              250           200
E101       P9        design   E101        60               128           56
E22        P11       test     E345        40               0             40
E2         P9        program  E99         12               0             2000
• Again, this makes no sense or the basic design is wrong.
– Perhaps an employee can be re-allocated to a project, with a different supervisor, to do more programming.
– In this case the chosen primary key is wrong and needs the addition of an extra attribute such as date.
• For example, [employee#, project#, role, start_date] might be an appropriate identifier.
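The uniqueness requirement on a composite key can be checked against a table occurrence with a few lines of Python. This is only a sketch: the helper name duplicate_keys and the ATTRS tuple are invented for the example, and a clean result on one snapshot can disprove a proposed key but never prove it.

```python
from collections import Counter

ATTRS = ("employee#", "project#", "role", "supervisor",
         "hours-allocated", "hours-so-far", "hours-required")

task_rows = [
    ("E2", "P9", "program", "E123", 120, 85, 100),
    ("E2", "P4", "design", "E101", 300, 250, 200),
    ("E101", "P9", "design", "E101", 60, 128, 56),
    ("E22", "P11", "test", "E345", 40, 0, 40),
]

def duplicate_keys(rows, key_attrs):
    """Return the key values that occur in more than one row."""
    positions = [ATTRS.index(name) for name in key_attrs]
    counts = Counter(tuple(row[i] for i in positions) for row in rows)
    return [key for key, n in counts.items() if n > 1]

key = ("employee#", "project#", "role")
print(duplicate_keys(task_rows, key))             # []
print(duplicate_keys(task_rows, ("employee#",)))  # [('E2',)]

# Adding the offending fifth row breaks the composite key.
broken = task_rows + [("E2", "P9", "program", "E99", 12, 0, 2000)]
print(duplicate_keys(broken, key))                # [('E2', 'P9', 'program')]
```

The second call shows why employee# alone is not enough here: E2 already appears in two rows of the original table.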
Foreign Keys (Posted Identifier) concept
• Example:
student(student#, course#, student_name, …)
course(course#, course_name, …)
• course# is the identifier (primary key) of the course table.
• The course# is posted into the student table and is thus called a FOREIGN KEY (or posted identifier).
– Now for any student we can easily find the appropriate course# and look up further details of that course in the course table if needed.
– This is easy to do in any relational database.
Candidate Keys (Candidate Identifier)
– A favourite example
• Employee(emp#, name, address, National_Insurance#, …)
Or
• Employee(National_Insurance#, name, address, emp#, …)
• Both National_Insurance# and emp# can be primary keys (unique identifiers) of employee.
• You choose one as the most appropriate from the two candidate (possible) keys.
• You could argue that the composite [name, address] is another candidate primary key.
Examples - Primary and Foreign keys
Shop:
Shop ID  Area-code
1        2
2        1
3        1
4        2
5        3
6        4
7        1
Area:
Area-code  Location
2          Edinburgh
1          London
3          Birmingham
4          Ipswich
Order:
Client#  Part#  Qty-ordered  Date
C1       P1     25           22/08/99
C2       P1     12           24/08/99
C2       P2     8            24/08/99
C2       P3     4            24/08/99
C3       P2     3            25/08/99
C4       P1     20           25/08/99
Visit:
Visit#  Patient#  Doctor#  Date
V1      P1        D1       22/08/99
V2      P1        D1       24/08/99
V3      P2        D1       24/08/99
V4      P3        D1       24/08/99
V5      P1        D2       25/08/99
Examples - Primary and Foreign keys
Employee:
Emp#  Emp_name        Dept#       Status
E1    Fred Brown      Mtg         Manager
E2    Eve Munsen      R&D         Manager
E3    Joyce Goldberg  Admin       G1
E4    Paul Samuels    Mtg         G4
E5    Paul Josephs    R&D         G3
E6    Terry Wain      Production  Manager
Marriage:
Man#  Women#  Date
P1    P6      22/04/94
P2    P7      23/08/95
P2    P8      24/04/97
P3    P9      2/01/99
P4    P10     5/07/99
P5    P8      5/08/99
Anatomy of a table - a reminder
A Table Occurrence: using a variation of the Library Copies table
access_no isbnx price now_price condition times-loaned
4,887,642 0-7131-3688-X £12.95 02.06.92 A2 4
4,887,657 0-7131-3688-X £12.95 17.09.91 B1 47
6,055,432 0-7248-1045-5 £37.65 12.04.92 A2 17
9,387,263 0-6542-1212-B £15.99 14.02.91 B2 37
7,365,241 0-2435-3468-V £27.40 19.11.91 A3 7
3,874,652 0-2435-3468-V £27.40 19.11.91 A1 11
Attribute: Example: date-purchased
Value: Example: 02.06.92
Table Example: COPIES(access_no, isbnx, price, now_price, condition, times-loaned)
– What is special about isbnx in this table?
The 4 Rules for Normalised Tables [Rolland p72-]
• No row order significance.
• No column order significance.
• No multiple values at row/column intersections.
• No duplicate rows.
• Snapshots of table occurrences.
– When we look at a paper copy of a table, remember that the data in a real database table can be expected to change all the time.
– The COPIES table could have 5 rows the first time we look and on another day there could be hundreds or thousands.
– Always assume a database table really has thousands of rows.
The 4 rules for Normalised Tables broken
• No row order significance (Rule broken).
access_no  isbnx          price   now_price  condition  times-loaned
4,887,642  0-7131-3688-X  £12.95  02.06.92   A2         4
4,887,657                         17.09.91   B1         47
6,055,432  0-7248-1045-5  £37.65  12.04.92   A2         17
9,387,263  0-6542-1212-B  £15.99  14.02.91   B2         37
7,365,241  0-2435-3468-V  £27.40  19.11.91   A3         7
3,874,652                         19.11.91   A1         11
• If you swap the rows you lose information as copies are sometimes dependent on the row above for their ISBNX number.
No Column Order Significance (Rule broken).
• The two columns with no attribute type shown are intended to indicate the date-purchased followed by the date-removed.
– The date-removed column would contain a significant number of NULLS (explained later).
• The two columns are now dependent on the column order for their meaning.
– If you move the first date column to the end of the table then the meaning is lost.
• Clearly, giving each column its own attribute type is simpler and makes the meaning of each column independent of the column order.
access_no isbnx price condition times-loaned
4,887,642 0-7131-3688-X £12.95 02.06.92 A2 4
4,887,657 0-7131-3688-X £12.95 17.09.91 28.07.93 B1 47
6,055,432 0-7248-1045-5 £37.65 12.04.92 A2 17
9,387,263 0-6542-1212-B £15.99 14.02.91 31.08.93 B2 37
7,365,241 0-2435-3468-V £27.40 19.11.91 A3 7
3,874,652 0-2435-3468-V £27.40 19.11.91 A1 11
No Multiple Values at Row/column Intersections (Rule Broken).
• Or No Repeating Groups
• This is just too complicated – it makes searching and sorting difficult.
• Can you sort this into date order??
• Can it be easily searched on access_no ??
access_no        isbnx          price   date               condition  times-loaned
4887642,4887657  0-7131-3688-X  £12.95  02.06.92,17.09.91  A2, B1     4, 47
6,055,432        0-7248-1045-5  £37.65  12.04.92           A2         17
9,387,263        0-6542-1212-B  £15.99  14.02.91           B2         37
7365241,3874652  0-2435-3468-V  £27.40  19.11.91,19.11.91  A3, A1     7, 11
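The sketch below suggests how the repeating groups could be unpacked into atomic rows in Python. The dictionary layout and the helper name unpack are assumptions for the example, and access numbers are written without thousands separators so that they are plain integers.

```python
# The two offending rows of the table; list-valued cells hold the
# repeating group.
packed_rows = [
    {"isbnx": "0-7131-3688-X", "price": "£12.95",
     "access_no": [4887642, 4887657], "date": ["02.06.92", "17.09.91"],
     "condition": ["A2", "B1"], "times_loaned": [4, 47]},
    {"isbnx": "0-2435-3468-V", "price": "£27.40",
     "access_no": [7365241, 3874652], "date": ["19.11.91", "19.11.91"],
     "condition": ["A3", "A1"], "times_loaned": [7, 11]},
]

def unpack(row):
    """Yield one atomic row per member of the repeating group."""
    group = zip(row["access_no"], row["date"],
                row["condition"], row["times_loaned"])
    for access_no, date, condition, times_loaned in group:
        yield (access_no, row["isbnx"], row["price"],
               date, condition, times_loaned)

atomic_rows = [r for row in packed_rows for r in unpack(row)]

# With one value per cell, searching on access_no is a plain comparison.
match = [r for r in atomic_rows if r[0] == 3874652]
print(len(atomic_rows))  # 4
print(match)
```

Once every cell holds a single value, sorting by date or searching by access_no no longer needs any string splitting.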
No Duplicate Rows (Rule broken).
access_no  isbnx          price   date      condition  times-loaned
4,887,642  0-7131-3688-X  £12.95  02.06.92  A2         4
4,887,657  0-7131-3688-X  £12.95  17.09.91  B1         47
6,055,432  0-7248-1045-5  £37.65  12.04.92  A2         17
9,387,263  0-6542-1212-B  £15.99  14.02.91  B2         37
7,365,241  0-2435-3468-V  £27.40  19.11.91  A3         7
3,874,652  0-2435-3468-V  £27.40  19.11.91  A1         11
4,887,642  0-7131-3688-X  £12.95  02.06.92  A2         4
• Why is this such a BAD idea?
• If you allow redundantly duplicated data in a real system, what will be the end result?
Domains
• An attribute cannot contain just any data.
– For example, we could have the attribute student_date_of_birth.
– Whilst '11-January-1980' might be a suitable value, 'FRED BLOGGS' clearly is not: it is the wrong data type.
• So any value of student_date_of_birth should at least be a valid date.
– Other rules might apply to the attribute: dates before the year 1900 seem unlikely to be useful, etc.
• The Domain concept carries this a bit further.
– A Domain is the pool of values from which an attribute draws its actual values.
• We may say that the attribute student_name is of data type string.
– So 'JOE BLOGGS' is a valid value.
• However, 'X23Y B&&9' is also a string but is not a student name (unless that student has particularly annoying parents).
• We could argue that the finite list of possible student names is a subset of all possible random strings.
– We can't predict what a student will be called.
– Thus we have to implement the domain of student_name by using the data type string.
• In other cases, for example the attribute student_eye_colour, we could easily define a list of values that defines the full range of possibilities.
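The three kinds of domain discussed here (a data type plus a rule, a bare data type, and an enumerated list) can be sketched as simple Python checks. The function names, the date format and the list of eye colours are all assumptions made for the example; the slides do not enumerate the colours.

```python
from datetime import datetime
from enum import Enum

class EyeColour(Enum):
    # An assumed list of values, for illustration only.
    BLUE = "blue"
    BROWN = "brown"
    GREEN = "green"
    GREY = "grey"
    HAZEL = "hazel"

def valid_date_of_birth(text):
    """True if text is a date such as '11-January-1980', not before 1900."""
    try:
        parsed = datetime.strptime(text, "%d-%B-%Y")
    except ValueError:
        return False  # not a date at all: the wrong data type
    return parsed.year >= 1900

def valid_student_name(text):
    """Only the data type can be checked: any non-empty string passes."""
    return isinstance(text, str) and text.strip() != ""

def valid_eye_colour(text):
    """True if text is one of the enumerated values."""
    return text in {colour.value for colour in EyeColour}

print(valid_date_of_birth("11-January-1980"))  # True
print(valid_date_of_birth("FRED BLOGGS"))      # False
print(valid_student_name("X23Y B&&9"))         # True: the type cannot rule it out
print(valid_eye_colour("brown"))               # True
```

The name check accepts 'X23Y B&&9' precisely because a string data type is the closest available approximation to the real domain of student names.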
NULLS
• There are two basic types of NULL value: 'not applicable' and 'not known'.
emp#  emp-name  age  car-reg#
E4    D.Jones   34   F345DRT
E77   L.Smith   27
E9    J.Smith        G467BBT
E2    N.Patel   55   K976BJT
• Every employee would have an age but it might not be known in a particular case.
• However, not every employee need own a car ( with a car-reg#).
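As a sketch of how NULLs behave in practice, the following assumes SQLite through Python's sqlite3 module, where Python's None is stored as NULL. The column names emp_no and car_reg are adaptations for SQL. Note that SQL has a single NULL marker, so the difference between 'not applicable' and 'not known' lives only in the comments here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no TEXT PRIMARY KEY, emp_name TEXT, "
    "age INTEGER, car_reg TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("E4", "D.Jones", 34, "F345DRT"),
        ("E77", "L.Smith", 27, None),        # no car: read as 'not applicable'
        ("E9", "J.Smith", None, "G467BBT"),  # age: 'not known'
        ("E2", "N.Patel", 55, "K976BJT"),
    ],
)

# NULL is tested with IS NULL; a comparison such as age = NULL never matches.
unknown_age = conn.execute(
    "SELECT emp_no FROM employee WHERE age IS NULL"
).fetchall()
equals_null = conn.execute(
    "SELECT emp_no FROM employee WHERE age = NULL"
).fetchall()

# Aggregates skip NULLs: the average is taken over the three known ages.
avg_age = conn.execute("SELECT AVG(age) FROM employee").fetchone()[0]

print(unknown_age)  # [('E9',)]
print(equals_null)  # []
print(avg_age)
```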
Summary
• Introduction to Database Concepts
• Introduction to Data Modelling
• Introduction to Databases and Redundancy.
• Relational Data Model
• Terminology associated with Relational Data Model.
• Duplicated Data
• Primary and Foreign Keys
• Duplicated Data (foreign key to primary key references to link data in tables) but not redundant.
• Additional supplementary material on Normalisation available
END OF LECTURE