MySQL Session 1 dbms rdbms and normalization
MySQL Training Session 1. Introduction to DBMS, RDBMS and Normalization
RAM N SANGWAN
WWW.THESKILLPEDIA.COM
www.rnsangwan.com
RDBMS Concepts Agenda
• Introduction to DBMS
• Introduction to RDBMS
• Normalization
Introduction to DBMS
Objectives
By the end of this module we will be able to:
◦ Define a database management system
◦ Describe the drawbacks of a file management system
◦ Explain what a DBMS is
◦ Describe the uses of a DBMS
◦ List the functionalities of a database system
◦ Describe data models
DBMS: Definitions
Data: known facts that can be recorded and that have an implicit meaning.
Database: a collection of logically related data stored in one place.
Database management system (DBMS): software that manages the database.
DBMS
Goals of a database management system
◦ To provide an environment that gives efficient access to the data in the database
◦ To implement security, concurrency control and recovery from crashes
It is a general-purpose facility to:
◦ Define a database
◦ Construct a database
◦ Manipulate a database
Drawbacks of file management system
Initially, data was stored in flat files (plain text files).
Programs written in a programming language (e.g. COBOL) implemented the business logic.
Data was kept separately for every individual program.
Data redundancy.
No uniformity in the data.
All the data was arranged to suit the needs of one specific program.
No security control.
Benefits of database Approach
Then came DBMS products (e.g. FoxPro, FoxBASE, dBase III+)
One piece of software managed both the data and the business logic
Redundancy was reduced
Inconsistency was avoided
Data was shared
Standards were enforced
Security was applied
Integrity was maintained
Data independence was provided
Database system
[Figure: block diagram of a database system. Labels: users; application software / queries; software to process queries; software to access stored data; metadata; database.]
Database System (Contd.).
[Figure: query processing in a database system. Labels: user query Q1; application program query Q2; database schema; query processor; DDL compiler; compiled query Q2; database description; database manager; file manager; physical database.]
Database system (Contd.).
Database systems are of three types:
◦ Conceptual
◦ Physical
◦ Representational
Database Architecture
Internal level (storage view)
Conceptual level (community user view)
External level (individual user views)
Database
An example of three levels
CONCEPTUAL VIEW
ENO NAME F-NAME AGE SALARY DEPTNO
EXTERNAL VIEW
ENO NAME F-NAME AGE
ENO SALARY DEPTNO
INTERNAL VIEW
struct Employee {
    int eno;
    char name[20];
    char f_name[20];        /* F-NAME; a hyphen is not valid in a C identifier */
    float salary;
    int deptno;
    struct Employee *ptr;   /* link to another Employee record */
};
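The split between the conceptual and external levels can be sketched with SQL views. This is a minimal illustration and not part of the original slides: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name, view names, column types and sample row are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Conceptual level: the full EMPLOYEE relation (column types are assumed).
conn.execute("""
    CREATE TABLE employee (
        eno INTEGER PRIMARY KEY,
        name TEXT, f_name TEXT, age INTEGER,
        salary REAL, deptno INTEGER
    )
""")
# Made-up sample row, only so the views have something to show.
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'Ravi', 30, 50000.0, 10)")

# External level: two user-specific views over the same stored data.
conn.execute(
    "CREATE VIEW emp_personal AS SELECT eno, name, f_name, age FROM employee")
conn.execute(
    "CREATE VIEW emp_payroll AS SELECT eno, salary, deptno FROM employee")


def columns(relation):
    """Return the column names that a query on the table or view exposes."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [col[0] for col in cursor.description]


personal_cols = columns("emp_personal")
payroll_cols = columns("emp_payroll")
```

Each view exposes only the columns its users need, while the data is stored once in the base table. The internal level (how rows are laid out on disk) is left to the database engine.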
Schema
Schema: a description of data in terms of a data model. The three-level DB architecture defines the following schemas:
◦ External schema (or sub-schema): written using an external DDL
◦ Conceptual schema: written using a conceptual DDL
◦ Internal schema: written using an internal DDL, or storage structure definition language
Types of Database Models
HIERARCHICAL
NETWORK
RELATIONAL
TABLE
ROW
COLUMN
VALUE
Introduction to RDBMS
Objectives
By the end of this module we will learn
Definition: RDBMS
Features of an RDBMS
Some Important Terms
Properties of Relations
Keys
Keys and Referential Integrity
Definition of RDBMS
Relational model
◦ Dr. E. F. Codd described the relational model in 1970
◦ He later set out 12 rules that describe what a relational system must support
◦ All RDBMS products should follow these rules
Rules of Dr. E. F. Codd
Information rule
◦ A database must consist of tables related to each other, and all data must be stored only in the form of tables.
Guaranteed access rule
◦ Every data value can be accessed by specifying the table name, the column name and the primary key value of the row. The primary key ensures that each row is unique and accessible.
Systematic treatment of NULL values
◦ A NULL is an unknown value, and every database must have a provision for storing NULL values.
◦ NULL is not 0
◦ NULL is not an empty or blank value
◦ NULL = NULL is not true (comparing NULL with anything, even another NULL, gives an unknown result)
◦ NULL IS NULL is true
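The NULL comparisons listed above can be checked directly. The following is a small sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL; the variable names are illustrative only. Python receives an SQL NULL (the "unknown" result) as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each expression is evaluated by the SQL engine, not by Python.
null_eq_null, null_ne_null, null_is_null, null_eq_zero = conn.execute(
    "SELECT NULL = NULL, NULL <> NULL, NULL IS NULL, NULL = 0"
).fetchone()
```

Only the IS NULL test yields a true value (1). The three comparisons with = and <> all come back as NULL, which is why queries should test for missing values with IS NULL rather than with = NULL.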
Rules of Dr. E. F. Codd (Contd.).
Dynamic on-line catalog based on the relational model
◦ User data is held in tables, and the information about those tables, such as their structure, must also be held in tables.
◦ There are therefore two kinds of tables: user tables and metadata tables. The metadata tables are called the system catalog.
Comprehensive data sub-language
◦ There must be at least one language that supports data definition, view definition, data manipulation and integrity constraints.
◦ All of these must be supported with a well-defined syntax, expressed as character strings.
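A catalog that is itself a table can be queried like any other table. The sketch below assumes SQLite (through Python's built-in sqlite3 module) rather than MySQL: SQLite keeps its catalog in a table named sqlite_master, whereas MySQL exposes its metadata through the INFORMATION_SCHEMA database. The student table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT)")

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
catalog_rows = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

The same SELECT syntax that reads user data also reads the description of that data, which is the point of the catalog rule.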
Rules of Dr. E. F. Codd (Contd.).
View updating rule
◦ The system should support views, a mechanism that presents different combinations of data from different tables. All views that are theoretically updatable should be updatable.
◦ Views are virtual tables that contain data extracted from the source tables.
◦ Whether a view is simple or complex, updates made through it should be allowed wherever they are theoretically possible.
High-level insert, update and delete rule
◦ Data manipulation operations treat rows as a set. Set operations and relational operators are used to work on tables.
◦ Insert, update and delete operations on views should act on the underlying tables.
Rules of Dr. E. F. Codd (Contd.).
Physical data independence rule
◦ Changes in how the data is stored should not affect the programs that access the data.
Logical data independence rule
◦ The data is stored physically in files, but tables are logical structures. Changes to the logical structures should not require changes to the applications that use them.
Integrity independence rule
◦ Data integrity means consistency and accuracy; it keeps garbage out of the database.
◦ Integrity constraints must be supported by the database system itself.
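Integrity constraints declared in the schema are enforced by the database no matter which program submits the data. The sketch below is an illustration under assumptions: it uses Python's built-in sqlite3 module instead of MySQL, and the employee table, the age limit and the ages are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        eno  INTEGER PRIMARY KEY,          -- unique identifier
        name TEXT NOT NULL,                -- a name must be supplied
        age  INTEGER CHECK (age >= 18)     -- assumed business rule
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'Amit Kumar', 30)")

bad_rows = [
    (2, None, 25),            # violates NOT NULL
    (3, "Navneet Kaur", 12),  # violates the CHECK constraint
    (1, "Sudhir Singh", 40),  # violates the primary key (eno 1 already used)
]
rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

stored = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

All three bad rows are refused and only the first, valid row is stored, so the "garbage" never reaches the table.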
Rules of Dr. E. F. Codd (Contd.).
Distribution independence rule
◦ To the end user, all commands should work the same way as they would on a local database, even when the database is located elsewhere and is accessed over a network.
Non-subversion rule
◦ Constraints that the user defines using SQL must not be bypassable by any other means.
Features of an RDBMS
The ability to create multiple relations (tables) and enter data into them
An interactive query language
Retrieval of information stored in more than one table
Provides a Catalog or Dictionary, which itself consists of tables (called system tables)
Some Important Terms
Relation: a table
Tuple: a row in a table
Attribute: a column in a table
Degree: number of attributes
Cardinality: number of tuples
Primary Key: a unique identifier for the table
Domain: a pool of values from which specific attributes of specific relations draw their values
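Degree and cardinality can be read off a real table. This is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL; the three sample rows are borrowed from the Student relation used later in this session, and PRAGMA table_info is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(53666, "Jones", 18), (53688, "Smith", 18), (53650, "Smith", 19)],
)

# Degree: number of attributes (one catalog row per column).
degree = len(conn.execute("PRAGMA table_info(student)").fetchall())

# Cardinality: number of tuples currently in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The degree changes only when the table definition changes, while the cardinality changes with every insert or delete.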
Relations or Tables properties
There are no duplicate rows (tuples)
Tuples are unordered, top to bottom
Attributes are unordered, left to right
All attribute values are atomic (or scalar)
Relational databases do not allow repeating groups
Keys
Key
Super Key
Candidate Keys
◦ Primary Key
◦ Alternate Key
Secondary Keys
Keys and Referential Integrity
Enrolled
sid    cid          grade
53666  carnatic101  C
53688  reggae203    B
53650  topology112  A
53666  history105   B

Student
sid    name   login       age  gpa
53666  Jones  Jones@cs    18   3.4
53688  Smith  Smith@eecs  18   3.2
53650  Smith  Smith@math  19   3.8

sid in Student: primary key
sid in Enrolled: foreign key referring to sid of the STUDENT relation
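Referential integrity between the two relations can be sketched as follows. The example assumes Python's built-in sqlite3 module as a stand-in for MySQL; the composite primary key on enrolled and the sid value 99999 are assumptions added for the illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute("""
    CREATE TABLE student (
        sid INTEGER PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL
    )
""")
conn.execute("""
    CREATE TABLE enrolled (
        sid   INTEGER REFERENCES student(sid),  -- foreign key to student
        cid   TEXT,
        grade TEXT,
        PRIMARY KEY (sid, cid)                  -- assumed key for this sketch
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (53666, "Jones", "Jones@cs", 18, 3.4),
        (53688, "Smith", "Smith@eecs", 18, 3.2),
        (53650, "Smith", "Smith@math", 19, 3.8),
    ],
)

# A row that refers to an existing student is accepted.
conn.execute("INSERT INTO enrolled VALUES (53666, 'carnatic101', 'C')")

# A row that refers to a non-existent student (hypothetical sid) is refused.
try:
    conn.execute("INSERT INTO enrolled VALUES (99999, 'history105', 'B')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

enrolled_count = conn.execute("SELECT COUNT(*) FROM enrolled").fetchone()[0]
```

The foreign key guarantees that every sid in enrolled matches a sid in student, so an enrolment can never point at a student who does not exist.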
Schema Refinement & Normalization
Normalization
• Normalization is performed to reduce or eliminate insertion, deletion or update anomalies.
• However, a completely normalized database may not be the most efficient or effective implementation.
• “Denormalization” is sometimes used to improve efficiency.
• Normalization splits database information across multiple tables.
• To retrieve complete information from a normalized database, the JOIN operation must be used.
• JOIN tends to be expensive in terms of processing time, and very large joins are very expensive.
Normalization -Contd..
Introduction
In this exercise we are looking at the optimisation of data structure. The example system we are going to use as a model is a database to keep track of employees of an organisation working on different projects.
Objectives
By the end of the exercise you should be able to:
• Show understanding of why we normalize data
• Give formal definitions of 1NF, 2NF & 3NF
• Apply the process of normalization to your own project
The Scenario
The data we would want to store could be expressed as:
Project No  Project Name                      Employee No  Employee Name  Rate Category  Rate
1203        Alliance Softech Pvt Ltd Website  11           Amit Kumar     A              Rs. 250
                                              12           Navneet Kaur   B              Rs. 200
                                              16           Sudhir Singh   C              Rs. 175
1506        Online estate agency              11           Amit Kumar     A              Rs. 250
                                              17           Deepa Nailwal  A              Rs. 200
Why Normalization?
Three problems become apparent with our current model:
Tables in an RDBMS use a simple grid structure
Each project has a set of employees, so we can’t even use this format to enter data into a table. How would you construct a query to find the employees working on each project?
All tables in an RDBMS need a key
Each record in an RDBMS must have a unique identity. Which field should be the primary key?
Data entry should be kept to a minimum
Our main problem is that each project contains repeating groups, which lead to redundancy and inconsistency.
Why Normalization? Contd..
We could place the data into a table called tblProjects_Employees:
Project No  Project Name                      Employee No  Employee Name  Rate Category  Rate
1203        Alliance Softech Pvt Ltd Website  11           Amit Kumar     A              Rs. 250
1203        Alliance Softech Pvt Ltd Website  12           Navneet Kaur   B              Rs. 200
1203        Alliance Softech Pvt Ltd Wesite   16           Sudhir Singh   C              Rs. 175
1506        Online estate agency              11           Amit Kumar     A              Rs. 250
1506        Online estate agency              17           Deepa Nailwal  A              Rs. 200
Why Normalization? Contd..
Addressing our three problems:
Tables in an RDBMS use a simple grid structure
We can find the members of each project using a simple SQL or QBE search on either Project Number or Project Name.
All tables in an RDBMS need a key
We CAN uniquely identify each record. Although no single field can act as the primary key, we can combine two or more fields to create a composite key.
Data entry should be kept to a minimum
Our main problem, that each project contains repeating groups, still remains. To create an RDBMS we have to eliminate these groups or sets.
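The first two points can be sketched in code. The example below is an illustration under assumptions, not the author's implementation: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the column names are invented, and the rate columns are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tblProjects_Employees (
        project_no INTEGER, project_name TEXT,
        employee_no INTEGER, employee_name TEXT,
        PRIMARY KEY (project_no, employee_no)   -- composite key
    )
""")
conn.executemany(
    "INSERT INTO tblProjects_Employees VALUES (?, ?, ?, ?)",
    [
        (1203, "Alliance Softech Pvt Ltd Website", 11, "Amit Kumar"),
        (1203, "Alliance Softech Pvt Ltd Website", 12, "Navneet Kaur"),
        (1203, "Alliance Softech Pvt Ltd Wesite", 16, "Sudhir Singh"),
        (1506, "Online estate agency", 11, "Amit Kumar"),
        (1506, "Online estate agency", 17, "Deepa Nailwal"),
    ],
)

# A simple search on Project Number finds the members of a project.
members_1203 = [
    name for (name,) in conn.execute(
        "SELECT employee_name FROM tblProjects_Employees "
        "WHERE project_no = ? ORDER BY employee_no", (1203,))
]

# The composite key refuses a second row for the same project and employee.
try:
    conn.execute("INSERT INTO tblProjects_Employees VALUES "
                 "(1203, 'Alliance Softech Pvt Ltd Website', 11, 'Amit Kumar')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Employee 11 appears twice in the table, once per project, which is allowed because only the combination of project number and employee number has to be unique.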
Why Normalization? Contd..
Did you notice that “Website” was misspelled in the 3rd record? Imagine trying to spot this error in thousands of records. By using this structure (flat filing) we create:
Redundant data
Duplicate copies of data: we would have to key in “Alliance Softech Pvt Ltd Website” 3 times. Not only do we waste storage space, we also risk creating inconsistent data.
Inconsistent data
The more often we have to key in data, the more likely we are to make mistakes.
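Spotting such an error by eye is hard, but a query can find it. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL and invented column names; it looks for project numbers that are stored with more than one spelling of the project name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tblProjects_Employees (
        project_no INTEGER, project_name TEXT, employee_no INTEGER
    )
""")
conn.executemany(
    "INSERT INTO tblProjects_Employees VALUES (?, ?, ?)",
    [
        (1203, "Alliance Softech Pvt Ltd Website", 11),
        (1203, "Alliance Softech Pvt Ltd Website", 12),
        (1203, "Alliance Softech Pvt Ltd Wesite", 16),   # the misspelled row
        (1506, "Online estate agency", 11),
        (1506, "Online estate agency", 17),
    ],
)

# Project numbers whose name has been keyed in more than one way.
name_variants = conn.execute("""
    SELECT project_no, COUNT(DISTINCT project_name)
    FROM tblProjects_Employees
    GROUP BY project_no
    HAVING COUNT(DISTINCT project_name) > 1
""").fetchall()

# A search on the correctly spelled name silently misses the third record.
found_by_name = conn.execute(
    "SELECT COUNT(*) FROM tblProjects_Employees WHERE project_name = ?",
    ("Alliance Softech Pvt Ltd Website",),
).fetchone()[0]
```

The query reports project 1203 with two name variants, and the search by name returns only two of its three employees. This is the practical cost of the inconsistency.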
Normalization Process
The solution is simply to take out the duplication. We do this by:
Identifying a key
In this case we can use the project no and employee no together to uniquely identify each row.
Project No  Employee No  Unique Identifier
1203        11           120311
1203        12           120312
1203        16           120316
1506        11           150611
1506        17           150617
Normalization Process Contd..
We look for partial dependencies
We look for fields that depend on only part of the key and not the entire key.
Field          Depends on
Project Name   Project No
Employee Name  Employee No
Rate Category  Employee No
Rate           Employee No
We remove partial dependencies
The fields listed are only dependent on part of the key, so we remove them from the table.
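A partial dependency can be tested against sample data. The helper below is a hypothetical sketch (the function name depends_only_on, the table name flat and the column names are assumptions), written with Python's built-in sqlite3 module as a stand-in for MySQL. It checks whether every value of one column always appears with a single value of another column. Sample data can disprove a dependency but cannot prove that it holds for all future data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flat (
        project_no INTEGER, project_name TEXT,
        employee_no INTEGER, employee_name TEXT
    )
""")
conn.executemany(
    "INSERT INTO flat VALUES (?, ?, ?, ?)",
    [
        (1203, "Alliance Softech Pvt Ltd Website", 11, "Amit Kumar"),
        (1203, "Alliance Softech Pvt Ltd Website", 12, "Navneet Kaur"),
        (1203, "Alliance Softech Pvt Ltd Website", 16, "Sudhir Singh"),
        (1506, "Online estate agency", 11, "Amit Kumar"),
        (1506, "Online estate agency", 17, "Deepa Nailwal"),
    ],
)


def depends_only_on(determinant, dependent):
    """True if each determinant value maps to exactly one dependent value."""
    query = f"""
        SELECT COUNT(*) FROM (
            SELECT {determinant}
            FROM flat
            GROUP BY {determinant}
            HAVING COUNT(DISTINCT {dependent}) > 1
        ) AS violations
    """
    return conn.execute(query).fetchone()[0] == 0


project_name_by_project = depends_only_on("project_no", "project_name")
employee_name_by_employee = depends_only_on("employee_no", "employee_name")
employee_name_by_project = depends_only_on("project_no", "employee_name")
project_name_by_employee = depends_only_on("employee_no", "project_name")
```

Project name is fixed by project number alone and employee name by employee number alone, so both depend on only part of the composite key. The last two checks fail, which confirms that neither field depends on the other half of the key.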
Normalization Process Contd..
We create new tables
Clearly we can’t take the data out and leave it out of our database. We put it into a new table consisting of the field that has the partial dependency and the field it is dependent on. Looking at our example we will need to create two new tables:
Dependent On  Partially Dependent
Project No    Project Name

Dependent On  Partially Dependent
Employee No   Employee Name
              Rate Category
              Rate
Normalization Process Contd..
We now have 3 tables:

tblProjects
Project No  Project Name
1203        Alliance Softech Pvt Ltd Website
1506        Online estate agency

tblProjects_Employees
Project No  Employee No
1203        11
1203        12
1203        16
1506        11
1506        17

tblEmployees
Employee No  Employee Name  Rate Category  Rate
11           Amit Kumar     A              Rs. 250
12           Navneet Kaur   B              Rs. 200
16           Sudhir Singh   C              Rs. 175
17           Deepa Nailwal  A              Rs. 200
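The split into three tables can be carried out with SELECT DISTINCT. This is a minimal sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the staging table name flat and the column names are invented, the project name is entered with its correct spelling, and the rates follow the tblEmployees listing. CREATE TABLE ... AS SELECT copies data only, so keys would still have to be declared separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flat (
        project_no INTEGER, project_name TEXT,
        employee_no INTEGER, employee_name TEXT,
        rate_category TEXT, rate INTEGER
    )
""")
conn.executemany(
    "INSERT INTO flat VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1203, "Alliance Softech Pvt Ltd Website", 11, "Amit Kumar", "A", 250),
        (1203, "Alliance Softech Pvt Ltd Website", 12, "Navneet Kaur", "B", 200),
        (1203, "Alliance Softech Pvt Ltd Website", 16, "Sudhir Singh", "C", 175),
        (1506, "Online estate agency", 11, "Amit Kumar", "A", 250),
        (1506, "Online estate agency", 17, "Deepa Nailwal", "A", 200),
    ],
)

# Each new table keeps one part of the key plus the fields that depend on it.
conn.execute("CREATE TABLE tblProjects AS "
             "SELECT DISTINCT project_no, project_name FROM flat")
conn.execute("CREATE TABLE tblEmployees AS "
             "SELECT DISTINCT employee_no, employee_name, rate_category, rate "
             "FROM flat")
# The linking table keeps only the composite key.
conn.execute("CREATE TABLE tblProjects_Employees AS "
             "SELECT DISTINCT project_no, employee_no FROM flat")

counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("tblProjects", "tblEmployees", "tblProjects_Employees")
}
```

The five flat rows become two projects, four employees and five links, so each project name and each employee's details are now stored once.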
Normalization Process Contd..
Looking at the project, note the reduction in:
Redundant data
The text “Alliance Softech Pvt Ltd Website” is stored once only, not for each occurrence of an employee working on the project.
Inconsistent data
Because we only store the project name once, we are less likely to enter “Webite”.
The link is made through the key, Project No. Obviously there is no way to remove this duplication without losing the relation altogether, but it is far more efficient to store a short number repeatedly than a large chunk of text.
Normalization Process Contd..
Our model has improved but is still far from perfect. There is still room for inconsistency.
Employee No  Employee Name  Rate Category  Rate
11           Amit Kumar     A              Rs. 250
12           Navneet Kaur   B              Rs. 200
16           Sudhir Singh   C              Rs. 175
17           Deepa Nailwal  A              Rs. 200
Deepa Nailwal is being paid Rs. 200 while Amit Kumar gets Rs. 250, but they’re in the same rate category!
Again, we have stored redundant data: the relationship between hourly rate and rate category is being stored in its entirety, i.e. we have to key in both the rate category AND the hourly rate.
Normalization Process Contd..
The solution, as before, is to remove this excess data to another table. We do this by:
Looking for Transitive Relationships
Relationships where a non-key attribute is dependent on another non-key attribute. Hourly rate should depend on rate category BUT rate category is not a key.
Removing Transitive Relationships
As before we remove the redundant data and place it in a separate table. In this case we create a new table tblRates and add the fields rate category and hourly rate. We then delete hourly rate from the employees table.
Normalization Process Contd..
We now have 4 tables:

tblProjects
Project No  Project Name
1203        Alliance Softech Pvt Ltd Website
1506        Online estate agency

tblProjects_Employees
Project No  Employee No
1203        11
1203        12
1203        16
1506        11
1506        17

tblEmployees
Employee No  Employee Name  Rate Category
11           Amit Kumar     A
12           Navneet Kaur   B
16           Sudhir Singh   C
17           Deepa Nailwal  A

tblRates
Rate Category  Rate
A              Rs. 250
B              Rs. 200
C              Rs. 175
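The four tables and the JOIN that reassembles them can be sketched as follows. This is an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the column names are invented, and the project numbers follow the earlier slides (1203 and 1506).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblRates (
        rate_category TEXT PRIMARY KEY,
        rate INTEGER
    );
    CREATE TABLE tblEmployees (
        employee_no INTEGER PRIMARY KEY,
        employee_name TEXT,
        rate_category TEXT REFERENCES tblRates(rate_category)
    );
    CREATE TABLE tblProjects (
        project_no INTEGER PRIMARY KEY,
        project_name TEXT
    );
    CREATE TABLE tblProjects_Employees (
        project_no INTEGER REFERENCES tblProjects(project_no),
        employee_no INTEGER REFERENCES tblEmployees(employee_no),
        PRIMARY KEY (project_no, employee_no)
    );
    INSERT INTO tblRates VALUES ('A', 250), ('B', 200), ('C', 175);
    INSERT INTO tblEmployees VALUES
        (11, 'Amit Kumar', 'A'), (12, 'Navneet Kaur', 'B'),
        (16, 'Sudhir Singh', 'C'), (17, 'Deepa Nailwal', 'A');
    INSERT INTO tblProjects VALUES
        (1203, 'Alliance Softech Pvt Ltd Website'),
        (1506, 'Online estate agency');
    INSERT INTO tblProjects_Employees VALUES
        (1203, 11), (1203, 12), (1203, 16), (1506, 11), (1506, 17);
""")

# JOIN the four tables to rebuild the original project/employee/rate listing.
report = conn.execute("""
    SELECT p.project_no, e.employee_name, r.rate
    FROM tblProjects_Employees AS pe
    JOIN tblProjects  AS p ON p.project_no = pe.project_no
    JOIN tblEmployees AS e ON e.employee_no = pe.employee_no
    JOIN tblRates     AS r ON r.rate_category = e.rate_category
    ORDER BY p.project_no, e.employee_no
""").fetchall()
```

Because the rate is looked up through the rate category, both category A employees now come out at the same rate, and a rate change needs only one row of tblRates to be updated.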
Normalization Process Contd..
Again, we have cut down on redundancy, and it is now impossible to assume Rate category A is associated with anything but Rs. 250.
Our model is now in its most efficient format with:
Minimum REDUNDANCY
Minimum INCONSISTENCY
Normalization Process Contd..
What we have formally done is NORMALIZE the database.
At the beginning we had a data structure:
Project No
Project Name
Employee No (1n)
Employee name (1n)
Rate Category (1n)
Hourly Rate (1n)
(1n indicates there are many occurrences of the field: it is a repeating group.)
To begin the normalization process we start by moving from zero normal form to 1st normal form.
1st Normal Form
The definition of 1st normal form:
◦ There are no repeating groups
◦ All the key attributes are defined
◦ All attributes are dependent on the primary key
So far, we have no keys, and there are repeating groups. So we remove the repeating groups and define the keys, and are left with:
Employee Project table
◦ Project number (part of key)
◦ Project name
◦ Employee number (part of key)
◦ Employee name
◦ Rate category
◦ Hourly rate
This table is in first normal form (1NF).
2nd Normal Form
A table is in 2nd normal form if:
◦ It’s already in first normal form
◦ It includes no partial dependencies (where an attribute is dependent on only part of the key)
We look through the fields:
◦ Project name is dependent only on project number
◦ Employee name, rate category and hourly rate are dependent only on employee number
So we remove them, and place these fields in a separate table, with the key being that part of the original key they are dependent on. We are left with the following three tables:
2nd Normal Form-Contd..
Employee Project table
◦ Project number (part of key)
◦ Employee number (part of key)
Employee table
◦ Employee number (primary key)
◦ Employee name
◦ Rate category
◦ Hourly rate
Project table
◦ Project number (primary key)
◦ Project name
The tables are now in 2nd normal form (2NF). Are they in 3rd normal form?
3rd Normal Form
A table is in 3rd normal form if:
◦ It’s already in second normal form
◦ It includes no transitive dependencies (where a non-key attribute is dependent on another non-key attribute)
We can narrow our search down to the Employee table, which is the only one with more than one non-key attribute. Employee name is not dependent on either Rate category or Hourly rate, and the same applies to Rate category, but Hourly rate is dependent on Rate category.
So, as before, we remove it, placing it in its own table, with the attribute it was dependent on as key.
3rd Normal Form Contd..
Employee project table
◦ Project number (part of key)
◦ Employee number (part of key)
Employee table
◦ Employee number (primary key)
◦ Employee name
◦ Rate category
Rate table
◦ Rate category (primary key)
◦ Hourly rate
Project table
◦ Project number (primary key)
◦ Project name
These tables are all now in 3rd normal form, and ready to be implemented.
Other Normal Forms
Boyce-Codd Normal Form (BCNF)
• Strengthens 3NF by requiring the determinant of every non-trivial functional dependency to be a superkey (a column or set of columns that uniquely identifies a row)
A 3NF relation can fail to be in BCNF only if:
• Candidate keys in the relation are composite keys (they are not single attributes),
• There is more than one candidate key in the relation, and
• The keys are not disjoint, that is, some attributes in the keys are common
Summary
What normal form is this table in? Giving it a quick glance, we see:
• No repeating groups, and a primary key defined, so it's at least in 1st normal form.
• The key is a single field, so we needn't even look for partial dependencies, so it's at least in 2nd normal form.
• How about transitive dependencies? Well, it looks like Town might be determined by Postcode. And in most parts of the world that's usually the case.
So should we remove Town, and place it in a separate table, with Postcode as the key?
Summary Contd..
No! Although this table is not technically in 3rd normal form, removing this information is not worth it. Creating more tables increases the load slightly, slowing processing down. This is often counteracted by the reduction in table sizes and in redundant data.
But in this case, where the town would almost always be referenced as part of the address, it isn't worth it. Perhaps a company that uses the data to produce regular mailing lists of thousands of customers should normalize fully. It always comes down to how the data is going to be used.
Normalization is just a helpful process that usually results in the most efficient table structure, and not a rule for database design.
Thank You
VISIT WWW.THESKILLPEDIA.COM