Overview of RDBMS Technology
Overview of Relational Database Technology
Hanu
2
Introductions
3
Agenda
• Introduction to file systems
• Introduction to access methods: VSAM
• Introduction to database systems: architecture
• Introduction to hierarchical databases: IMS
• Introduction to network databases: IDMS
• Introduction to Relational Databases
• Introduction to OLTP
• New trends in DB technology
• Basics of Data Warehousing
4
Objectives
• Evolution of Database Technology
• Limitations of Legacy Access mechanism
• Limitations of Hierarchy and Network Databases
• Emergence of Relational database management system
• Latest trends in Database technology
– OODBMS
– OORDBMS
– XML Integration
5
What this course does not cover
• Data Modelling
• OO Design
• SQL Syntax
• Commercial databases like
– DB2
– Oracle
– SQL Server 2000
– Sybase
• Design of Data warehousing or Data Mining system
6
File Based Data Management
• Flat file systems were the first attempt to computerize manual bookkeeping systems
• Data could be retrieved only by reading the file sequentially; updating or deleting an existing record in place was almost impossible
• The only way to delete records from a sequential file is to create a new file that does not contain them
• The only way to update records in a sequential file is to create a new file that contains the updated records
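The "create a new file" approach to deleting records can be sketched in Python as follows. This is a minimal illustration, not taken from the original deck: the comma-separated record layout, the file name and the helper names `delete_records` and `demo` are assumptions.

```python
import os
import tempfile

def delete_records(path, should_delete):
    """Remove records from a sequential file by copying the survivors to a new file."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, text=True)
    with os.fdopen(fd, "w") as new_file, open(path) as old_file:
        for record in old_file:              # the only access path is a front-to-back scan
            if not should_delete(record):
                new_file.write(record)
    os.replace(tmp_path, path)               # the new file takes the place of the old one

def demo():
    """Build a small file, delete one record, and return what is left."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "employees.txt")
        with open(path, "w") as f:
            f.write("101,Davis\n102,Daniel\n103,Sandra\n")
        delete_records(path, lambda rec: rec.startswith("102,"))
        with open(path) as f:
            return f.read().splitlines()
```

Every record is read and rewritten even though only one is removed, which is the cost the slide describes. An update works the same way, writing a changed record instead of skipping it.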
7
Disadvantages of File based system
• Data Redundancy - the same data might be stored in different places
• Poor Data Control - redundant copies of the data might differ slightly; for example, Hanu’s data may be stored in Telephone, Payana and PSWeb
• Inability to Easily Manipulate Data - modifying files by hand was a tedious and error-prone activity
• Cryptic Work Flows - accessing the data could take excessive programming effort and was too difficult for end users
8
VSAM Based Systems
• Introduced by IBM in the early 1970s
• Records are identified by a unique key
• Goal: locate a given record by its key and fetch it with minimal I/O (ideally a single read)
• Can define Secondary keys
• Made up of Multiple Control Areas (CA)
• Each control area is made up of Control Intervals and Free space
9
Disadvantages of VSAM
• Complex Structure and access mechanism
• No querying facility
• Security Issues
• No concept of Referencing keys
• Application dependent
• Redundancy of the data
10
Hierarchical Database - IMS
• Designed by IBM in the mid-1960s
• Based on two tier client server architecture
• Views data only as a hierarchy
• A child element can be accessed only through its parent node
[Diagram: hierarchy levels VENDOR -> ITEM -> LOCATION]
VENDOR1
– ITEM1: LOC1, LOC2, LOC3
– ITEM2: LOC1, LOC3
– ITEM3
11
Disadvantages of IMS
• Accessing child nodes only through parent node
• Child record can not be inserted without a parent
• One child record can have only one parent record
• No querying facility
• No concept of referential integrity or constraints
• Redundancy of the data
• Very cryptic Macros
12
Network Database - IDMS
• Based on the network data model specified by the Conference on Data Systems Languages (CODASYL) in the late 60s
• Introduced to overcome Hierarchical DB limitations
• Data elements are linked only through pointers
• Reduced redundancy, because a record can be shared by several parents instead of being duplicated
• Superset of the Hierarchical Database: a child record can have multiple parents
13
Disadvantages of IDMS
• Difficult to access the data because navigation relies on a cumbersome pointer concept
• More useful for programmers than for end users
• Difficult to represent many-to-many relationships
14
Relational Data Base Management System (RDBMS)
• The Relational Model developed out of the work done by Dr. E. F. Codd at IBM in the early 70s; he was looking for ways to solve the problems with the existing models
• First introduced in his famous 1970 paper “A Relational Model of Data for Large Shared Data Banks”
• Based on the mathematical model of Relational Algebra
• At the core of the relational model is the concept of a table (also called a relation) in which all data is stored
• Each table is made up of records (horizontal rows also known as tuples) and fields (vertical columns also known as attributes)
• Data is separated from the application, so it is no longer application dependent
• In the relational model, operations that manipulate data do so on the basis of the data values themselves.
• Metadata is easy to manage
• Query Language Interface
15
Application Programs using DBMS Services
[Diagram: layers from top to bottom are Application Programs -> DBMS -> File System -> Storage]
16
SNO  SNAME  STATUS  CITY
S1   Smith  20      London
S2   Jones  10      Paris
S3   Blake  30      Paris
S4   Clark  20      London
S5   Adams  30      Athens
Example of RDBMS Table
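As a rough illustration of a query that works purely on data values, the example table can be loaded into an in-memory SQLite database through Python's standard `sqlite3` module. The table name `supplier`, the lowercase column names and the helper `suppliers_in` are assumptions made for this sketch; SQLite stands in for any relational product.

```python
import sqlite3

# Supplier rows copied from the example table above.
SUPPLIERS = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]

def suppliers_in(city):
    """Return supplier names in a city, selected purely by data value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE supplier ("
        "sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT)"
    )
    conn.executemany("INSERT INTO supplier VALUES (?, ?, ?, ?)", SUPPLIERS)
    rows = conn.execute(
        "SELECT sname FROM supplier WHERE city = ? ORDER BY sno", (city,)
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]
```

Calling `suppliers_in("Paris")` names no pointer, parent record or file position; the query states only which values are wanted.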
17
ER modeling
• ER modeling : A graphical technique for understanding and organizing the data independent of the actual database implementation
• Entity: Anything that may have an independent existence and about which we intend to collect data. Also known as Entity type.
• Entity instance: a particular member of the entity type e.g. a particular student
• Attributes: Properties/characteristics that describe entities
• Relationships: Associations between entities
18
Steps in ER Modeling
• Identify the Entities
• Find relationships
• Identify the key attributes for every Entity
• Identify other relevant attributes
• Draw complete E-R diagram with all attributes including Primary Key
• Review your results with your Business users
19
Case Study – ER Model For a college DB
Assumptions :
• A college contains many departments
• Each department can offer any number of courses
• Many instructors can work in a department
• An instructor can work only in one department
• For each department there is a Head
• An instructor can be head of only one department
• Each instructor can take any number of courses
• A course can be taken by only one instructor
• A student can enroll for any number of courses
• Each course can have any number of students
20
Step 1 : Identify the Entities :
• DEPARTMENT
• STUDENT
• COURSE
• INSTRUCTOR
Step 2 : Find the relationships
• One course is taken by multiple students and one student enrolls for multiple courses, hence the cardinality between course and student is many to many.
• The department offers many courses and each course belongs to only one department, hence the cardinality between department and course is one to many.
• One department has multiple instructors and one instructor belongs to one and only one department, hence the cardinality between department and instructor is one to many.
• Each department has one “Head of department” and one instructor can be “Head of department” of only one department, hence the cardinality is one to one.
• One course is taught by only one instructor, but the instructor teaches many courses, hence the cardinality between course and instructor is many to one.
21
Step 3: Identify the key attributes
• Deptname is the key attribute for the Entity “Department”, as it identifies the Department uniquely.
• Course# (CourseId) is the key attribute for “Course” Entity.
• Student# (Student Number) is the key attribute for “Student” Entity.
• Instructor Name is the key attribute for “Instructor” Entity.
Step 4: Identify other relevant attributes
• For the department entity: location
• For the course entity: course name, duration, prerequisite
• For the instructor entity: room#, telephone#
• For the student entity: student name, date of birth
22
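Step 5: Draw complete E-R diagram with all attributes including Primary Key
[E-R diagram. Entities and attributes: Department (Department Name, Location); Course (Course#, CourseName, Duration, Pre Requisite); Instructor (InstructorName, Room#, Telephone#); Student (Student#, Student Name, Date of Birth). Relationships: Department Offers Course (1:N); Department Has Instructor (1:N); Department Headed by Instructor (1:1); Course Is taught by Instructor (N:1); Course Enrolled by Student (N:M).]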
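One possible mapping of the case-study E-R model to relational tables is sketched below, using Python's standard `sqlite3` module. All table and column names, the `build` and `demo` helpers and the sample values ('Block A', 'Instructor A', 'R1', '100') are assumptions made for illustration; the deck itself does not prescribe a schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE department (
    dept_name TEXT PRIMARY KEY,
    location  TEXT,
    -- one-to-one: an instructor heads at most one department
    head_name TEXT UNIQUE REFERENCES instructor(instructor_name)
);
CREATE TABLE instructor (
    instructor_name TEXT PRIMARY KEY,
    room_no         TEXT,
    telephone_no    TEXT,
    -- one-to-many: each instructor works in exactly one department
    dept_name       TEXT NOT NULL REFERENCES department(dept_name)
);
CREATE TABLE course (
    course_id       TEXT PRIMARY KEY,
    course_name     TEXT,
    duration        INTEGER,
    prerequisite    TEXT,
    dept_name       TEXT NOT NULL REFERENCES department(dept_name),
    -- many-to-one: a course is taught by only one instructor
    instructor_name TEXT REFERENCES instructor(instructor_name)
);
CREATE TABLE student (
    student_no    INTEGER PRIMARY KEY,
    student_name  TEXT,
    date_of_birth TEXT
);
-- many-to-many between student and course needs its own table
CREATE TABLE enrollment (
    student_no INTEGER REFERENCES student(student_no),
    course_id  TEXT REFERENCES course(course_id),
    PRIMARY KEY (student_no, course_id)
);
"""

def build():
    """Create an empty in-memory college database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def demo():
    """Insert one row per entity, then try to enroll a student who does not exist."""
    conn = build()
    conn.execute("INSERT INTO department (dept_name, location) VALUES ('Mathematics', 'Block A')")
    conn.execute("INSERT INTO instructor VALUES ('Instructor A', 'R1', '100', 'Mathematics')")
    conn.execute("UPDATE department SET head_name = 'Instructor A' WHERE dept_name = 'Mathematics'")
    conn.execute(
        "INSERT INTO course VALUES "
        "('M4', 'Applied Mathematics', 7, 'Basic Mathematics', 'Mathematics', 'Instructor A')"
    )
    conn.execute("INSERT INTO student VALUES (101, 'Davis', '1986-11-04')")
    conn.execute("INSERT INTO enrollment VALUES (101, 'M4')")
    try:
        conn.execute("INSERT INTO enrollment VALUES (999, 'M4')")  # no such student
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    return rejected
```

The one-to-many relationships become foreign-key columns on the "many" side, the one-to-one "headed by" relationship becomes a unique foreign key, and the many-to-many enrollment becomes a separate table keyed by both identifiers.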
23
What is Normalization?
• A database designed from the E-R model may have some amount of
– Inconsistency
– Uncertainty
– Redundancy
To eliminate these drawbacks, some refinement has to be done on the database.
– The refinement process is called Normalization
– Defined as a step-by-step process of decomposing a complex relation into a simple and stable data structure
– The formal process that can be followed to achieve a good database design
– Also used to check that an existing design is of good quality
– The different stages of normalization are known as “normal forms”
– To accomplish normalization we need to understand the concept of Functional Dependencies
24
Need for Normalization
Student_Course_Result Table
Student_Details Course_Details Result_Details
101 Davis 11/4/1986 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 82 A
102 Daniel 11/6/1987 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 62 C
101 Davis 11/4/1986 H6 American History 4 11/22/2004 79 B
103 Sandra 10/2/1988 C3 Bio Chemistry Basic Chemistry 11 11/16/2004 65 B
104 Evelyn 2/22/1986 B3 Botany 8 11/26/2004 77 B
102 Daniel 11/6/1987 P3 Nuclear Physics Basic Physics 13 11/12/2004 68 B
105 Susan 8/31/1985 P3 Nuclear Physics Basic Physics 13 11/12/2004 89 A
103 Sandra 10/2/1988 B4 Zoology 5 11/27/2004 54 D
105 Susan 8/31/1985 H6 American History 4 11/22/2004 87 A
104 Evelyn 2/22/1986 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 65 B
Insert Anomaly
Delete Anomaly
Update Anomaly
Data Duplication
25
Functional dependency
• In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on attribute X if each value of X determines EXACTLY ONE value of Y, which is represented as X -> Y (X can be composite in nature).
• We say here “X determines Y” or “Y is functionally dependent on X”. X -> Y does not imply Y -> X
• If the value of the attribute “Marks” is known, then the value of the attribute “Grade” is determined, since Marks -> Grade
• Types of functional dependencies:
– Full Functional dependency
– Partial Functional dependency
– Transitive dependency
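Whether a functional dependency holds in a given set of rows can be checked mechanically. The sketch below is illustrative only: the function name `holds` and the dictionary keys are assumptions, and the sample rows are the Student#, Course#, Marks and Grade values from the result table later in the deck.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Each X value may be paired with exactly one Y value.
        if seen.setdefault(x, y) != y:
            return False
    return True

RESULTS = [
    {"student": 101, "course": "M4", "marks": 82, "grade": "A"},
    {"student": 102, "course": "M4", "marks": 62, "grade": "C"},
    {"student": 101, "course": "H6", "marks": 79, "grade": "B"},
    {"student": 103, "course": "C3", "marks": 65, "grade": "B"},
    {"student": 104, "course": "B3", "marks": 77, "grade": "B"},
    {"student": 102, "course": "P3", "marks": 68, "grade": "B"},
    {"student": 105, "course": "P3", "marks": 89, "grade": "A"},
    {"student": 103, "course": "B4", "marks": 54, "grade": "D"},
    {"student": 105, "course": "H6", "marks": 87, "grade": "A"},
    {"student": 104, "course": "M4", "marks": 65, "grade": "B"},
]

marks_determines_grade = holds(RESULTS, ["marks"], ["grade"])            # Marks -> Grade
grade_determines_marks = holds(RESULTS, ["grade"], ["marks"])            # Grade -> Marks
key_determines_marks = holds(RESULTS, ["student", "course"], ["marks"])  # composite X
```

On these rows Marks -> Grade holds while Grade -> Marks does not (grade A appears with marks 82, 89 and 87), which illustrates that X -> Y does not imply Y -> X. A check like this can only refute a dependency from sample data; whether it holds in general is a design decision about the real world.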
26
Functional Dependencies
Consider the following Relation
REPORT (STUDENT#,COURSE#, CourseName, IName, Room#, Marks, Grade)
• STUDENT# - Student Number
• COURSE# - Course Number
• CourseName - Course Name
• IName - Name of the Instructor who delivered the course
• Room# - Room number which is assigned to respective Instructor
• Marks - Scored in Course COURSE# by Student STUDENT#
• Grade - obtained by Student STUDENT# in Course COURSE#
27
Functional Dependencies- From the previous example
• STUDENT#, COURSE# -> Marks
• COURSE# -> CourseName
• COURSE# -> IName (Assuming one course is taught by one and only one Instructor)
• IName -> Room# (Assuming each Instructor has his/her own and non-shared room)
• Marks -> Grade
28
Full dependencies
X and Y are attributes. Y is fully functionally dependent on X if X functionally determines Y
Note: no proper subset of X should functionally determine Y
Example: {Student#, Course#} -> Marks
29
Partial dependencies
X and Y are attributes. Attribute Y is partially dependent on the attribute X if it is dependent on a proper subset of X.
Example: with the composite key {Student#, Course#}, the attributes CourseName, IName and Room# depend on Course# alone
30
Transitive dependencies
X, Y and Z are three attributes. If X -> Y and Y -> Z, then X -> Z
Example: Course# -> IName -> Room#
31
First normal form: 1NF
• A relation schema is in 1NF :
– if and only if all the attributes of the relation R are atomic in nature.
– Atomic: the smallest level to which data may be broken down and remain meaningful
32
Student_Course_Result Table
Student_Details Course_Details Results
101 Davis 11/4/1986 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 82 A
102 Daniel 11/6/1987 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 62 C
101 Davis 11/4/1986 H6 American History 4 11/22/2004 79 B
103 Sandra 10/2/1988 C3 Bio Chemistry Basic Chemistry 11 11/16/2004 65 B
104 Evelyn 2/22/1986 B3 Botany 8 11/26/2004 77 B
102 Daniel 11/6/1987 P3 Nuclear Physics Basic Physics 13 11/12/2004 68 B
105 Susan 8/31/1985 P3 Nuclear Physics Basic Physics 13 11/12/2004 89 A
103 Sandra 10/2/1988 B4 Zoology 5 11/27/2004 54 D
105 Susan 8/31/1985 H6 American History 4 11/22/2004 87 A
104 Evelyn 2/22/1986 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 65 B
33
Student_Course_Result Table (Table in 1NF)
Student# | StudentName | DateofBirth | Course# | CourseName | PreRequisite | DurationInDays | DateOfExam | Marks | Grade
101 | Davis | 04-Nov-1986 | M4 | Applied Mathematics | Basic Mathematics | 7 | 11-Nov-2004 | 82 | A
102 | Daniel | 06-Nov-1987 | M4 | Applied Mathematics | Basic Mathematics | 7 | 11-Nov-2004 | 62 | C
101 | Davis | 04-Nov-1986 | H6 | American History | | 4 | 22-Nov-2004 | 79 | B
103 | Sandra | 02-Oct-1988 | C3 | Bio Chemistry | Basic Chemistry | 11 | 16-Nov-2004 | 65 | B
104 | Evelyn | 22-Feb-1986 | B3 | Botany | | 8 | 26-Nov-2004 | 77 | B
102 | Daniel | 06-Nov-1987 | P3 | Nuclear Physics | Basic Physics | 13 | 12-Nov-2004 | 68 | B
105 | Susan | 31-Aug-1985 | P3 | Nuclear Physics | Basic Physics | 13 | 12-Nov-2004 | 89 | A
103 | Sandra | 02-Oct-1988 | B4 | Zoology | | 5 | 27-Nov-2004 | 54 | D
105 | Susan | 31-Aug-1985 | H6 | American History | | 4 | 22-Nov-2004 | 87 | A
104 | Evelyn | 22-Feb-1986 | M4 | Applied Mathematics | Basic Mathematics | 7 | 11-Nov-2004 | 65 | B
34
Second normal form: 2NF
• A Relation is said to be in Second Normal Form if and only if:
– It is in the First normal form, and
– No partial dependency exists between non-key attributes and key attributes.
An attribute of a relation R that belongs to any key of R is said to be a prime attribute, and one that doesn’t is a non-prime attribute
35
Second Normal Form
• STUDENT# is the key attribute for Student
• COURSE# is the key attribute for Course
• STUDENT# and COURSE# together form the composite key for the Results relationship
• Other attributes like StudentName (Student Name), DateofBirth, CourseName, PreRequisite, DurationInDays, DateofExam, Marks and Grade are non-key attributes
To make this table 2NF compliant, we have to remove all the partial dependencies.
Student#, Course# -> Marks, Grade
Student# -> StudentName, DateofBirth
Course# -> CourseName, PreRequisite, DurationInDays
Course# -> DateOfExam
36
Second Normal Form - Tables in 2 NF
STUDENT TABLE
Student# StudentName DateofBirth
101 Davis 04-Nov-1986
102 Daniel 06-Nov-1987
103 Sandra 02-Oct-1988
104 Evelyn 22-Feb-1986
105 Susan 31-Aug-1985
106 Mike 04-Feb-1987
107 Juliet 09-Nov-1986
108 Tom 07-Oct-1986
109 Catherine 06-Jun-1984
COURSE TABLE
Course#  CourseName           PreRequisite  DurationInDays
M1       Basic Mathematics                  11
M4       Applied Mathematics  M1            7
H6       American History                   4
C1       Basic Chemistry                    5
C3       Bio Chemistry        C1            11
B3       Botany                             8
P1       Basic Physics                      8
P3       Nuclear Physics      P1            13
B4       Zoology                            5
37
Second Normal form – Tables in 2 NF
Student# Course# Marks Grade
101 M4 82 A
102 M4 62 C
101 H6 79 B
103 C3 65 B
104 B3 77 B
102 P3 68 B
105 P3 89 A
103 B4 54 D
105 H6 87 A
104 M4 65 B
38
Second Normal form – Tables in 2 NF
Exam_Date Table
Course# DateOfExam
M4 11-Nov-04
H6 22-Nov-04
C3 16-Nov-04
B3 26-Nov-04
P3 12-Nov-04
B4 27-Nov-04
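The 2NF decomposition amounts to projecting the 1NF table onto the attributes of each dependency. The following sketch is a simplified illustration: it uses only four rows and five of the columns from the 1NF table, and the names `project` and `rejoin` are assumptions.

```python
# (Student#, StudentName, Course#, CourseName, Marks): a subset of the 1NF table
ROWS = [
    (101, "Davis", "M4", "Applied Mathematics", 82),
    (102, "Daniel", "M4", "Applied Mathematics", 62),
    (101, "Davis", "H6", "American History", 79),
    (105, "Susan", "H6", "American History", 87),
]

def project(rows, columns):
    """Keep only the given column positions and drop duplicate tuples."""
    result = []
    for row in rows:
        picked = tuple(row[i] for i in columns)
        if picked not in result:
            result.append(picked)
    return result

students = project(ROWS, (0, 1))    # Student# -> StudentName
courses = project(ROWS, (2, 3))     # Course# -> CourseName
results = project(ROWS, (0, 2, 4))  # Student#, Course# -> Marks

def rejoin():
    """Rebuild the original rows by joining the three projections on their keys."""
    names = dict(students)
    titles = dict(courses)
    return [(s, names[s], c, titles[c], m) for (s, c, m) in results]
```

Each student name and course name is now stored once, and `rejoin()` reproduces the original rows, so no information was lost by splitting the table.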
39
Third normal form:3 NF
A relation R is said to be in the Third Normal Form (3NF) if and only if
- It is in 2NF and
- No transitive dependency exists between non-key attributes and key attributes.
• STUDENT# and COURSE# are the key attributes.
• All other attributes except Grade depend on the key fully and non-transitively; Grade depends on the key transitively, through Marks.
• Student#, Course# -> Marks
• Marks -> Grade
40
3NF Tables
Student# Course# Marks
101 M4 82
102 M4 62
101 H6 79
103 C3 65
104 B3 77
102 P3 68
105 P3 89
103 B4 54
105 H6 87
104 M4 65
41
Third Normal Form – Tables in 3rd NF
MARKSGRADE TABLE
UpperBound LowerBound Grade
100 95 A+
94 85 A
84 70 B
69 65 B-
64 55 C
54 45 D
44 0 E
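Once Grade is moved out of the result table, it can be derived from Marks with a range lookup against the MARKSGRADE table. The sketch below copies the bounds from the table above; the function name `grade_for` is an assumption. Note that these bounds do not reproduce every grade shown in the earlier result table (82 falls in the B range here), so the figures are best treated as illustrative.

```python
# (UpperBound, LowerBound, Grade) rows copied from the MARKSGRADE table
MARKS_GRADE = [
    (100, 95, "A+"),
    (94, 85, "A"),
    (84, 70, "B"),
    (69, 65, "B-"),
    (64, 55, "C"),
    (54, 45, "D"),
    (44, 0, "E"),
]

def grade_for(marks):
    """Look up the grade whose [LowerBound, UpperBound] range contains marks."""
    for upper, lower, grade in MARKS_GRADE:
        if lower <= marks <= upper:
            return grade
    raise ValueError(f"marks out of range: {marks}")
```

Changing a grade boundary now means editing one row of MARKSGRADE rather than updating every result row, which is the update anomaly that 3NF removes.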
42
Boyce-Codd normal form - BCNF
A relation is said to be in Boyce Codd Normal Form (BCNF) if and only if all the determinants are candidate keys.
Every BCNF relation is in 3NF (BCNF is a stricter form of 3NF), but not every 3NF relation is in BCNF.
43
Consider this Result Table
Student# EmailID Course# Marks
101 [email protected] M4 82
102 [email protected] M4 62
101 [email protected] H6 79
103 [email protected] C3 65
104 [email protected] B3 77
102 [email protected] P3 68
105 [email protected] P3 89
103 [email protected] B4 54
105 [email protected] H6 87
104 [email protected] M4 65
44
BCNF
[Diagram: attributes S# (Student#), EmailID and C# (Course#)]
Candidate keys for the relation are
- {Student#, Course#} and {Course#, EmailID}
Since Course# appears in both, these are referred to as overlapping candidate keys.
Valid functional dependencies are
Student# -> EmailID (non-key determinant)
EmailID -> Student# (non-key determinant)
Student#, Course# -> Marks
Course#, EmailID -> Student#
Because Student# and EmailID are determinants but not candidate keys, the table is not in BCNF.
45
BCNF STUDENT TABLE
Student# EmailID
46
BCNF Tables
Student# Course# Marks
101 M4 82
102 M4 62
101 H6 79
103 C3 65
104 B3 77
102 P3 68
105 P3 89
103 B4 54
105 H6 87
104 M4 65
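The BCNF test ("every determinant is a candidate key") can be checked with the standard attribute-closure algorithm. The sketch below is illustrative: the function names `closure` and `bcnf_violations` are assumptions, and the dependencies are the ones listed on the BCNF slide. It flags a dependency whenever its left side is not a superkey, which is equivalent for the non-trivial dependencies used here.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """List the dependencies whose left side does not determine the whole relation."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(all_attrs)]

# The original Result table and its dependencies
RESULT_ATTRS = ("Student#", "EmailID", "Course#", "Marks")
RESULT_FDS = [
    (("Student#",), ("EmailID",)),
    (("EmailID",), ("Student#",)),
    (("Student#", "Course#"), ("Marks",)),
    (("Course#", "EmailID"), ("Student#",)),
]

# The two tables after decomposition
STUDENT_FDS = [(("Student#",), ("EmailID",)), (("EmailID",), ("Student#",))]
MARKS_FDS = [(("Student#", "Course#"), ("Marks",))]

before = bcnf_violations(RESULT_ATTRS, RESULT_FDS)
after_student = bcnf_violations(("Student#", "EmailID"), STUDENT_FDS)
after_marks = bcnf_violations(("Student#", "Course#", "Marks"), MARKS_FDS)
```

On the original table the two violations are Student# -> EmailID and EmailID -> Student#; in the decomposed STUDENT and result tables every determinant is a key, so both lists come back empty.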
47
Merits of Normalization
• Normalization is based on a mathematical foundation.
• Removes most redundancy. After 3NF, the remaining redundancy is largely limited to foreign keys.
• Removes the anomalies present in INSERTs, UPDATEs and DELETEs.
48
Demerits of Normalization
• Data retrieval (SELECT) performance can suffer, because queries must join more tables.
• Normalization might not always represent real world scenarios.
49
SQL - Background
• Conceived in the mid-1970s as a database language for the relational model
• Developed by IBM
• First standardized in 1986 by ANSI
• Enhanced in 1989
• Revised again in ’92
• Non-procedural language
• Supported by a number of commercial products
50
SQL Statements
• Data Definition Language (DDL)
– CREATE TABLE
– ALTER TABLE
– DROP TABLE
• Data Manipulation Language (DML)
– SELECT
– INSERT
– UPDATE
– DELETE
• Data Control Language (DCL)
– GRANT
– REVOKE
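A brief, hedged illustration of the DDL and DML statement groups, run against an in-memory SQLite database from Python's standard `sqlite3` module. The table, the column names and the changed value "Danny" are assumptions for the example. SQLite has no user accounts, so the DCL statements (GRANT, REVOKE) appear only in a comment and are not executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the structure of a table
conn.execute("CREATE TABLE student (student_no INTEGER PRIMARY KEY, student_name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN date_of_birth TEXT")

# DML: insert, change, read and remove rows
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Davis", "1986-11-04"), (102, "Daniel", "1987-11-06")],
)
conn.execute("UPDATE student SET student_name = ? WHERE student_no = ?", ("Danny", 102))
before_delete = conn.execute(
    "SELECT student_no, student_name FROM student ORDER BY student_no"
).fetchall()
conn.execute("DELETE FROM student WHERE student_no = ?", (101,))
after_delete = conn.execute(
    "SELECT student_no, student_name FROM student ORDER BY student_no"
).fetchall()

# DCL (not executed; SQLite does not support it). On a multi-user product it
# would look like:  GRANT SELECT ON student TO some_user

# DDL again: remove the table altogether
conn.execute("DROP TABLE student")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

The statements say what result is wanted, not how to find the rows, which is what "non-procedural" means in the SQL background slide.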
51
SQL - Some ANSI/ISO Keywords
ALL, AND, AVG, BETWEEN, CHAR, COMMIT, COUNT, CREATE, DECIMAL, DELETE, DISTINCT, DROP, FETCH, GRANT, GROUP BY, HAVING, IN, INSERT, MAX, MIN, NOT, NULL, OR, PRIVILEGE, REFERENCES, REVOKE, SELECT, SUM, TABLE, UPDATE, VIEW, WHERE
52
Commercial Products
• In 2004, the RDBMS market grew 10%, rising from just under $7.1 billion to nearly $7.8 billion in new license sales
• DB2 UDB – Market Leader
• Oracle – Fastest growing Database on Unix Boxes
• SQL Server 2000 – Leader in Windows Platform
• Teradata – Most efficient, Self tuning, Costliest DB for Data Warehousing application
• Sybase – May be target for acquisition
• MySQL – Open Source Database
Company       2004 Market Share %  2003-04 Growth %
IBM           34.1                 5.8
Oracle        33.7                 14.6
Microsoft     20                   18
NCR Teradata  2.9                  17.2
Sybase        2.3                  0.5
53
On Line Transaction Processing (OLTP) System
• Handles several concurrent transactions from
– Spatially distributed machines
– Execution of instructions and queries across LAN/WAN
– Geographically distributed processors
– Spatially distributed databases
• A transaction is defined as a logical unit of program execution that takes a system from one consistent state to another consistent state
• An OLTP system should adhere to the ACID properties
– Atomicity
– Consistency
– Isolation
– Durability
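Atomicity and consistency can be illustrated with a funds transfer, again using Python's standard `sqlite3` module. This is a sketch under assumptions: the `account` table, its CHECK constraint, the balances and the helper names are invented for the example.

```python
import sqlite3

def open_bank():
    """Create two accounts; the CHECK constraint defines a 'consistent state'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move amount between accounts as one transaction: both updates or neither."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the debit would overdraw src, so the credit was undone too

def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A transfer of 500 from A fails on the debit, and the rollback also undoes the credit already applied to B, so the database moves from one consistent state to another or does not move at all.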
54
State diagram of a transaction
[Diagram. States: active (while executing); partially completed (after the final statement has been executed); failed (when normal execution can’t proceed); aborted (after rollback and restoration to the previous state); committed (after successful completion).]
55
Concurrency Vs Consistency in OLTP
• Concurrency and consistency pull against each other: the more concurrency is allowed, the harder consistency is to guarantee
• Multiple transactions access the same resource simultaneously
• Problems associated with OLTP applications are
– Lost update
– Dirty read
– Incorrect summary
– Phantom records
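The lost update problem can be shown with a hand-written interleaving of two transactions. The sketch below only simulates the schedule in plain Python (no real concurrency); the balance, the amounts and the function names are assumptions.

```python
def interleaved_schedule():
    """T1 adds 50 and T2 subtracts 30, but both read before either writes."""
    balance = 100
    t1_read = balance        # T1 reads 100
    t2_read = balance        # T2 reads 100
    balance = t1_read + 50   # T1 writes 150
    balance = t2_read - 30   # T2 writes 70, silently overwriting T1's update
    return balance

def serial_schedule():
    """The same two transactions run one after the other."""
    balance = 100
    balance = balance + 50   # T1 runs to completion
    balance = balance - 30   # then T2 runs
    return balance
```

The serial schedule ends with 120, the interleaved one with 70: T1's update is lost. Serialization techniques exist to make concurrent schedules behave like some serial one.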
56
Serialization Techniques
• Locking
• Time stamping
• Ensures consistency of the database while allowing concurrent access of the resources
57
Locking
• A lock is a variable associated with each data item in a database
• When a transaction updates a data item, the DBMS locks that item
• Serializability can be maintained this way
• A lock can be Shared (for reading) or Exclusive (for writing)
• Deadlock is the most common problem with locking mechanisms
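The shared/exclusive rule and the way a deadlock arises can be sketched with a toy lock table. The class below is an assumption-laden illustration: the name `LockManager`, the "S"/"X" mode codes and the return-False-instead-of-waiting behaviour are choices made for the example, not how any particular DBMS implements locking.

```python
class LockManager:
    """Toy lock table: many readers (S) or a single writer (X) per data item."""

    def __init__(self):
        self._locks = {}  # item -> (mode, set of transactions holding the lock)

    def acquire(self, txn, item, mode):
        """Try to lock item in mode 'S' or 'X'; return True if granted."""
        held = self._locks.get(item)
        if held is None:
            self._locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # The sole holder may keep its lock or upgrade S to X.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self._locks[item] = (new_mode, holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # shared locks are compatible with each other
            return True
        return False          # conflict: a real DBMS would make txn wait

    def release_all(self, txn):
        """Release every lock held by txn, for example at commit or rollback."""
        for item in list(self._locks):
            mode, holders = self._locks[item]
            holders.discard(txn)
            if not holders:
                del self._locks[item]
```

If T1 holds an exclusive lock on item a and T2 holds one on item b, then T1's request for b and T2's request for a are both refused. In a system where refused requests wait, neither transaction can ever proceed, which is a deadlock.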
58
Timestamping
• Each transaction is given a timestamp when it starts, and conflicting operations must execute in timestamp order
• A conflict occurs when an older transaction tries to read a value that has been written by a younger transaction
• Or when an older transaction tries to modify (write) a value already read or written by a younger transaction
• Both of these attempts signify that the older transaction was “too late” in performing the required operation, so it is rolled back
• Rarely used on its own in commercial systems because it causes too many rollbacks
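The "too late" checks can be written down as the basic timestamp-ordering rules. The sketch below is a simplified illustration: the class `Item`, the functions `read` and `write`, and the use of small integers as timestamps are assumptions, and a False return stands for "roll this transaction back".

```python
class Item:
    """A data item that remembers the youngest reader and writer seen so far."""

    def __init__(self):
        self.read_ts = 0   # largest timestamp of any transaction that read it
        self.write_ts = 0  # largest timestamp of any transaction that wrote it

def read(item, ts):
    """Transaction with timestamp ts reads item; False means it must roll back."""
    if ts < item.write_ts:
        return False  # too late: a younger transaction already wrote the value
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    """Transaction with timestamp ts writes item; False means it must roll back."""
    if ts < item.read_ts or ts < item.write_ts:
        return False  # too late: a younger transaction already read or wrote it
    item.write_ts = ts
    return True
```

A smaller timestamp means an older transaction. Every False return corresponds to a rollback and restart, which is where the cost mentioned on the slide comes from.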
59
New Trends in DB Technology
• OODBMS
• OORDBMS
• XML Integration
60
OODBMS
• Any user-defined data structures
• Any user-defined operations
• Any user-defined relationships
• Useful for
– Manufacturing
– Telecommunication
– CAD/CAM
– Multimedia products
– Aerospace and Flight simulations
61
Relationship in OODBMS
• One - Many
• Many - Many
• Is A
• Extends
• Whole-part
62
Commercial Packages
• Objectivity
• Poet
• Jasmine
• Gemstone
• Itasca
• ObjectStore
For more details, see http://www.geocities.com/SiliconValley/2139/products.html
63
Limitations of OODBMS
• Procedural navigation
• No querying, as it breaks encapsulation
• No mathematical foundation
• Not suitable for ad hoc reporting systems
64
OORDBMS
• Marrying Relational and Object Oriented concepts
• Still data is stored in Relational manner
• Object wrapper for application
• Performance is the major concern
• Still in the development stage
• Commercial Products
– Informix: Universal Server (Illustra) (merged with IBM)
– Oracle: Oracle 10g
– IBM: DB2 UDB
– UniSQL: UniSQL/X
– Unisys: OSMOS
65
XML in DB
• Data-centric to Document-centric
• Simpler integration between the Database and other tools like
– Middleware
– EAI tools
– ERP tools
– Other Databases
• Introduction of Native XML data type
• XML Query Language
66
What is OLAP or DW or BI?
• An organization’s success also depends on its ability to analyze data (through views and reports) and make intelligent decisions that potentially affect its future. Systems that facilitate such analyses are called On Line Analytical Processing (OLAP) systems or Data Warehousing System
• Why not OLTP?
– OLTP databases do not contain historical data
– OLTP databases contain small subsets of organizational data
– OLTP databases are heterogeneous in nature and geographically distributed systems
• OLTP systems are
– Fragmented
– Not integrated.
– Difficult to access.
– Disparate sources.
– Disparate platforms.
– Poor data quality.
– Redundant data.
– Difficult to understand
67
Data warehouse / Business Intelligence
• A Data Warehouse is a copy of the enterprise operational data, suitably modified to support the needs of analytical processes and stored outside the operational database.
• According to Bill Inmon, known as the father of Data Warehousing, a data warehouse is a
– Subject oriented,
– Integrated,
– Time-variant,
– Nonvolatile
– Collection of data in support of management decisions.
68
Data warehouse architecture
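[Architecture diagram with three tiers. Sources: operational DBs and semistructured sources, which are extracted, transformed, loaded and refreshed into Tier 1. Tier 1, Data Warehouse Server: Data Warehouse and Data Marts. Tier 2, OLAP Servers (e.g., MOLAP, ROLAP), served by Tier 1. Tier 3, Clients: Analysis, Query/Reporting and Data Mining, served by Tier 2.]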
69
Components of DW
• Extraction Transformation and Loading (ETL)
– Informatica Power Center
– Data Stage
– AbInitio
– WebFOCUS
• Data Warehouse
– Teradata
– DB2 UDB
– Oracle 10g
• OLAP
– Business Object
– COGNOS
– Hyperion
– Power Analyzer
• Data Mining
– Intelligent Miner
– Darwin
– SAS Miner
70
Complementing Technology
• How many Infy shares were sold yesterday on NASDAQ? What were the highest and lowest prices?
– OLTP System
• How have Infy shares been doing on NASDAQ compared with NSE India over the last 5 years? What is the volume? The P/E ratio? The highest and lowest prices?
– DW System
• What will the Infy earnings be in the second quarter of next year? What will the share price be during that period?
– Data Mining System
72
Thank You