Overview of RDBMS Technology

Overview of Relational Database Technology

Hanu


Page 2: Overview of RDBMS Technology

Introductions

Page 3: Overview of RDBMS Technology

Agenda

• Introduction to file systems

• Introduction to Access methods: VSAM

• Introduction to Database Systems: Architecture

• Introduction to Hierarchical databases: IMS

• Introduction to Network databases: IDMS

• Introduction to Relational Databases

• Introduction to OLTP

• New trends in DB technology

• Basics of Data Warehousing

Page 4: Overview of RDBMS Technology

Objectives

• Evolution of Database Technology

• Limitations of Legacy Access mechanisms

• Limitations of Hierarchical and Network Databases

• Emergence of the Relational database management system

• Latest trends in Database technology
  – OODBMS
  – OORDBMS
  – XML Integration

Page 5: Overview of RDBMS Technology

What this course does not cover

• Data Modelling

• OO Design

• SQL Syntax

• Commercial databases such as
  – DB2
  – Oracle
  – SQL Server 2000
  – Sybase

• Design of Data warehousing or Data Mining systems

Page 6: Overview of RDBMS Technology

File Based Data Management

• Flat file systems were the first attempt to computerize manual bookkeeping systems

• Data could be retrieved only by reading the file sequentially

• Updating or deleting an existing record in place was almost impossible

• The only way to delete records from a sequential file is to create a new file that does not contain them

• The only way to update records in a sequential file is to create a new file that contains the updated records
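The rewrite-the-whole-file approach can be sketched in a few lines of Python. This is only an illustration: the file name `suppliers.dat`, the comma-separated record layout and the helper `rewrite_sequential_file` are assumptions made for the example, not part of the original material.

```python
import os
import tempfile


def rewrite_sequential_file(path, transform):
    """Copy every record to a new file; `transform` may change or drop it."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "r", encoding="utf-8") as old, tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=directory, delete=False
    ) as new:
        for line in old:
            record = transform(line.rstrip("\n"))
            if record is not None:  # None means "delete this record"
                new.write(record + "\n")
    os.replace(new.name, path)  # the new file takes the place of the old one


def demo():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "suppliers.dat")  # hypothetical file
        with open(path, "w", encoding="utf-8") as f:
            f.write("S1,Smith,London\nS2,Jones,Paris\nS3,Blake,Paris\n")

        def change(record):
            if record.startswith("S2,"):
                return None               # delete S2
            if record.startswith("S3,"):
                return "S3,Blake,Athens"  # update S3
            return record                 # copy everything else unchanged

        rewrite_sequential_file(path, change)
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()


print(demo())
```

Even though only two records change, every record in the file is read and written again, which is the cost the bullets above describe.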

Page 7: Overview of RDBMS Technology

Disadvantages of File based systems

• Data Redundancy - the same data might be stored in different places

• Poor Data Control - redundant copies of the data might differ slightly; for example, Hanu's data may be stored separately in Telephone, Payana and PSWeb

• Inability to Easily Manipulate Data - modifying files by hand was a tedious and error-prone activity

• Cryptic Work Flows - accessing the data could take excessive programming effort and was too difficult for end users

Page 8: Overview of RDBMS Technology

VSAM Based Systems

• Virtual Storage Access Method (VSAM), designed and developed by IBM in the early 1970s

• Introduced the concept of a unique key for each record

• Goal: locate a given record by its key and fetch it with minimal I/O (ideally a single read)

• Secondary keys (alternate indexes) can be defined

• A VSAM file is made up of multiple Control Areas (CA)

• Each control area is made up of Control Intervals (CI) and free space
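The idea of fetching a record by key with a single read can be illustrated with a toy index. This is a deliberately simplified sketch, not VSAM's actual on-disk structures: the three control intervals, their contents and the function `fetch` are assumptions for the example.

```python
from bisect import bisect_left

# Each control interval holds a few records in key order (toy data).
control_intervals = [
    [(101, "Davis"), (102, "Daniel")],
    [(103, "Sandra"), (104, "Evelyn")],
    [(105, "Susan")],
]

# The index stores the highest key found in each control interval.
index = [interval[-1][0] for interval in control_intervals]


def fetch(key):
    """Return the record for `key`, touching only one control interval."""
    position = bisect_left(index, key)  # search the small index first
    if position == len(index):
        return None                     # key is higher than any stored key
    for record_key, data in control_intervals[position]:  # the single "read"
        if record_key == key:
            return data
    return None


print(fetch(104))
```

The index is small enough to search cheaply, so the expensive step, reading a control interval, happens once per lookup instead of once per record as in a sequential scan.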

Page 9: Overview of RDBMS Technology

Disadvantages of VSAM

• Complex Structure and access mechanism

• No querying facility

• Security Issues

• No concept of Referencing keys

• Application dependent

• Redundancy of the data

Page 10: Overview of RDBMS Technology

Hierarchical Database - IMS

• Information Management System (IMS), designed by IBM in the mid-1960s

• Based on two tier client server architecture

• Views data only as a hierarchy

• A child element can be accessed only through its parent node

Segment hierarchy: VENDOR > ITEM > LOCATION

Example occurrence:

VENDOR1
    ITEM1
        LOC1
        LOC2
        LOC3
    ITEM2
        LOC1
        LOC3
    ITEM3
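The parent-first access rule can be made concrete with a small sketch. The nested dictionary below is a hypothetical in-memory stand-in for the VENDOR, ITEM and LOCATION example on this slide; real IMS access goes through its own call interface, which is not shown here.

```python
# Hypothetical in-memory copy of the slide's VENDOR > ITEM > LOCATION example.
database = {
    "VENDOR1": {
        "ITEM1": ["LOC1", "LOC2", "LOC3"],
        "ITEM2": ["LOC1", "LOC3"],
        "ITEM3": [],
    }
}


def locations_of(vendor, item):
    """Reach LOCATION segments the only way possible: via VENDOR, then ITEM."""
    return database[vendor][item]


def items_stored_at(location):
    """There is no path from LOCATION upward, so every parent is visited."""
    found = []
    for vendor, items in database.items():
        for item, locations in items.items():
            if location in locations:
                found.append((vendor, item))
    return found


print(locations_of("VENDOR1", "ITEM2"))
print(items_stored_at("LOC3"))
```

Going down the hierarchy is a direct lookup, while a question asked "from the bottom" forces a walk over the whole tree. Note also that LOC1 and LOC3 are stored once per item, which is the redundancy the next slide lists.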

Page 11: Overview of RDBMS Technology

Disadvantages of IMS

• Child nodes can be accessed only through the parent node

• A child record cannot be inserted without a parent

• One child record can have only one parent record

• No querying facility

• No concept of referential integrity or constraints

• Redundancy of the data

• Very cryptic Macros

Page 12: Overview of RDBMS Technology

Network Database - IDMS

• Based on the network data model specified by the Conference on Data Systems Languages (CODASYL) in the late 1960s

• Introduced to overcome the limitations of Hierarchical DBs

• Data elements are linked only through pointers

• Reduces redundancy, because one record can be linked to several parents instead of being duplicated

• A superset of the Hierarchical Database: a child can have multiple parents

Page 13: Overview of RDBMS Technology

Disadvantages of IDMS

• Difficult to access the system because of the cumbersome pointer concept

• It was more useful for programmers than for end users

• Difficult to represent many-to-many relationships

Page 14: Overview of RDBMS Technology

Relational Data Base Management System (RDBMS)

• The Relational Model grew out of the work done by Dr. E. F. Codd at IBM, who was looking for ways to solve the problems with the existing models

• First introduced in 1970 in his famous paper “A Relational Model of Data for Large Shared Data Banks”

• Based on the mathematical foundations of set theory and relational algebra

• At the core of the relational model is the concept of a table (also called a relation) in which all data is stored

• Each table is made up of records (horizontal rows, also known as tuples) and fields (vertical columns, also known as attributes)

• Data is separated from the application: it is no longer application dependent

• In the relational model, operations that manipulate data do so on the basis of the data values themselves

• Very easy metadata management

• Query Language Interface

Page 15: Overview of RDBMS Technology

Application Programs using DBMS Services

Layers, from top to bottom:

Application Programs
DBMS
File System
Storage

Page 16: Overview of RDBMS Technology

Example of RDBMS Table

SNO  SNAME  STATUS  CITY
S1   Smith  20      London
S2   Jones  10      Paris
S3   Blake  30      Paris
S4   Clark  20      London
S5   Adams  30      Athens
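To see what "manipulating data on the basis of the data values" and a "query language interface" look like in practice, the supplier table on this slide can be loaded into an in-memory SQLite database. SQLite and the lowercase table and column names are choices made for this sketch; the course itself does not prescribe a product.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE supplier ("
    "sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT)"
)
connection.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?, ?)",
    [
        ("S1", "Smith", 20, "London"),
        ("S2", "Jones", 10, "Paris"),
        ("S3", "Blake", 30, "Paris"),
        ("S4", "Clark", 20, "London"),
        ("S5", "Adams", 30, "Athens"),
    ],
)

# Rows are selected by the values they hold, not by position or pointer.
paris_suppliers = connection.execute(
    "SELECT sno, sname FROM supplier WHERE city = ? ORDER BY sno", ("Paris",)
).fetchall()
print(paris_suppliers)  # [('S2', 'Jones'), ('S3', 'Blake')]
```

The query names a condition on the data (`city = 'Paris'`) and leaves it to the DBMS to work out how to find the matching rows.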

Page 17: Overview of RDBMS Technology

ER modeling

• ER modeling: a graphical technique for understanding and organizing the data, independent of the actual database implementation

• Entity: anything that may have an independent existence and about which we intend to collect data. Also known as entity type.

• Entity instance: a particular member of the entity type, e.g. a particular student

• Attributes: properties/characteristics that describe entities

• Relationships: associations between entities

Page 18: Overview of RDBMS Technology

Steps in ER Modeling

• Identify the Entities

• Find relationships

• Identify the key attributes for every Entity

• Identify other relevant attributes

• Draw complete E-R diagram with all attributes including Primary Key

• Review your results with your Business users

Page 19: Overview of RDBMS Technology

Case Study – ER Model For a college DB

Assumptions:

• A college contains many departments

• Each department can offer any number of courses

• Many instructors can work in a department

• An instructor can work only in one department

• For each department there is a Head

• An instructor can be head of only one department

• Each instructor can teach any number of courses

• A course can be taught by only one instructor

• A student can enroll for any number of courses

• Each course can have any number of students

Page 20: Overview of RDBMS Technology

Step 1: Identify the Entities

• DEPARTMENT

• STUDENT

• COURSE

• INSTRUCTOR

Step 2: Find the relationships

• One course is taken by multiple students and one student enrolls for multiple courses, hence the cardinality between course and student is many to many.

• The department offers many courses and each course belongs to only one department, hence the cardinality between department and course is one to many.

• One department has multiple instructors and one instructor belongs to one and only one department, hence the cardinality between department and instructor is one to many.

• Each department has a “Head of department” and one instructor can be “Head of department” of only one department, hence the cardinality is one to one.

• One course is taught by only one instructor, but the instructor teaches many courses, hence the cardinality between course and instructor is many to one.

Page 21: Overview of RDBMS Technology

Step 3: Identify the key attributes

• Deptname is the key attribute for the Entity “Department”, as it identifies the Department uniquely.

• Course# (CourseId) is the key attribute for “Course” Entity.

• Student# (Student Number) is the key attribute for “Student” Entity.

• Instructor Name is the key attribute for “Instructor” Entity.

Step 4: Identify other relevant attributes

• For the department entity: location

• For the course entity: course name, duration, prerequisite

• For the instructor entity: room#, telephone#

• For the student entity: student name, date of birth

Page 22: Overview of RDBMS Technology

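Step 5: Draw complete E-R diagram with all attributes including Primary Key

[E-R diagram. Entities and their attributes, key attribute first:]

Department: Department Name, Location
Course: Course#, Course Name, Duration, Pre Requisite
Instructor: Instructor Name, Room#, Telephone#
Student: Student#, Student Name, Date of Birth

[Relationships and cardinalities:]

Department (1) Offers (N) Course
Department (1) Has (N) Instructor
Department (1) Headed by (1) Instructor
Course (N) Is taught by (1) Instructor
Course (M) Enrolled by (N) Student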
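One possible way to carry the college E-R model into tables is sketched below with SQLite. The table and column names, the sample rows and the choice of a separate `enrollment` table for the many-to-many relationship are assumptions of this sketch, not a design given in the slides.

```python
import sqlite3

SCHEMA = """
CREATE TABLE department (
    dept_name TEXT PRIMARY KEY,
    location  TEXT,
    head_name TEXT UNIQUE REFERENCES instructor(instructor_name)  -- 1:1 headed by
);
CREATE TABLE instructor (
    instructor_name TEXT PRIMARY KEY,
    room_no         TEXT,
    telephone_no    TEXT,
    dept_name       TEXT NOT NULL REFERENCES department(dept_name)  -- N:1 has
);
CREATE TABLE course (
    course_no       TEXT PRIMARY KEY,
    course_name     TEXT,
    duration_days   INTEGER,
    prerequisite    TEXT,
    dept_name       TEXT NOT NULL REFERENCES department(dept_name),       -- N:1 offers
    instructor_name TEXT NOT NULL REFERENCES instructor(instructor_name)  -- N:1 is taught by
);
CREATE TABLE student (
    student_no    INTEGER PRIMARY KEY,
    student_name  TEXT,
    date_of_birth TEXT
);
CREATE TABLE enrollment (  -- M:N enrolled by, one row per (student, course) pair
    student_no INTEGER NOT NULL REFERENCES student(student_no),
    course_no  TEXT    NOT NULL REFERENCES course(course_no),
    PRIMARY KEY (student_no, course_no)
);
"""

connection = sqlite3.connect(":memory:")
connection.execute("PRAGMA foreign_keys = ON")
connection.executescript(SCHEMA)

# Made-up sample rows, only to exercise the relationships.
connection.execute(
    "INSERT INTO department (dept_name, location) VALUES ('Mathematics', 'Block A')"
)
connection.execute(
    "INSERT INTO instructor VALUES ('Instructor A', 'R1', '100', 'Mathematics')"
)
connection.execute(
    "UPDATE department SET head_name = 'Instructor A' WHERE dept_name = 'Mathematics'"
)
connection.execute(
    "INSERT INTO course VALUES "
    "('M4', 'Applied Mathematics', 7, 'Basic Mathematics', 'Mathematics', 'Instructor A')"
)
connection.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Davis", "1986-11-04"), (102, "Daniel", "1987-11-06")],
)
connection.executemany("INSERT INTO enrollment VALUES (?, ?)", [(101, "M4"), (102, "M4")])

enrolled = connection.execute(
    "SELECT student_name FROM enrollment JOIN student USING (student_no) "
    "WHERE course_no = 'M4' ORDER BY student_no"
).fetchall()
print(enrolled)
```

Each one-to-many relationship becomes a foreign key on the "many" side, the one-to-one "headed by" relationship becomes a unique foreign key, and the many-to-many relationship needs its own table.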

Page 23: Overview of RDBMS Technology

What is Normalization?

• A database designed from the E-R model may have some amount of
  – Inconsistency
  – Uncertainty
  – Redundancy

• To eliminate these drawbacks, some refinement has to be done on the database.
  – The refinement process is called Normalization
  – Defined as a step-by-step process of decomposing a complex relation into a simple and stable data structure
  – The formal process that can be followed to achieve a good database design
  – Also used to check that an existing design is of good quality
  – The different stages of normalization are known as “normal forms”
  – To accomplish normalization we need to understand the concept of Functional Dependencies

Page 24: Overview of RDBMS Technology

Need for Normalization

Student_Course_Result Table

Column groups: Student_Details, Course_Details, Result_Details

101 Davis 11/4/1986 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 82 A

102 Daniel 11/6/1987 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 62 C

101 Davis 11/4/1986 H6 American History 4 11/22/2004 79 B

103 Sandra 10/2/1988 C3 Bio Chemistry Basic Chemistry 11 11/16/2004 65 B

104 Evelyn 2/22/1986 B3 Botany 8 11/26/2004 77 B

102 Daniel 11/6/1987 P3 Nuclear Physics Basic Physics 13 11/12/2004 68 B

105 Susan 8/31/1985 P3 Nuclear Physics Basic Physics 13 11/12/2004 89 A

103 Sandra 10/2/1988 B4 Zoology 5 11/27/2004 54 D

105 Susan 8/31/1985 H6 American History 4 11/22/2004 87 A

104 Evelyn 2/22/1986 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 65 B

Problems with this table:

• Insert Anomaly

• Delete Anomaly

• Update Anomaly

• Data Duplication

Page 25: Overview of RDBMS Technology

Functional dependency

• In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on attribute X if each value of X determines EXACTLY ONE value of Y, which is represented as X -> Y (X can be composite in nature).

• We say here “X determines Y” or “Y is functionally dependent on X”. X -> Y does not imply Y -> X

• If the value of the attribute “Marks” is known, then the value of the attribute “Grade” is determined, since Marks -> Grade

• Types of functional dependencies:
  – Full Functional dependency
  – Partial Functional dependency
  – Transitive dependency
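The definition can be tested mechanically against sample data. The helper `holds` and the four sample rows below are assumptions for this sketch; note that a sample can only refute a dependency, since whether a dependency truly holds is a statement about the business rules, not about one set of rows.

```python
def holds(rows, determinant, dependent):
    """True if every combination of `determinant` values maps to one `dependent` value."""
    seen = {}
    for row in rows:
        x = tuple(row[name] for name in determinant)
        y = tuple(row[name] for name in dependent)
        if seen.setdefault(x, y) != y:  # same X seen before with a different Y
            return False
    return True


rows = [
    {"Student#": 101, "Course#": "M4", "Marks": 82, "Grade": "A"},
    {"Student#": 102, "Course#": "M4", "Marks": 62, "Grade": "C"},
    {"Student#": 101, "Course#": "H6", "Marks": 79, "Grade": "B"},
    {"Student#": 105, "Course#": "H6", "Marks": 87, "Grade": "A"},
]

print(holds(rows, ["Marks"], ["Grade"]))                # True
print(holds(rows, ["Grade"], ["Marks"]))                # False: grade A appears with 82 and 87
print(holds(rows, ["Student#", "Course#"], ["Marks"]))  # True
print(holds(rows, ["Student#"], ["Marks"]))             # False: student 101 has two marks
```

The first two calls show that X -> Y does not imply Y -> X; the last two show why the determinant of Marks has to be the composite of Student# and Course#.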

Page 26: Overview of RDBMS Technology

Functional Dependencies

Consider the following Relation

REPORT (STUDENT#,COURSE#, CourseName, IName, Room#, Marks, Grade)

• STUDENT# - Student Number

• COURSE# - Course Number

• CourseName - Course Name

• IName - Name of the Instructor who delivered the course

• Room# - Room number which is assigned to respective Instructor

• Marks - Scored in Course COURSE# by Student STUDENT#

• Grade - obtained by Student STUDENT# in Course COURSE#

Page 27: Overview of RDBMS Technology

Functional Dependencies - From the previous example

• STUDENT#, COURSE# -> Marks

• COURSE# -> CourseName

• COURSE# -> IName (Assuming one course is taught by one and only one Instructor)

• IName -> Room# (Assuming each Instructor has his/her own and non-shared room)

• Marks -> Grade

Page 28: Overview of RDBMS Technology

Full dependencies

X and Y are attributes. Y is fully functionally dependent on X if X functionally determines Y and no proper subset of X functionally determines Y.

Example: {Student#, Course#} -> Marks

Page 29: Overview of RDBMS Technology

Partial dependencies

X and Y are attributes. Attribute Y is partially dependent on a composite attribute X if it is dependent on a proper subset of X.

Example: with the key {Student#, Course#}, the attributes CourseName, IName and Room# depend on Course# alone.

Page 30: Overview of RDBMS Technology

Transitive dependencies

X, Y and Z are three attributes. If X -> Y and Y -> Z, then X -> Z.

Example: Course# -> IName and IName -> Room#, so Course# -> Room#
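Transitive consequences like this can be derived mechanically by computing an attribute closure: keep applying dependencies until nothing new is added. The function `closure` below is a standard textbook procedure written out for this sketch; the dependency list is the one given earlier for the REPORT relation.

```python
def closure(attributes, dependencies):
    """All attributes determined by `attributes` under the given dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in dependencies:
            # If we already know everything on the left, we also know the right.
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result


# Dependencies of the REPORT relation, as listed on the earlier slide.
dependencies = [
    (("STUDENT#", "COURSE#"), ("Marks",)),
    (("COURSE#",), ("CourseName",)),
    (("COURSE#",), ("IName",)),
    (("IName",), ("Room#",)),
    (("Marks",), ("Grade",)),
]

print(sorted(closure({"COURSE#"}, dependencies)))
print(sorted(closure({"STUDENT#", "COURSE#"}, dependencies)))
```

The closure of COURSE# contains Room# even though no dependency states it directly, which is the transitive dependency above. The closure of STUDENT# and COURSE# together contains every attribute of REPORT, which is what makes that pair a key.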

Page 31: Overview of RDBMS Technology

First normal form: 1NF

• A relation schema R is in 1NF if and only if all the attributes of R are atomic in nature.

• Atomic: the smallest level to which data may be broken down and remain meaningful

Page 32: Overview of RDBMS Technology

Student_Course_Result Table

Student_Details Course_Details Results

101 Davis 11/4/1986 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 82 A

102 Daniel 11/6/1987 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 62 C

101 Davis 11/4/1986 H6 American History 4 11/22/2004 79 B

103 Sandra 10/2/1988 C3 Bio Chemistry Basic Chemistry 11 11/16/2004 65 B

104 Evelyn 2/22/1986 B3 Botany 8 11/26/2004 77 B

102 Daniel 11/6/1987 P3 Nuclear Physics Basic Physics 13 11/12/2004 68 B

105 Susan 8/31/1985 P3 Nuclear Physics Basic Physics 13 11/12/2004 89 A

103 Sandra 10/2/1988 B4 Zoology 5 11/27/2004 54 D

105 Susan 8/31/1985 H6 American History 4 11/22/2004 87 A

104 Evelyn 2/22/1986 M4 Applied Mathematics Basic Mathematics 7 11/11/2004 65 B

Page 33: Overview of RDBMS Technology

Student_Course_Result Table in 1NF

Student#  StudentName  DateofBirth  Course#  CourseName           PreRequisite       DurationInDays  DateOfExam   Marks  Grade
101       Davis        04-Nov-1986  M4       Applied Mathematics  Basic Mathematics  7               11-Nov-2004  82     A
102       Daniel       06-Nov-1987  M4       Applied Mathematics  Basic Mathematics  7               11-Nov-2004  62     C
101       Davis        04-Nov-1986  H6       American History     -                  4               22-Nov-2004  79     B
103       Sandra       02-Oct-1988  C3       Bio Chemistry        Basic Chemistry    11              16-Nov-2004  65     B
104       Evelyn       22-Feb-1986  B3       Botany               -                  8               26-Nov-2004  77     B
102       Daniel       06-Nov-1987  P3       Nuclear Physics      Basic Physics      13              12-Nov-2004  68     B
105       Susan        31-Aug-1985  P3       Nuclear Physics      Basic Physics      13              12-Nov-2004  89     A
103       Sandra       02-Oct-1988  B4       Zoology              -                  5               27-Nov-2004  54     D
105       Susan        31-Aug-1985  H6       American History     -                  4               22-Nov-2004  87     A
104       Evelyn       22-Feb-1986  M4       Applied Mathematics  Basic Mathematics  7               11-Nov-2004  65     B

A "-" marks an empty PreRequisite.

Page 34: Overview of RDBMS Technology

Second normal form: 2NF

• A Relation is said to be in Second Normal Form if and only if:
  – It is in the First normal form, and
  – No partial dependency exists between non-key attributes and key attributes.

• An attribute of a relation R that belongs to any key of R is said to be a prime attribute; one that does not is a non-prime attribute.

Page 35: Overview of RDBMS Technology

Second Normal Form

• STUDENT# is the key attribute for Student

• COURSE# is the key attribute for Course

• STUDENT# and COURSE# together form the composite key for the Results relationship

• Other attributes like StudentName (Student Name), DateofBirth, CourseName, PreRequisite, DurationInDays, DateofExam, Marks and Grade are non-key attributes.

To make this table 2NF compliant, we have to remove all the partial dependencies.

Student#, Course# -> Marks, Grade

Student# -> StudentName, DateofBirth

Course# -> CourseName, PreRequisite, DurationInDays

Course# -> DateofExam

Page 36: Overview of RDBMS Technology

Second Normal Form - Tables in 2 NF

STUDENT TABLE

Student# StudentName DateofBirth

101 Davis 04-Nov-1986

102 Daniel 06-Nov-1987

103 Sandra 02-Oct-1988

104 Evelyn 22-Feb-1986

105 Susan 31-Aug-1985

106 Mike 04-Feb-1987

107 Juliet 09-Nov-1986

108 Tom 07-Oct-1986

109 Catherine 06-Jun-1984

COURSE TABLE

Course#  CourseName           PreRequisite  DurationInDays
M1       Basic Mathematics    -             11
M4       Applied Mathematics  M1            7
H6       American History     -             4
C1       Basic Chemistry      -             5
C3       Bio Chemistry        C1            11
B3       Botany               -             8
P1       Basic Physics        -             8
P3       Nuclear Physics      P1            13
B4       Zoology              -             5

A "-" marks a course with no prerequisite.

Page 37: Overview of RDBMS Technology

Second Normal form – Tables in 2 NF

Student# Course# Marks Grade

101 M4 82 A

102 M4 62 C

101 H6 79 B

103 C3 65 B

104 B3 77 B

102 P3 68 B

105 P3 89 A

103 B4 54 D

105 H6 87 A

104 M4 65 B

Page 38: Overview of RDBMS Technology

Second Normal form – Tables in 2 NF

Exam_Date Table

Course# DateOfExam

M4 11-Nov-04

H6 22-Nov-04

C3 16-Nov-04

B3 26-Nov-04

P3 12-Nov-04

B4 27-Nov-04
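Splitting the table does not lose information: joining the 2NF tables on their keys gives back the original rows. The sketch below checks this with SQLite for a handful of rows taken from the tables above; the lowercase table and column names are assumptions of the example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
CREATE TABLE student   (student_no INTEGER PRIMARY KEY, student_name TEXT, date_of_birth TEXT);
CREATE TABLE course    (course_no TEXT PRIMARY KEY, course_name TEXT,
                        prerequisite TEXT, duration_days INTEGER);
CREATE TABLE result    (student_no INTEGER, course_no TEXT, marks INTEGER, grade TEXT,
                        PRIMARY KEY (student_no, course_no));
CREATE TABLE exam_date (course_no TEXT PRIMARY KEY, date_of_exam TEXT);

INSERT INTO student   VALUES (101, 'Davis', '04-Nov-1986'), (102, 'Daniel', '06-Nov-1987');
INSERT INTO course    VALUES ('M4', 'Applied Mathematics', 'M1', 7),
                             ('H6', 'American History', NULL, 4);
INSERT INTO result    VALUES (101, 'M4', 82, 'A'), (102, 'M4', 62, 'C'), (101, 'H6', 79, 'B');
INSERT INTO exam_date VALUES ('M4', '11-Nov-04'), ('H6', '22-Nov-04');
""")

# Rebuild the wide Student_Course_Result rows by joining on the keys.
rebuilt = connection.execute("""
    SELECT r.student_no, s.student_name, r.course_no, c.course_name,
           e.date_of_exam, r.marks, r.grade
    FROM result r
    JOIN student   s ON s.student_no = r.student_no
    JOIN course    c ON c.course_no  = r.course_no
    JOIN exam_date e ON e.course_no  = r.course_no
    ORDER BY r.student_no, r.course_no
""").fetchall()

for row in rebuilt:
    print(row)
```

Each student name and course name is now stored once, yet the join still returns one complete row per result.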

Page 39: Overview of RDBMS Technology

Third normal form: 3NF

A relation R is said to be in the Third Normal Form (3NF) if and only if

- It is in 2NF and

- No transitive dependency exists between non-key attributes and key attributes.

• STUDENT# and COURSE# are the key attributes.

• All other attributes, except Grade, are non-partially and non-transitively dependent on the key attributes.

• Student#, Course# -> Marks

• Marks -> Grade

Grade depends on the key only transitively, through Marks, so it is moved to a separate table.

Page 40: Overview of RDBMS Technology

3NF Tables

Student# Course# Marks

101 M4 82

102 M4 62

101 H6 79

103 C3 65

104 B3 77

102 P3 68

105 P3 89

103 B4 54

105 H6 87

104 M4 65

Page 41: Overview of RDBMS Technology

Third Normal Form – Tables in 3rd NF

MARKSGRADE TABLE

UpperBound LowerBound Grade

100 95 A+

94 85 A

84 70 B

69 65 B-

64 55 C

54 45 D

44 0 E
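Because Marks -> Grade, the grade never has to be stored with each result; it can be looked up from the MARKSGRADE bands whenever it is needed. A minimal sketch, with the bands copied from the table above and a hypothetical helper `grade_for`:

```python
# (lower bound, upper bound, grade), copied from the MARKSGRADE table.
GRADE_BANDS = [
    (95, 100, "A+"),
    (85, 94, "A"),
    (70, 84, "B"),
    (65, 69, "B-"),
    (55, 64, "C"),
    (45, 54, "D"),
    (0, 44, "E"),
]


def grade_for(marks):
    """Look up the grade whose band contains `marks`."""
    for lower, upper, grade in GRADE_BANDS:
        if lower <= marks <= upper:
            return grade
    raise ValueError(f"marks out of range: {marks}")


print(grade_for(89), grade_for(62), grade_for(54))
```

Changing a grade boundary now means editing one row of the band table instead of updating every result row that carries the old grade.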

Page 42: Overview of RDBMS Technology

Boyce-Codd normal form - BCNF

A relation is said to be in Boyce-Codd Normal Form (BCNF) if and only if all its determinants are candidate keys.

Every BCNF relation is in 3NF, but not every 3NF relation is in BCNF.

Page 43: Overview of RDBMS Technology

Consider this Result Table

Student# EmailID Course# Marks

101 [email protected] M4 82

102 [email protected] M4 62

101 [email protected] H6 79

103 [email protected] C3 65

104 [email protected] B3 77

102 [email protected] P3 68

105 [email protected] P3 89

103 [email protected] B4 54

105 [email protected] H6 87

104 [email protected] M4 65

Page 44: Overview of RDBMS Technology

BCNF

Candidate keys for the relation are

- {STUDENT#, COURSE#} and {COURSE#, EmailID}

Since COURSE# appears in both, these are referred to as overlapping candidate keys.

Valid functional dependencies are

Student# -> EmailID (non-key determinant)

EmailID -> Student# (non-key determinant)

Student#, Course# -> Marks

Course#, EmailID -> Student#
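The BCNF rule can be checked mechanically: a dependency violates BCNF when its left-hand side does not determine every attribute of the relation. The helpers `closure` and `bcnf_violations` below are written for this sketch and assume that only non-trivial dependencies are listed.

```python
def closure(attributes, dependencies):
    """All attributes determined by `attributes` under the given dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in dependencies:
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result


def bcnf_violations(attributes, dependencies):
    """Dependencies whose determinant is not a key of the whole relation."""
    return [
        (left, right)
        for left, right in dependencies
        if closure(left, dependencies) != set(attributes)
    ]


attributes = {"Student#", "EmailID", "Course#", "Marks"}
dependencies = [
    (("Student#",), ("EmailID",)),
    (("EmailID",), ("Student#",)),
    (("Student#", "Course#"), ("Marks",)),
    (("Course#", "EmailID"), ("Student#",)),
]

for left, right in bcnf_violations(attributes, dependencies):
    print(left, "->", right)
```

Only the two dependencies marked "non-key determinant" are reported, which is why the Student# and EmailID pair is split off into its own table.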

Page 45: Overview of RDBMS Technology

BCNF

STUDENT TABLE

Student# EmailID

101 [email protected]

102 [email protected]

103 [email protected]

104 [email protected]

105 [email protected]

Page 46: Overview of RDBMS Technology

BCNF Tables

Student# Course# Marks

101 M4 82

102 M4 62

101 H6 79

103 C3 65

104 B3 77

102 P3 68

105 P3 89

103 B4 54

105 H6 87

104 M4 65

Page 47: Overview of RDBMS Technology

Merits of Normalization

• Normalization is based on a mathematical foundation.

• Removes redundancy to a great extent. After 3NF, data redundancy is limited to the foreign keys.

• Removes the anomalies present in INSERTs, UPDATEs and DELETEs.

Page 48: Overview of RDBMS Technology

Demerits of Normalization

• Data retrieval (SELECT) performance can suffer, because data spread over more tables has to be joined back together.

• Normalized tables might not always match real world scenarios.

Page 49: Overview of RDBMS Technology

SQL - Background

• Conceived in the mid-1970s as a database language for the relational model

• Developed by IBM

• First standardized in 1986 by ANSI

• Enhanced in 1989

• Revised again in 1992

• Non-procedural language

• Supported by a number of commercial products

Page 50: Overview of RDBMS Technology

SQL Statements

• Data Definition Language (DDL)
  – CREATE TABLE
  – ALTER TABLE
  – DROP TABLE

• Data Manipulation Language (DML)
  – SELECT
  – INSERT
  – UPDATE
  – DELETE

• Data Control Language (DCL)
  – GRANT
  – REVOKE

Page 51: Overview of RDBMS Technology

SQL - Some ANSI/ISO Keywords

ALL, AND, AVG, BETWEEN, CHAR, COMMIT, COUNT, CREATE, DECIMAL, DELETE, DISTINCT, DROP, FETCH, GRANT, GROUP BY, HAVING, IN, INSERT, MAX, MIN, NOT, NULL, OR, PRIVILEGE, REFERENCES, REVOKE, SELECT, SUM, TABLE, UPDATE, VIEW, WHERE

Page 52: Overview of RDBMS Technology

Commercial Products

• In 2004, the RDBMS market grew 10%, rising from just under $7.1 billion to nearly $7.8 billion in new license sales

• DB2 UDB – Market Leader

• Oracle – Fastest growing Database on Unix Boxes

• SQL Server 2000 – Leader in Windows Platform

• Teradata – Most efficient, Self tuning, Costliest DB for Data Warehousing application

• Sybase – May be target for acquisition

• MySQL – Open Source Database

Company       2004 Market Share %  2003-04 Growth %
IBM           34.1                 5.8
Oracle        33.7                 14.6
Microsoft     20                   18
NCR Teradata  2.9                  17.2
Sybase        2.3                  0.5

Page 53: Overview of RDBMS Technology

On Line Transaction Processing (OLTP) System

• OLTP systems handle several concurrent transactions from
  – Spatially distributed machines
  – Execution of instructions and queries across LAN/WAN
  – Geographically distributed processors
  – Spatially distributed databases

• A transaction is defined as a logical unit of program execution that takes a system from one consistent state to another consistent state

• An OLTP system should adhere to the ACID properties
  – Atomicity
  – Consistency
  – Isolation
  – Durability
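Atomicity and consistency can be seen in a small funds-transfer sketch using SQLite. The `account` table, the balances and the rule that a balance may not go negative are all assumptions made for the example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
connection.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
connection.commit()


def transfer(amount):
    """Move `amount` from A to B as one transaction: both updates or neither."""
    try:
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,)
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected the debit
        return False


def balances():
    return dict(connection.execute("SELECT name, balance FROM account"))
```

Calling `transfer(30)` succeeds and leaves A with 70 and B with 80. Calling `transfer(500)` afterwards fails on the debit, and the credit to B that had already run is rolled back with it, so the balances stay at 70 and 80: the database moves from one consistent state to another or does not move at all.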

Page 54: Overview of RDBMS Technology

State diagram of a transaction

[State diagram. States, and when a transaction is in each:]

• Active: while executing

• Partially completed: after the final statement has been executed

• Failed: when normal execution can’t proceed

• Aborted: after rollback and restoration to the previous state

• Committed: after successful completion

Page 55: Overview of RDBMS Technology

Concurrency Vs Consistency in OLTP

• Concurrency and consistency pull against each other: the more concurrent access is allowed, the harder it is to keep the data consistent

• Multiple transactions access the same resource simultaneously

• Problems associated with OLTP applications are
  – Lost update
  – Dirty read
  – Incorrect Summary
  – Phantom records

Page 56: Overview of RDBMS Technology

Serialization Techniques

• Locking

• Time stamping

• Both ensure consistency of the database while allowing concurrent access to the resources

Page 57: Overview of RDBMS Technology

Locking

• A lock is a variable associated with each data item in a database

• When a transaction updates a data item, the DBMS locks that item

• Serializability can be maintained this way

• A lock can be Shared or Exclusive

• Deadlock is the most common problem with the locking mechanism
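The effect of an exclusive lock can be imitated with threads in Python. This is an analogy rather than a DBMS lock manager: the class `DataItem`, the thread count and the update count are assumptions of the sketch, and shared locks and deadlock handling are not shown.

```python
import threading


class DataItem:
    """A value plus the exclusive lock that guards it."""

    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()

    def add(self, amount):
        with self.lock:                    # acquire the lock, release it on exit
            current = self.value           # read
            self.value = current + amount  # write


def run_transactions(threads=4, updates_per_thread=1000):
    item = DataItem(0)

    def worker():
        for _ in range(updates_per_thread):
            item.add(1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return item.value


print(run_transactions())  # 4000
```

Because the read and the write happen while the lock is held, no thread can overwrite a value that another thread has read but not yet written back, so no update is lost.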

Page 58: Overview of RDBMS Technology

Timestamping

• Each transaction is given a timestamp, and conflicting operations must execute in timestamp order

• A conflict occurs when an older transaction tries to read a value that was written by a younger transaction

• Or when an older transaction tries to modify (write) a value already read or written by a younger transaction

• Both of these attempts signify that the older transaction was “too late” in performing the required operation, so it is rolled back

• Commercially not a viable option because of too many rollbacks
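The two "too late" rules translate directly into checks on a data item's read and write timestamps. The class `Item` and the functions `read` and `write` are names invented for this sketch of the basic timestamp-ordering rule; a real scheduler would also restart the rejected transaction with a new timestamp.

```python
class Item:
    def __init__(self):
        self.read_ts = 0   # timestamp of the youngest transaction that read it
        self.write_ts = 0  # timestamp of the youngest transaction that wrote it


def read(item, ts):
    """Transaction with timestamp `ts` tries to read; False means roll back."""
    if ts < item.write_ts:
        return False       # too late: a younger transaction already wrote
    item.read_ts = max(item.read_ts, ts)
    return True


def write(item, ts):
    """Transaction with timestamp `ts` tries to write; False means roll back."""
    if ts < item.read_ts or ts < item.write_ts:
        return False       # too late: a younger transaction already read or wrote
    item.write_ts = ts
    return True


x = Item()
print(write(x, 2))  # True: nothing younger has touched x
print(read(x, 1))   # False: transaction 1 is older than the writer (2)
print(read(x, 3))   # True
print(write(x, 2))  # False: transaction 3, which is younger, has already read x
```

Every False is a rollback, which shows why a busy system using this scheme can spend much of its time redoing work.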

Page 59: Overview of RDBMS Technology

New Trends in DB Technology

• OODBMS

• OORDBMS

• XML Integration

Page 60: Overview of RDBMS Technology

OODBMS

• Any user-defined data structures

• Any user-defined operations

• Any user-defined relationships

• Useful for
  – Manufacturing
  – Telecommunication
  – CAD/CAM
  – Multimedia products
  – Aerospace and Flight simulations

Page 61: Overview of RDBMS Technology

Relationship in OODBMS

• One - Many

• Many - Many

• Is A

• Extends

• Whole-part

Page 62: Overview of RDBMS Technology

Commercial Packages

• Objectivity

• Poet

• Jasmine

• Gemstone

• Itasca

• ObjectStore

For more details, see http://www.geocities.com/SiliconValley/2139/products.html

Page 63: Overview of RDBMS Technology

Limitations of OODBMS

• Procedural navigation

• No querying, as it breaks encapsulation

• No mathematical foundation

• Not suitable for ad hoc reporting systems

Page 64: Overview of RDBMS Technology

OORDBMS

• Marrying Relational and Object Oriented concepts

• Data is still stored in a Relational manner

• Object wrapper for application

• Performance is the major concern

• Still in the development stage

• Commercial Products
  – Informix Universal Server (Illustra) (merged with IBM)
  – Oracle: Oracle 10g
  – IBM: DB2 UDB
  – UniSQL: UniSQL/X
  – Unisys: OSMOS

Page 65: Overview of RDBMS Technology

XML in DB

• Data-centric to Document-centric

• Simpler integration between Database and other tools like
  – Middleware
  – EAI tools
  – ERP tools
  – Other Databases

• Introduction of Native XML data type

• XML Query Language

Page 66: Overview of RDBMS Technology

What is OLAP or DW or BI?

• An organization’s success also depends on its ability to analyze data (through views and reports) and make intelligent decisions that potentially affect its future. Systems that facilitate such analyses are called On Line Analytical Processing (OLAP) systems or Data Warehousing systems

• Why not OLTP?
  – OLTP databases do not contain historical data
  – OLTP databases contain small subsets of organizational data
  – OLTP databases are heterogeneous in nature and geographically distributed

• For analysis purposes, OLTP systems are
  – Fragmented
  – Not integrated
  – Difficult to access
  – Built on disparate sources
  – Built on disparate platforms
  – Of poor data quality
  – Full of redundant data
  – Difficult to understand

Page 67: Overview of RDBMS Technology

Data warehouse / Business Intelligence

• A Data Warehouse is a copy of the enterprise operational data, suitably modified to support the needs of analytical processes and stored outside the operational database.

• According to Bill Inmon, known as the father of Data Warehousing, a data warehouse is a
  – Subject oriented,
  – Integrated,
  – Time-variant,
  – Nonvolatile
  collection of data in support of management decisions.

Page 68: Overview of RDBMS Technology

Data warehouse architecture

[Architecture diagram. Data flows from the sources to the clients:]

Sources: Operational DBs, Semistructured Sources
  (extract, transform, load, refresh, etc.)
Tier 1, Data Warehouse Server: Data Warehouse, Data Marts
  (serve)
Tier 2, OLAP Servers: e.g., MOLAP, ROLAP
  (serve)
Tier 3, Clients: Analysis, Query/Reporting, Data Mining

Page 69: Overview of RDBMS Technology

Components of DW

• Extraction Transformation and Loading (ETL)
  – Informatica Power Center
  – Data Stage
  – AbInitio
  – WebFOCUS

• Data Warehouse
  – Teradata
  – DB2 UDB
  – Oracle 10g

• OLAP
  – Business Objects
  – COGNOS
  – Hyperion
  – Power Analyzer

• Data Mining
  – Intelligent Miner
  – Darwin
  – SAS Miner

Page 70: Overview of RDBMS Technology

Complementing Technology

• How many Infy shares were sold yesterday on NASDAQ? What were the highest and lowest prices?
  – OLTP System

• How have Infy shares been doing on NASDAQ compared with NSE India over the last 5 years? What’s the volume? P/E ratio? Highest and lowest price?
  – DW System

• What will Infy’s earnings be in the second quarter of next year? What will the share price be during that period?
  – Data Mining System

Page 72: Overview of RDBMS Technology

Thank You