DWH Spring2012 LectureSlides Week1&2

26
Dr. Abdul Basit Siddiqui Assistant Professor FUIEMS (Lecture Slides Week # 1)

Transcript of DWH Spring2012 LectureSlides Week1&2

Page 1: DWH Spring2012 LectureSlides Week1&2

Dr. Abdul Basit Siddiqui Assistant Professor

FUIEMS (Lecture Slides Week # 1)

Page 2: DWH Spring2012 LectureSlides Week1&2

Approach of the Course Develop an understanding of the underlying RDBMS

concepts.

Apply these concepts to VLDB / DSS environments and understand where and why they break down?

Expose the differences between RDBMS and Data Warehouse in the context of VLDB.

Provide the basics of DSS tools such as OLAP, Data Mining and demonstrate their applications.

Demonstrate the application of DSS concepts and limitations of the OLTP concepts through lab exercises.

FUIEMS Data Warehoue & Mining - Spring 2012 2

Page 3: DWH Spring2012 LectureSlides Week1&2

Summary of the Course Introduction & Background De-Normalization Online Analytical Processing (OLAP) Dimensional Modeling Extract-Transform-Load (ETL) Data Quality Management (DQM) Parallelism, Join and Indexing Techniques Data Mining Concepts Data Cleansing Association Rule Mining Clustering Classification

FUIEMS Data Warehoue & Mining - Spring 2012 3

Page 4: DWH Spring2012 LectureSlides Week1&2

Books Reference Books

W. H. Inmon, Building the Data Warehouse, John Wiley & Sons Inc., NY

R. Kimball, The Data Warehouse Toolkit, John Wiley & Sons Inc., NY

Paulraj Ponniah, Data Warehousing Fundamentals, John Wiley & Sons Inc., NY

FUIEMS Data Warehoue & Mining - Spring 2012 4

Page 5: DWH Spring2012 LectureSlides Week1&2
Page 6: DWH Spring2012 LectureSlides Week1&2

Why this Course? The World is changing / (in fact changed) Either change or Be left behind.

Missing the opportunities or going in the wrong direction has prevented us from growing.

What is the right direction? harnessing the data, in the knowledge driven economy.

Doing what can’t be or difficult to automate.

FUIEMS Data Warehoue & Mining - Spring 2012 6

Page 7: DWH Spring2012 LectureSlides Week1&2

Historical Overview

1960: Master Files and Reports

1965: Lots of Master Files

1970: Direct Memory Access and DBMS

1975: Online High Performance Transaction Processing

1980: PCs and 4GL Technology (MIS/DSS)

1985: Extract Programs, Extract Processing

1990: The Legacy System’s Web

FUIEMS Data Warehoue & Mining - Spring 2012 7

Page 8: DWH Spring2012 LectureSlides Week1&2

The Need of the Time

drowning in data AND/BUT starving for information.

Knowledge is power BUT Intelligence is absolute/super power.

FUIEMS Data Warehoue & Mining - Spring 2012 8

Page 9: DWH Spring2012 LectureSlides Week1&2

The Need of the Time

FUIEMS Data Warehoue & Mining - Spring 2012 9

Data

Information

Knowledge

Intelligence

POWER

($/£)

Page 10: DWH Spring2012 LectureSlides Week1&2

Scenario 1

FUIEMS Data Warehoue & Mining - Spring 2012 10

ABC Pvt Ltd is a company with branches at

Karachi, Quetta, Peshawar and Lahore. The Sales

Manager wants quarterly sales report. Each

branch has a separate operational system.

Page 11: DWH Spring2012 LectureSlides Week1&2

Scenario 1 : ABC Pvt Ltd.

FUIEMS Data Warehoue & Mining - Spring 2012 11

Karachi

Quetta

Peshawar

Lahore

Sales

Manager Sales per item type per branch

for first quarter.

Page 12: DWH Spring2012 LectureSlides Week1&2

Solution 1:ABC Pvt Ltd.

Extract sales information from each database.

Store the information in a common repository at a single site.

FUIEMS Data Warehoue & Mining - Spring 2012 12

Page 13: DWH Spring2012 LectureSlides Week1&2

Solution 1:ABC Pvt Ltd.

FUIEMS Data Warehoue & Mining - Spring 2012 13

Karachi

Quetta

Peshawar

Lahore

Data

Warehouse

Sales

Manager

Query &

Analysis tools

Report

Page 14: DWH Spring2012 LectureSlides Week1&2

Scenario 2

FUIEMS Data Warehoue & Mining - Spring 2012 14

One Stop Shopping Super Market has huge

operational database. Whenever Executives wants

some report, the OLTP system becomes slow and

data entry operators have to wait for some time.

Page 15: DWH Spring2012 LectureSlides Week1&2

Scenario 2 : One Stop Shopping

FUIEMS Data Warehoue & Mining - Spring 2012 15

Operational

Database

Data Entry Operator

Data Entry Operator

Management Wait

Report

Page 16: DWH Spring2012 LectureSlides Week1&2

Solution 2

Extract data needed for analysis from operational database.

Store it in warehouse.

Refresh warehouse at regular interval so that it contains up to date information for analysis.

Warehouse will contain data with historical perspective.

FUIEMS Data Warehoue & Mining - Spring 2012 16

Page 17: DWH Spring2012 LectureSlides Week1&2

Solution 2

FUIEMS Data Warehoue & Mining - Spring 2012 17

Operational

database

Data

Warehouse

Extract

data

Data Entry

Operator

Data Entry

Operator

Manager

Report

Transaction

Page 18: DWH Spring2012 LectureSlides Week1&2

Scenario 3

FUIEMS Data Warehoue & Mining - Spring 2012 18

Cakes & Cookies is a small, new company. President

of the company wants his company should grow. He

needs information so that he can make correct

decisions.

Page 19: DWH Spring2012 LectureSlides Week1&2

Solution 3 Improve the quality of data before loading it into the

warehouse.

Perform data cleaning and transformation before loading the data.

Use query analysis tools to support adhoc queries.

FUIEMS Data Warehoue & Mining - Spring 2012 19

Page 20: DWH Spring2012 LectureSlides Week1&2

Solution 3

FUIEMS Data Warehoue & Mining - Spring 2012 20

Query and Analysis

tool President

Expansion

Improvement

sales

time

Data

Warehouse

Page 21: DWH Spring2012 LectureSlides Week1&2

Case Study

AFCO Foods & Beverages is a new company which produces dairy, bread and meat products with production unit located at Gujranwala.

There products are sold in all the region of Pakistan.

They have sales units at provincial Head Quarters.

The President of the company wants sales information.

FUIEMS Data Warehoue & Mining - Spring 2012 21

Page 22: DWH Spring2012 LectureSlides Week1&2

Sales Information

January February March April

14 41 33 25

FUIEMS Data Warehoue & Mining - Spring 2012 22

Report: The number of units sold.

113

Report: The number of units sold over time

Page 23: DWH Spring2012 LectureSlides Week1&2

Sales Information

Jan Feb Mar Apr

Wheat Bread 6 17

Cheese 6 16 6 8

Swiss Rolls 8 25 21

FUIEMS Data Warehoue & Mining - Spring 2012 23

Report : The number of items sold for each product with

time

Product

Page 24: DWH Spring2012 LectureSlides Week1&2

Sales Information

Jan Feb Mar Apr

Karachi Wheat

Bread

3 10

Cheese 3 16 6

Swiss Rolls 4 16 6

Lahore Wheat

Bread

3 7

Cheese 3 8

Swiss Rolls 4 9 15

FUIEMS Data Warehoue & Mining - Spring 2012 24

Report: The number of items sold in each City for each

product with time

Product

Tim

e

Page 25: DWH Spring2012 LectureSlides Week1&2

Sales Information

FUIEMS Data Warehoue & Mining - Spring 2012 25

Report: The number of items sold and income in each region for

each product with time.

Jan Feb Mar Apr

Rs U Rs U Rs U Rs U

Karachi Wheat Bread 7.44 3 24.80 10

Cheese 7.95 3 42.40 16 15.90 6

Swiss Rolls 7.32 4 29.98 16 10.98 6

Lahore Wheat Bread 7.44 3 17.36 7

Cheese 7.95 3 21.20 8

Swiss Rolls 7.32 4 16.47 9 27.45 15

Page 26: DWH Spring2012 LectureSlides Week1&2

Data Warehousing includes

Build Data Warehouse

Online Analysis/Analytical Processing (OLAP).

Presentation.

FUIEMS Data Warehoue & Mining - Spring 2012 26

RDBMS

Flat File

Presentation

Cleaning ,Selection &

Integration

Warehouse & OLAP server Client