DWH PPT

By: Ritika, Sanjesh Dubey, Shruti Mohta, Rani Treasa


Data Warehousing


Essence of data warehouse.

Architecture of a data warehouse

Differences between OLTP & OLAP

Data modeling

ETL processes

Reporting Tools

A warehouse:

‘A place where goods are physically stocked, to facilitate smooth flow of business without any production downtime or crisis.’

In layman's terms, data warehousing:

‘A data warehouse is a read-only database that stores a copy of the data from the transactional database.’

Two kinds of systems:

• Operational: needs performance
• Analysis: needs flexibility and broad scope

Business analysis should not interfere with or degrade the performance of the operational systems.

Thus arises the need for data warehousing.

• Data warehouse is a concept, not a product
• Set of hardware and software components used to analyze massive amounts of data
• Intelligent way of managing data
• Data -> Information -> Knowledge -> Decision

Growing industry:
• $8 billion in 1998

Range from desktop to huge warehouses:
• Walmart: 900-CPU, 2,700 disks, 23 TB
• Teradata system

Lots of new terms:
• ROLAP, MOLAP, HOLAP
• Rollup, Drill-down, Slice & dice

Two basic data processing models:

OLTP:
• Reliable and efficient processing of a large number of transactions
• Ensuring data consistency

OLAP:
• Efficient multidimensional processing of large data volumes

OLTP: On Line Transaction Processing
• Describes processing at operational sites
• Captures the transaction information necessary to run business operations
• Needs performance

OLAP: On Line Analytical Processing
• Describes processing at the warehouse
• Analyzes transaction information at an aggregate level to improve the decision-making process
• Needs flexibility and broad scope

Feature: OLTP | OLAP

Users: Clerk, IT professional | Knowledge worker
Function: Day-to-day operations | Decision support
DB design: Application-oriented | Subject-oriented
Data: Current, up-to-date, detailed, relational, flat, isolated | Historical, summarized, multidimensional, integrated, consolidated
Usage: Repetitive | Ad hoc
Access: Read/write, index/hash on primary key | Lots of scans
Unit of work: - | Complex query
# of records accessed: Tens | Millions
# of users: Thousands | Hundreds
DB size: 100 MB - GB | 100 GB - TB
Metric: Transaction throughput | Query throughput, response
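The contrast in access pattern and unit of work can be illustrated with a small sketch. It uses Python's built-in sqlite3 module; the `orders` table, its columns, and the sample rows are hypothetical and exist only for this illustration. One statement touches a single record through its primary key (OLTP style), the other scans every row and summarizes it (OLAP style).

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        region     TEXT,
        order_year INTEGER,
        amount     REAL
    )
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "North", 2022, 100.0),
        (2, "North", 2023, 150.0),
        (3, "South", 2022, 80.0),
        (4, "South", 2023, 120.0),
    ],
)

# OLTP-style unit of work: read/write one record via its primary key.
conn.execute("UPDATE orders SET amount = 110.0 WHERE order_id = 1")
oltp_row = conn.execute(
    "SELECT amount FROM orders WHERE order_id = 1"
).fetchone()

# OLAP-style query: scan all rows and summarize them by region and year.
olap_rows = conn.execute("""
    SELECT region, order_year, SUM(amount)
    FROM orders
    GROUP BY region, order_year
    ORDER BY region, order_year
""").fetchall()

print(oltp_row)
print(olap_rows)
```

In a real deployment the two statements would normally run on separate systems, which is the point of the comparison above.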

A data warehouse is a structured, extensible environment designed for the analysis of:

• non-volatile data,
• logically and physically transformed
• from multiple source applications
• to align with business structure,
• updated and maintained for a long time period,
• expressed in simple business terms,
• and summarized for quick analysis.

• Subject Oriented
• Integrated
• Non Volatile and Time Variant

Operational systems: Accounting, Order Entry, Billing

Operational data is organized by specific processes or tasks and is maintained by separate systems.

Data warehouse: Customer, Usage, Revenue

Warehoused data is organized by subject area and is populated from many operational systems.

• Measurement of attributes
• Naming conventions
• Data type format
• Physical attribute of data
• Encoding structures

Operational: current value data
• Time horizon: 60-90 days
• Key: may or may not have an element of time
• VOLATILE: data can be updated

DWH: snapshot data
• Time horizon: 5-10 years
• Key: contains an element of time
• NON-VOLATILE: once a snapshot is made, the data cannot be updated
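A minimal sketch of the volatile versus non-volatile distinction, assuming a hypothetical customer balance as the data item. The operational value is overwritten in place, while the warehouse keeps one row per (customer, snapshot date) key and refuses to change a snapshot once it exists. All names here are invented for illustration.

```python
from datetime import date

# Operational system: one current value per key, updated in place (volatile).
operational_balance = {"cust-1": 100.0}

# Warehouse: the key contains an element of time; rows are only appended.
warehouse_snapshots = {}


def take_snapshot(snapshot_date):
    """Copy the current operational values into the warehouse."""
    for cust_id, balance in operational_balance.items():
        key = (cust_id, snapshot_date)
        if key in warehouse_snapshots:
            # Non-volatile: an existing snapshot is never overwritten.
            raise ValueError(f"snapshot {key} already exists")
        warehouse_snapshots[key] = balance


take_snapshot(date(2024, 1, 31))
operational_balance["cust-1"] = 250.0  # the operational value changes
take_snapshot(date(2024, 2, 29))

for key, balance in sorted(warehouse_snapshots.items()):
    print(key, balance)
```

Both balances remain available in the warehouse, while the operational system only knows the latest one.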

Data warehousing is the process of extracting, integrating, filtering, standardizing, transforming, cleaning and quality-checking an organization's application data and storing it in a consolidated database.

Extracting: pulling the data from disparate sources.

Integrating: putting the extracted data together into a consistent format.

Filtering: extracting only the required data from the OLTP or external data sources. For example, the user may be interested only in the last five years of sales data.

Standardizing: because data is moved from different OLTP databases or flat-file systems to one target, it needs to be standardized, for example date fields or flag fields.

Transforming: data extracted from OLTP databases and external data sources has to be transformed before it is carried to the data warehouse.

Cleaning and quality checking: to ensure data quality and accuracy.
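The filtering, standardizing and quality-checking steps can be sketched in a few lines of Python. This is only an illustration: the field names, the two source date formats, the set of accepted flag spellings and the fixed reference date are all assumptions, not part of the original slides.

```python
from datetime import date, datetime

# Hypothetical source rows from two systems with different conventions.
source_rows = [
    {"sale_date": "2021-03-15", "active": "Y", "amount": "120.50"},
    {"sale_date": "15/04/2022", "active": "1", "amount": "80"},
    {"sale_date": "2010-01-01", "active": "N", "amount": "40"},  # too old
    {"sale_date": "2023-07-09", "active": "yes", "amount": ""},  # bad amount
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed source formats
TRUE_FLAGS = {"y", "yes", "1", "true"}   # assumed flag spellings


def standardize_date(text):
    """Standardizing: parse any known source format into a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def standardize_flag(text):
    """Standardizing: map the various flag spellings onto True/False."""
    return text.strip().lower() in TRUE_FLAGS


def run_pipeline(rows, today, years=5):
    # Simplification: assumes `today` is not 29 February.
    cutoff = date(today.year - years, today.month, today.day)
    loaded, rejected = [], []
    for row in rows:
        try:
            clean = {
                "sale_date": standardize_date(row["sale_date"]),
                "active": standardize_flag(row["active"]),
                "amount": float(row["amount"]),  # quality check: numeric
            }
        except ValueError:
            rejected.append(row)  # failed the quality check
            continue
        if clean["sale_date"] >= cutoff:  # filtering: last five years only
            loaded.append(clean)
    return loaded, rejected


loaded, rejected = run_pipeline(source_rows, today=date(2024, 1, 1))
print(len(loaded), "rows loaded,", len(rejected), "rejected")
```

Rows that fail a quality check are set aside rather than silently dropped, so they can be inspected before the next load.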


Figure: source systems, data warehouse storage, OLAP server and end user views, with arrows labeled "Flow of information" and "Direction of analysis".

Figure: source systems (flat files, data from RDBMS) feed a staging area and then the data warehouse through data extraction, transformation and loading (ETL); the data warehouse serves business intelligence: querying and reporting, OLAP, data mining.

ETL (extraction, transformation, and loading):

• The process of extracting data from source systems and bringing it into the data warehouse.
• It also includes the transportation phase, and the phases of the process are not distinct from one another.

Major ETL tools are:

• Informatica PowerMart
• Informatica PowerCenter
• DP Warehouse
• Oracle Express
• DataMirror

1. Highly normalized
2. Design considerations are:
• Quick retrieval
• Fast updates
3. No redundancies

RDBMS Model MDDB Model

Star schema: A single object (fact table) in the middle connected to a number of dimension tables

Sales Fact Table: Date, Product, Store, Customer
Measurements: Unit_sales, Dollar_sales, Schilling_sales

Dimension tables:
• Date: Date, Month, Year
• Customer: CustId, CustName, CustCity, CustCountry
• Product: ProductNo, ProdName, ProdDesc, Category, QOH
• Store: StoreID, City, State, Country, Region
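A star schema like this one can be sketched with Python's built-in sqlite3 module. To keep the example short, only the Date and Store dimensions are created; Product and Customer would follow the same pattern. The table names (`dim_date`, `dim_store`, `sales_fact`), the `date_key` column and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date  (date_key TEXT PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT,
                        country TEXT, region TEXT);
-- Fact table: one foreign key per dimension plus the numeric measurements.
CREATE TABLE sales_fact (
    date_key     TEXT    REFERENCES dim_date(date_key),
    store_id     INTEGER REFERENCES dim_store(store_id),
    unit_sales   INTEGER,
    dollar_sales REAL
);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    ("2023-01-10", 1, 2023),
    ("2023-02-05", 2, 2023),
    ("2024-01-20", 1, 2024),
])
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?, ?, ?)", [
    (1, "Vienna", "Vienna", "Austria", "Europe"),
    (2, "Pune", "Maharashtra", "India", "Asia"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    ("2023-01-10", 1, 3, 30.0),
    ("2023-02-05", 1, 2, 20.0),
    ("2023-02-05", 2, 5, 50.0),
    ("2024-01-20", 2, 1, 10.0),
])

# Each dimension is exactly one join away from the fact table.
sales_by_country_year = conn.execute("""
    SELECT s.country, d.year, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN dim_date  AS d ON d.date_key = f.date_key
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.country, d.year
    ORDER BY s.country, d.year
""").fetchall()
print(sales_by_country_year)
```

Because every dimension table is denormalized, a query never needs more than one join per dimension, at the cost of repeating values such as country and region on every store row.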

Snowflake schema: A refinement of the star schema in which the dimensional hierarchy is represented explicitly by normalizing the dimension tables.

Sales Fact Table: Date, Product, Store, Customer
Measurements: Unit_sales, Dollar_sales, Schilling_sales

Dimension tables:
• Product: ProductNo, ProdName, ProdDesc, Category, QOH
• Customer: CustId, CustName, CustCity, CustCountry
• Date: Date, Month
• Month: Month, Year
• Year: Year
• Store: StoreID, City
• City: City, State
• State: State, Country
• Country: Country, Region
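The normalized Store hierarchy (Store, City, State, Country) can be sketched the same way, again with sqlite3. The lowercase table names and the sample rows are assumptions for illustration; only the column layout follows the diagram. Rolling sales up to the region level now requires walking the whole chain of joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (country TEXT PRIMARY KEY, region TEXT);
CREATE TABLE state   (state TEXT PRIMARY KEY,
                      country TEXT REFERENCES country(country));
CREATE TABLE city    (city TEXT PRIMARY KEY,
                      state TEXT REFERENCES state(state));
CREATE TABLE store   (store_id INTEGER PRIMARY KEY,
                      city TEXT REFERENCES city(city));
CREATE TABLE sales_fact (store_id INTEGER REFERENCES store(store_id),
                         dollar_sales REAL);
""")
conn.executemany("INSERT INTO country VALUES (?, ?)",
                 [("Austria", "Europe"), ("India", "Asia")])
conn.executemany("INSERT INTO state VALUES (?, ?)",
                 [("Tyrol", "Austria"), ("Maharashtra", "India")])
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [("Innsbruck", "Tyrol"), ("Pune", "Maharashtra"),
                  ("Mumbai", "Maharashtra")])
conn.executemany("INSERT INTO store VALUES (?, ?)",
                 [(1, "Innsbruck"), (2, "Pune"), (3, "Mumbai")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [(1, 30.0), (2, 50.0), (3, 20.0), (1, 5.0)])

# Roll up from store to region by following the normalized hierarchy.
sales_by_region = conn.execute("""
    SELECT co.region, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN store   AS st ON st.store_id = f.store_id
    JOIN city    AS ci ON ci.city     = st.city
    JOIN state   AS sa ON sa.state    = ci.state
    JOIN country AS co ON co.country  = sa.country
    GROUP BY co.region
    ORDER BY co.region
""").fetchall()
print(sales_by_region)
```

Each region is stored once per country instead of once per store, which removes redundancy but adds joins compared with the star layout.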

Fact constellations: Multiple fact tables share dimension tables

To identify chronic aircraft:

Criteria: three reports in 15 days or 45 flights, whichever comes first.
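One possible reading of this criterion, sketched in Python: an aircraft is flagged when any three consecutive reports fall inside a window that closes after 15 days or after 45 flights, whichever limit is reached first. The data layout (a report date plus the aircraft's cumulative flight count at that time), the function name, and the inclusive boundaries are assumptions; the slides do not specify them.

```python
from datetime import date

MAX_DAYS = 15     # from the criterion
MAX_FLIGHTS = 45  # from the criterion
MIN_REPORTS = 3   # from the criterion


def is_chronic(reports):
    """reports: (report_date, cumulative_flight_count) pairs for one aircraft."""
    ordered = sorted(reports)
    for i in range(len(ordered) - MIN_REPORTS + 1):
        first_date, first_flights = ordered[i]
        last_date, last_flights = ordered[i + MIN_REPORTS - 1]
        within_days = (last_date - first_date).days <= MAX_DAYS
        within_flights = (last_flights - first_flights) <= MAX_FLIGHTS
        # The window ends at whichever limit is hit first, so both must hold.
        if within_days and within_flights:
            return True
    return False


clustered = [(date(2024, 1, 1), 100), (date(2024, 1, 5), 110),
             (date(2024, 1, 12), 125)]
too_many_flights = [(date(2024, 1, 1), 100), (date(2024, 1, 5), 130),
                    (date(2024, 1, 12), 150)]
spread_out = [(date(2024, 1, 1), 100), (date(2024, 1, 10), 105),
              (date(2024, 1, 20), 110)]

print(is_chronic(clustered))         # three reports in 11 days and 25 flights
print(is_chronic(too_many_flights))  # 50 flights between first and third report
print(is_chronic(spread_out))        # 19 days between first and third report
```

In a warehouse setting the report history would come from the fact table rather than from literal lists, but the windowing logic would stay the same.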

Thank You!