DWH PPT
Essence of data warehouse
Architecture of data warehouse
Differences between OLTP & OLAP
Data modeling
ETL processes
Reporting Tools
A warehouse:
‘A place where goods are physically stocked to facilitate a smooth flow of business without production downtime or crisis.’
In layman's words, data warehousing:
‘A data warehouse is a read-only database that copies and stores data from the transactional databases.’
Two kinds of systems:
Operational: needs performance
Analysis: needs flexibility and broad scope
Business analysis should not interfere with and degrade the performance of the operational systems
Thus arises the need for data warehousing.
• A data warehouse is a concept, not a product.
• A set of hardware and software components used to analyze massive amounts of data.
• An intelligent way of managing data.
• Data -> Information -> Knowledge -> Decision
Growing industry: $8 billion in 1998
Ranges from desktop to huge warehouses:
• Walmart: 900 CPUs, 2,700 disks, 23 TB
• Teradata system
Lots of new terms:
• ROLAP, MOLAP, HOLAP
• Rollup, drill-down, slice & dice
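Rollup, drill-down, slice and dice can be illustrated on a toy cube. The following is a minimal Python sketch, not how any ROLAP or MOLAP product implements them; the SALES records and the helpers aggregate, slice_ and dice are hypothetical. Rolling up means grouping by fewer dimensions, drilling down means grouping by more, a slice fixes one dimension to a single value, and a dice keeps a sub-cube over several dimensions.

```python
from collections import defaultdict

# Hypothetical fact records: (year, quarter, region, product, units).
SALES = [
    (2023, "Q1", "North", "Widget", 10),
    (2023, "Q1", "South", "Widget", 5),
    (2023, "Q2", "North", "Gadget", 7),
    (2024, "Q1", "North", "Widget", 3),
    (2024, "Q2", "South", "Gadget", 8),
]
DIMS = ("year", "quarter", "region", "product")


def aggregate(rows, by):
    """Sum units grouped by the named dimensions; fewer dimensions = coarser level."""
    idx = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose value for each named dimension is in the given set."""
    return [r for r in rows
            if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())]
```

For example, aggregate(SALES, ["year"]) rolls the data up to {(2023,): 22, (2024,): 11}, and aggregate(SALES, ["year", "quarter"]) drills back down to quarterly totals.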
Two basic data processing models:
OLTP: reliable and efficient processing of a large number of transactions, ensuring data consistency.
OLAP: efficient multidimensional processing of large data volumes.
OLTP: On-Line Transaction Processing
• Describes processing at operational sites
• Captures the transaction information necessary to run their business operations
• Needs performance
OLAP: On-Line Analytical Processing
• Describes processing at the warehouse
• Analyzes transaction information at an aggregate level to improve the decision-making process
• Needs flexibility and broad scope
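The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is an illustrative sketch only: the orders table, its rows and both functions are hypothetical. The OLTP-style function is a short read/write transaction that locates one row through the primary key; the OLAP-style function is a read-only scan that aggregates many rows.

```python
import sqlite3

# In-memory database standing in for a hypothetical operational orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
             "order_date TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "Acme", "2024-01-05", 100.0),
    (2, "Acme", "2024-02-10", 250.0),
    (3, "Zenith", "2024-02-11", 75.0),
])


def oltp_update_amount(order_id, new_amount):
    """OLTP style: a short read/write transaction touching one row by primary key."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?",
                     (new_amount, order_id))


def olap_monthly_revenue():
    """OLAP style: a read-only scan that aggregates many rows by month."""
    return conn.execute(
        "SELECT substr(order_date, 1, 7) AS month, SUM(amount) "
        "FROM orders GROUP BY month ORDER BY month").fetchall()
```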
Feature | OLTP | OLAP
Users | Clerk, IT professional | Knowledge worker
Function | Day-to-day operations | Decision support
DB design | Application-oriented | Subject-oriented
Data | Current, up-to-date; detailed; flat relational; isolated | Historical; summarized; multidimensional; integrated, consolidated
Usage | Repetitive | Ad hoc
Access | Read/write; index/hash on primary key | Lots of scans
Unit of work | Short, simple transaction | Complex query
# of records accessed | Tens | Millions
# of users | Thousands | Hundreds
DB size | 100 MB to GB | 100 GB to TB
Metric | Transaction throughput | Query throughput, response
A data warehouse is a structured, extensible environment designed for the analysis of non-volatile data that is:
• logically and physically transformed from multiple source applications to align with business structure,
• updated and maintained for a long time period,
• expressed in simple business terms,
• and summarized for quick analysis.
• Subject-oriented
• Integrated
• Non-volatile and time-variant
Subject-oriented:
Operational systems (Accounting, Order Entry, Billing): operational data is organized by specific processes or tasks and is maintained by separate systems.
Data warehouse (Customer, Usage, Revenue): warehoused data is organized by subject area and is populated from many operational systems.
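A minimal sketch of regrouping process-oriented records around a subject area, assuming hypothetical Order Entry, Billing and Accounting records that already share a customer key (real systems rarely share keys this cleanly, which is exactly what integration has to address). The function name and record layouts are illustrative only.

```python
from collections import defaultdict

# Hypothetical records kept by three separate, process-oriented systems.
order_entry = [{"cust": "C1", "order": "O-1"}, {"cust": "C2", "order": "O-2"}]
billing = [{"cust": "C1", "invoice": 120.0}, {"cust": "C1", "invoice": 30.0}]
accounting = [{"cust": "C2", "payment": 45.0}]


def build_customer_subject(order_entry, billing, accounting):
    """Regroup task-oriented records into one view per customer (the subject)."""
    subject = defaultdict(lambda: {"orders": [], "billed": 0.0, "paid": 0.0})
    for r in order_entry:
        subject[r["cust"]]["orders"].append(r["order"])
    for r in billing:
        subject[r["cust"]]["billed"] += r["invoice"]
    for r in accounting:
        subject[r["cust"]]["paid"] += r["payment"]
    return dict(subject)
```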
Integrated: as data moves from the operational systems into the DWH, the following are made consistent:
• Naming conventions
• Measurement of attributes
• Encoding structures
• Physical attributes of data
• Data type format
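To make integration concrete, here is a small Python sketch that maps two hypothetical source records onto one warehouse convention. The field names, the inch-to-centimetre conversion and the assumption that source B encodes gender as 1/0 are all illustrative, not taken from the slides.

```python
from datetime import datetime

# Hypothetical rows for the same customer in two source systems, each with its
# own naming convention, encoding, unit of measure and date format.
SOURCE_A = {"cust_no": "17", "sex": "M", "len_in": "10", "dt": "03/15/2024"}
SOURCE_B = {"customer_id": 17, "gender": 1, "len_cm": 25.4, "date": "2024-03-15"}


def from_a(row):
    return {
        "customer_id": int(row["cust_no"]),                      # naming + data type
        "gender": row["sex"],                                    # already M/F
        "length_cm": round(float(row["len_in"]) * 2.54, 2),      # inches -> cm
        "date": datetime.strptime(row["dt"], "%m/%d/%Y").date(),  # MM/DD/YYYY
    }


def from_b(row):
    return {
        "customer_id": row["customer_id"],
        "gender": {1: "M", 0: "F"}[row["gender"]],               # assumed 1/0 encoding
        "length_cm": row["len_cm"],
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
    }
```

After integration both sources produce the same warehouse record, so from_a(SOURCE_A) == from_b(SOURCE_B).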
Non-volatile and time-variant:
Operational: current value data
• Time horizon: 60-90 days
• Key: may or may not have an element of time
• Volatile: data can be updated
DWH: snapshot data
• Time horizon: 5-10 years
• Key: contains an element of time
• Non-volatile: once a snapshot is made, the data cannot be updated
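The difference between volatile current-value data and non-volatile snapshot data can be sketched as follows. The account balances, the two dictionaries standing in for the stores and the helper names are hypothetical; the point is only that the operational store overwrites values while the warehouse appends time-keyed snapshots and refuses to change them.

```python
from datetime import date

# Operational store: one current value per account, overwritten in place (volatile).
operational = {"ACC-1": 500.0}

# Warehouse store: append-only snapshots keyed by (account, snapshot_date).
warehouse = {}


def update_operational(account, balance):
    operational[account] = balance  # the previous value is lost


def take_snapshot(snapshot_date):
    """Copy current values into the warehouse; existing snapshots are never changed."""
    for account, balance in operational.items():
        key = (account, snapshot_date)
        if key in warehouse:
            raise ValueError(f"snapshot {key} already exists and is read-only")
        warehouse[key] = balance
```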
Data warehousing is the process of extracting, integrating, filtering, standardizing, transforming, cleaning and quality-checking the organization's application data and storing it in a consolidated database.
Extracting: pulling the data from disparate sources.
Integrating: putting the extracted data together in a consistent format.
Filtering: extracting only the required data from the OLTP or external data sources. For example, the user may be interested only in the last five years of sales data.
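The five-year filter in the example above might look like this in Python. It is a sketch assuming each extracted row is a dict holding a datetime.date under a hypothetical sale_date key; in practice the same filter is often pushed into the extraction query itself.

```python
from datetime import date


def filter_recent_sales(rows, as_of, years=5):
    """Keep only sales dated within the last `years` years up to `as_of`."""
    try:
        cutoff = as_of.replace(year=as_of.year - years)
    except ValueError:  # as_of is 29 Feb and the cutoff year is not a leap year
        cutoff = as_of.replace(year=as_of.year - years, day=28)
    return [r for r in rows if cutoff <= r["sale_date"] <= as_of]
```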
Standardizing: as the data is moved from different OLTP databases or flat-file systems into one target, it needs to be standardized, for example date fields or flag fields.
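A minimal standardization sketch for date and flag fields. The list of source date formats and the flag spellings are assumptions for illustration. Note that %d/%m/%Y assumes a day-first source; a month-first source would need %m/%d/%Y instead, and the two cannot be told apart automatically for a value such as 03/04/2024.

```python
from datetime import datetime

# Hypothetical source date formats, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y", "%Y%m%d")
# Hypothetical flag spellings mapped onto one warehouse convention ("Y"/"N").
FLAG_MAP = {"y": "Y", "yes": "Y", "1": "Y", "true": "Y", "t": "Y",
            "n": "N", "no": "N", "0": "N", "false": "N", "f": "N"}


def standardize_date(value):
    """Return an ISO 8601 date string, trying each known source format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


def standardize_flag(value):
    """Map the many spellings of a boolean flag onto 'Y' or 'N'."""
    key = str(value).strip().lower()
    if key not in FLAG_MAP:
        raise ValueError(f"unrecognised flag: {value!r}")
    return FLAG_MAP[key]
```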
Transforming: data is extracted from OLTP databases and external data sources, and transformation has to be carried out on the extracted data before it is loaded into the data warehouse.
Cleaning and quality checking: to ensure data quality and accuracy.
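Cleaning and quality checking are often implemented as rules that route failing rows to a reject area instead of the warehouse. The sketch below assumes three hypothetical rules (required fields present, non-negative amount, no duplicate of an already accepted row); the field names and the quality_check helper are illustrative, and real rule sets are defined per business.

```python
def quality_check(rows, required=("customer_id", "sale_date", "amount")):
    """Split rows into (clean, rejected); each rejected row carries its failure reasons."""
    clean, rejected, seen = [], [], set()
    for row in rows:
        problems = [f"missing {f}" for f in required if row.get(f) in (None, "")]
        amount = row.get("amount")
        if isinstance(amount, (int, float)) and amount < 0:
            problems.append("negative amount")
        key = tuple(row.get(f) for f in required)
        if not problems and key in seen:
            problems.append("duplicate")
        if problems:
            rejected.append((row, problems))
        else:
            seen.add(key)
            clean.append(row)
    return clean, rejected
```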
[Figure: Data warehouse architecture. Source systems -> Data warehouse storage -> OLAP server -> End-user views, annotated with the flow of information and the direction of analysis.]
[Figure: Source systems (flat files, data from RDBMS) feed a staging area and then the data warehouse through data extraction, transformation and loading (ETL); business intelligence (querying and reporting, OLAP, data mining) runs on the data warehouse.]
ETL (extraction, transformation, and loading): the process of extracting data from source systems and bringing it into the data warehouse. It includes the transportation phase, and the individual phases of the process are not always distinct.
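A compact end-to-end ETL sketch in Python using only the standard library (csv, io, sqlite3). The flat-file contents, table names and column names are hypothetical. A staging table sits between extraction and the warehouse table, and the load runs inside one transaction so that a failure leaves the warehouse table unchanged.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract from a source system.
SOURCE_CSV = """order_id,order_date,product,qty,unit_price
1,2024-03-15,Widget,2,5.00
2,2024-03-15,Widget,1,5.00
3,2024-03-16,Gadget,4,2.50
"""


def extract(text):
    """Extract: read raw rows from the flat file."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and derive the dollar_sales measure."""
    return [(r["order_date"], r["product"], int(r["qty"]),
             int(r["qty"]) * float(r["unit_price"])) for r in rows]


def load(conn, facts):
    """Load: refill the staging table, then aggregate it into the warehouse table."""
    with conn:  # one transaction: commit on success, roll back on error
        for table in ("stage_sales", "sales_fact"):
            conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (sale_date TEXT, "
                         "product TEXT, unit_sales INTEGER, dollar_sales REAL)")
        conn.execute("DELETE FROM stage_sales")
        conn.executemany("INSERT INTO stage_sales VALUES (?, ?, ?, ?)", facts)
        conn.execute("INSERT INTO sales_fact SELECT sale_date, product, "
                     "SUM(unit_sales), SUM(dollar_sales) FROM stage_sales "
                     "GROUP BY sale_date, product")


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
```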
Major ETL tools are:
• Informatica PowerMart
• Informatica PowerCenter
• DP Warehouse
• Oracle Express
• DataMirror
1. Highly normalized
2. Design considerations are:
 • Quick retrieval
 • Fast updates
3. No redundancies
Star schema (RDBMS model / MDDB model):
Sales Fact Table
 • Keys: Date, Product, Store, Customer
 • Measurements: Unit_sales, Dollar_sales, Schilling_sales
Date dimension: Date, Month, Year
Customer dimension: CustId, CustName, CustCity, CustCountry
Product dimension: ProductNo, ProdName, ProdDesc, Category, QOH
Store dimension: StoreID, City, State, Country, Region
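The star schema can be sketched as SQLite DDL driven from Python. Table and column names follow the slide, adapted to snake_case; the sample rows, the text date_key and the report query are hypothetical. Each dimension is a single denormalized table, so a report needs only one join per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim     (date_key TEXT PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE customer_dim (cust_id INTEGER PRIMARY KEY, cust_name TEXT,
                           cust_city TEXT, cust_country TEXT);
CREATE TABLE product_dim  (product_no INTEGER PRIMARY KEY, prod_name TEXT,
                           prod_desc TEXT, category TEXT, qoh INTEGER);
CREATE TABLE store_dim    (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT,
                           country TEXT, region TEXT);
-- Fact table: one foreign key per dimension plus the numeric measurements.
CREATE TABLE sales_fact (
    date_key   TEXT    REFERENCES date_dim,
    product_no INTEGER REFERENCES product_dim,
    store_id   INTEGER REFERENCES store_dim,
    cust_id    INTEGER REFERENCES customer_dim,
    unit_sales INTEGER, dollar_sales REAL, schilling_sales REAL);

-- Hypothetical sample rows.
INSERT INTO date_dim VALUES ('2024-03-15', 3, 2024), ('2024-04-02', 4, 2024);
INSERT INTO customer_dim VALUES (1, 'Acme', 'Vienna', 'Austria');
INSERT INTO product_dim VALUES (10, 'Widget', 'Small widget', 'Hardware', 50),
                               (20, 'Manual', 'User manual', 'Books', 200);
INSERT INTO store_dim VALUES (100, 'Vienna', 'Vienna', 'Austria', 'Europe');
INSERT INTO sales_fact VALUES ('2024-03-15', 10, 100, 1, 3, 30.0, 400.0),
                              ('2024-04-02', 10, 100, 1, 1, 10.0, 140.0),
                              ('2024-04-02', 20, 100, 1, 2, 8.0, 110.0);
""")

# Star join: the fact table joins directly to each dimension it needs.
STAR_QUERY = """
SELECT d.year, p.category, s.region, SUM(f.unit_sales), SUM(f.dollar_sales)
FROM sales_fact f
JOIN date_dim d    ON f.date_key = d.date_key
JOIN product_dim p ON f.product_no = p.product_no
JOIN store_dim s   ON f.store_id = s.store_id
GROUP BY d.year, p.category, s.region
ORDER BY p.category
"""
rows = conn.execute(STAR_QUERY).fetchall()
```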
Snowflake schema: a refinement of the star schema in which the dimensional hierarchy is represented explicitly by normalizing the dimension tables.
Sales Fact Table
 • Keys: Date, Product, Store, Customer
 • Measurements: Unit_sales, Dollar_sales, Schilling_sales
Product: ProductNo, ProdName, ProdDesc, Category, QOH
Customer: CustId, CustName, CustCity, CustCountry
Date: Date, Month -> Month: Month, Year -> Year: Year
Store: StoreID, City -> City: City, State -> State: State, Country -> Country: Country, Region
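A matching snowflake sketch, again in SQLite via Python. Only the Date and Store hierarchies from the slide are normalized here. The sample rows are hypothetical, month is keyed as a 'YYYY-MM' string, and city and state names are assumed unique, which a real design would not rely on. The rollup query has to walk the whole chain of joins, which is the price of removing redundancy from the dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Store hierarchy normalized: store -> city -> state -> country (with region).
CREATE TABLE country (country TEXT PRIMARY KEY, region TEXT);
CREATE TABLE state   (state TEXT PRIMARY KEY, country TEXT REFERENCES country);
CREATE TABLE city    (city TEXT PRIMARY KEY, state TEXT REFERENCES state);
CREATE TABLE store   (store_id INTEGER PRIMARY KEY, city TEXT REFERENCES city);
-- Date hierarchy normalized: date -> month -> year.
CREATE TABLE year_dim  (year INTEGER PRIMARY KEY);
CREATE TABLE month_dim (month TEXT PRIMARY KEY, year INTEGER REFERENCES year_dim);
CREATE TABLE date_dim  (date_key TEXT PRIMARY KEY, month TEXT REFERENCES month_dim);
CREATE TABLE sales_fact (date_key TEXT, store_id INTEGER,
                         unit_sales INTEGER, dollar_sales REAL);

-- Hypothetical sample rows.
INSERT INTO country VALUES ('Austria', 'Europe'), ('USA', 'Americas');
INSERT INTO state   VALUES ('Vienna', 'Austria'), ('Texas', 'USA');
INSERT INTO city    VALUES ('Vienna', 'Vienna'), ('Austin', 'Texas');
INSERT INTO store   VALUES (1, 'Vienna'), (2, 'Austin');
INSERT INTO year_dim  VALUES (2024);
INSERT INTO month_dim VALUES ('2024-03', 2024);
INSERT INTO date_dim  VALUES ('2024-03-15', '2024-03'), ('2024-03-16', '2024-03');
INSERT INTO sales_fact VALUES ('2024-03-15', 1, 3, 30.0), ('2024-03-16', 1, 1, 10.0),
                              ('2024-03-16', 2, 5, 50.0);
""")

# Roll dollar sales up to (year, region) by walking both normalized hierarchies.
SNOWFLAKE_QUERY = """
SELECT m.year, co.region, SUM(f.dollar_sales)
FROM sales_fact f
JOIN date_dim d  ON f.date_key = d.date_key
JOIN month_dim m ON d.month = m.month
JOIN store s     ON f.store_id = s.store_id
JOIN city ci     ON s.city = ci.city
JOIN state st    ON ci.state = st.state
JOIN country co  ON st.country = co.country
GROUP BY m.year, co.region
ORDER BY co.region
"""
rows = conn.execute(SNOWFLAKE_QUERY).fetchall()
```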
To find chronic aircraft:
Criteria: three reports within 15 days or 45 flights, whichever comes first.
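One way to code the chronic-aircraft rule, as a sketch under stated assumptions: each report carries the day it was raised and the aircraft's cumulative flight count at that time, "within" is taken as inclusive, and "whichever comes first" is read as a window that closes at the earlier of 15 days or 45 flights, so three reports qualify only if they fit inside both limits. The Report type and the chronic_aircraft function are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Report(NamedTuple):
    aircraft: str
    day: int     # day number on which the report was raised
    flight: int  # aircraft's cumulative flight count at the time of the report


def chronic_aircraft(reports, n_reports=3, max_days=15, max_flights=45):
    """Return aircraft having n_reports reports inside one window of at most
    max_days days and at most max_flights flights (assumed inclusive limits)."""
    by_aircraft = defaultdict(list)
    for r in reports:
        by_aircraft[r.aircraft].append(r)
    chronic = set()
    for aircraft, rs in by_aircraft.items():
        rs.sort(key=lambda r: (r.day, r.flight))
        # Pair each report with the one n_reports - 1 positions later.
        for first, last in zip(rs, rs[n_reports - 1:]):
            if last.day - first.day <= max_days and last.flight - first.flight <= max_flights:
                chronic.add(aircraft)
                break
    return chronic
```

Under this reading, three reports over 11 days and 30 flights make an aircraft chronic, while three reports over 5 days but 60 flights do not, because the 45-flight limit is reached first.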