Data warehousing

9
DATA WAREHOUSE

Transcript of Data warehousing

DATA WAREHOUSE

What is a Data Warehouse

A Data Warehouse can be defined as a centralized, consistent data

store or Decision Support System (OLAP), for the end business users

for analysis, prediction and decision making in their business

operations.

Data from various enterprise-wide application/transactional source

systems (OLTP), are extracted, cleansed, integrated, transformed

and loaded in the Data Warehouse.

Benefits of data warehouse

A data warehouse helps to integrate data and store them

historically so that we can analyze different aspects of business including, performance analysis, trend, prediction etc. over a given

time frame and use the result of our analysis to improve the

efficiency of business processes.

Data warehouses are built to facilitate reporting on different key business processes of an organization, known as KPI.

Data warehouse can be further used for trend prediction, forecasts,

pattern recognition etc.

Properties of a Data Warehouse

Subject Oriented - Data Warehouse is designed based on the major subjects areas of

the business, rather than the major application areas of the enterprise. The ultimate

necessity is to store decision-support data rather than application-oriented data.

Integrated - The data in Data Warehouse comes from different enterprise-wide application source systems. The source data is often inconsistent in nature. During the

Integration process, source data must be made consistent in Data Warehouse so that

the data becomes homogeneous and uniform.

Time-Variant - The source data in the Data Warehouse is only accurate and valid at

some point in time or over some time interval.

Non-Volatile - Data in Data Warehouse is not updated or replaced in real time but is

refreshed from operational systems on a regular basis. New data is always added incrementally integrating it with the previous data set.

Schematic View of a data warehouse

ETL process

ETL (Extract, Transform and Load) is a process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. ETL involves the following tasks:

extracting the data from source systems (SAP, ERP, other operational systems), data from different source systems is converted into one consolidated data warehouse format which is ready for transformation processing.

continued….

Transforming the data may involve the following tasks:

Applying business rules

Cleaning (e.g., mapping NULL to 0 or "Male" to "M" and "Female" to "F" etc.),

Filtering (e.g., selecting only certain columns to load)

Splitting a column into multiple columns and vice versa

Joining together data from multiple sources

Transposing rows and columns

Applying any kind of simple or complex data validation

loading the data into a data warehouse or data repository other reporting applications

ETL layer is mostly responsible for 2 types of treatments on the

data:

Data Integration - process of combining heterogeneous data sources in to a

single queriable schema so as to get an unified view of these data.

Qualitative Treatment - so that the validity and quality of the data can be checked (and if required corrected) before loading to the data warehouse.

THANK YOU