Final presentation


Page 1: Final presentation

Data Warehousing

“A data warehouse is a subject-oriented, integrated, time-variant, and nonupdatable collection of data in support of management’s decision-making process.”

• Subject-oriented: organized around high-level entities such as customers, patients, students, products, and time.

• Integrated: data is gathered from several internal systems of record or from sources external to the organization.


• Time-variant: a time dimension is used in the data warehouse to study trends and changes.

• Nonupdatable: new data is always added as a supplement to the database rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with previous data.

• A data warehouse can consist of more than one database.
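The time-variant and nonupdatable properties can be made concrete with a minimal in-memory sketch. The `SalesSnapshot` record and `AppendOnlyWarehouse` class are hypothetical names chosen for illustration, and a real warehouse would use a database rather than a Python list; the point is only that each load appends rows stamped with a load date instead of overwriting earlier values, so the history stays available for trend analysis.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SalesSnapshot:
    """Hypothetical warehouse row: one product's sales as of a load date."""
    load_date: date
    product: str
    units_sold: int


class AppendOnlyWarehouse:
    """Hypothetical store that only ever adds rows (nonupdatable)."""

    def __init__(self) -> None:
        self._rows: list[SalesSnapshot] = []

    def load(self, rows: list[SalesSnapshot]) -> None:
        # New data supplements earlier data; nothing is updated or deleted.
        self._rows.extend(rows)

    def history(self, product: str) -> list[tuple[date, int]]:
        # The time dimension lets us read the trend for one product.
        return sorted(
            (r.load_date, r.units_sold) for r in self._rows if r.product == product
        )


wh = AppendOnlyWarehouse()
wh.load([SalesSnapshot(date(2024, 1, 31), "widget", 120)])
wh.load([SalesSnapshot(date(2024, 2, 29), "widget", 150)])
print(wh.history("widget"))
```

Because `load` never updates or deletes, a product's history returns one entry per load date, which is the trend view the time dimension is meant to support.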


• In simple words:

“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”


Problem: Heterogeneous Information Sources

“Heterogeneities are everywhere”

• Different interfaces

• Different data representations

• Duplicate and inconsistent information

• Combined research results from different bioinformatics repositories

[Figure: heterogeneous sources: personal databases, digital libraries, scientific databases, and the World Wide Web]


Goal: Unified Access to Data

Integration System

• Collects and combines information

• Provides an integrated view and a uniform user interface

• Supports sharing

[Figure: the integration system over the World Wide Web, digital libraries, scientific databases, and personal databases]
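An integration system of this kind can be sketched in a few functions that map each source onto one unified schema and then combine the results. The two source layouts (`crm_records` and `billing_records`), the country-code mapping, and the rule that the first source wins on duplicate IDs are all assumptions for illustration; real integration systems need far richer matching and conflict resolution.

```python
# Hypothetical source 1: a CRM system keyed by string IDs such as "C001".
crm_records = [
    {"cust_id": "C001", "full_name": "Ada Lovelace", "country": "UK"},
    {"cust_id": "C002", "full_name": "Alan Turing", "country": "UK"},
]
# Hypothetical source 2: a billing system with numeric IDs, "LAST, FIRST" names,
# and full country names.
billing_records = [
    {"CustomerNo": 1, "Name": "LOVELACE, ADA", "Country": "United Kingdom"},
    {"CustomerNo": 3, "Name": "HOPPER, GRACE", "Country": "United States"},
]

COUNTRY_CODES = {"United Kingdom": "UK", "United States": "US"}


def from_crm(rec: dict) -> dict:
    """Map a CRM record onto the unified schema."""
    return {
        "customer_id": int(rec["cust_id"].lstrip("C")),
        "name": rec["full_name"],
        "country": rec["country"],
    }


def from_billing(rec: dict) -> dict:
    """Map a billing record onto the unified schema."""
    last, first = (part.strip().title() for part in rec["Name"].split(","))
    return {
        "customer_id": rec["CustomerNo"],
        "name": f"{first} {last}",
        "country": COUNTRY_CODES.get(rec["Country"], rec["Country"]),
    }


def unified_view(crm: list[dict], billing: list[dict]) -> list[dict]:
    """Combine both sources; on duplicate IDs the CRM record is kept."""
    merged: dict[int, dict] = {}
    for rec in [from_crm(r) for r in crm] + [from_billing(r) for r in billing]:
        merged.setdefault(rec["customer_id"], rec)
    return [merged[key] for key in sorted(merged)]


view = unified_view(crm_records, billing_records)
for row in view:
    print(row)
```

The user sees one customer list with one naming and coding convention, even though the two sources disagree on identifiers, name format, and country representation.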


The Need for Data Warehousing

1. A business requires an integrated, companywide view of high-quality information.

2. The information systems department must separate informational systems from operational systems (systems of record) to dramatically improve performance in managing company data.


Why a Warehouse?

• For analysis and decision support, end users require access to data captured and stored in an organization’s operational or production systems.

• This data is stored in multiple formats, on multiple platforms, in multiple data structures, with multiple names, and was probably created using different business rules.


Why should we consider data warehousing solutions?

When users request access to a large amount of historical information for reporting purposes, you should strongly consider a warehouse. Users benefit when the information is organized efficiently for this type of access.


An Example Illustrating the Need for Data Warehousing


Data Warehouse Components

[Figure: data warehouse components. Sources: mainframe, midrange, and PC applications (IMS, VSAM, DB/2, DB/400, DB/6000, DB2/2) and external sources. Tools: extract programs, data cleansers/scrubbers, translators/transformers, timing tools, data loading, file transfer. Warehouse (combined data): customers, policies, premiums, claims, reserves, rates. Decision support tools: management reporting, sales/marketing, customer relations, reserve analysis, risk analysis.]
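The extract, cleanse, transform, and load tools in this figure can be pictured as a small pipeline. This sketch assumes a hypothetical CSV extract of insurance premiums and loads it into an in-memory SQLite table using Python's standard `csv` and `sqlite3` modules; the cleansing rules (drop rows with no premium, drop exact duplicates) and the function names are illustrative choices, not rules from the presentation.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract from a source system (e.g. a flat-file export).
RAW_EXTRACT = """policy_id,customer,premium
P-1, ada lovelace ,1200.50
P-2,alan turing,
P-3,Grace Hopper,980
P-1, ada lovelace ,1200.50
"""


def extract(text):
    """Extract: read rows from the source file."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleanse/scrub: drop rows missing a premium and exact duplicates."""
    seen = set()
    clean = []
    for row in rows:
        key = tuple(row.values())
        if not row["premium"].strip() or key in seen:
            continue
        seen.add(key)
        clean.append(row)
    return clean


def transform(rows):
    """Transform: standardize names and convert types."""
    return [
        (r["policy_id"].strip(), r["customer"].strip().title(), float(r["premium"]))
        for r in rows
    ]


def load(conn, rows):
    """Load: insert the transformed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS premiums (policy_id TEXT, customer TEXT, premium REAL)"
    )
    conn.executemany("INSERT INTO premiums VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(cleanse(extract(RAW_EXTRACT))))
loaded = conn.execute("SELECT * FROM premiums ORDER BY policy_id").fetchall()
print(loaded)
```

Keeping each stage as a separate function mirrors the separate tools in the figure: each one can be replaced or rerun without touching the others.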


Administration and Management Tools

• A data warehouse requires tools to support the administration and management of such a complex environment.

• To handle the various types of meta-data and the day-to-day operations of the data warehouse, the administration and management tools must be capable of supporting these tasks:

• monitoring data loading from multiple sources

• data quality and integrity checks

• managing and updating meta-data

• monitoring database performance to ensure efficient query response times and resource utilization


• auditing data warehouse usage to provide user chargeback information

• replicating, subsetting, and distributing data

• maintaining efficient data storage management

• archiving and backing up data

• implementing recovery following failure

• security management
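As an example of the data quality and integrity checks listed above, the sketch below inspects a batch of rows before it is loaded. The field names (`claim_id`, `customer_id`, `amount`), the `check_batch` function, and the three rules (unique non-null key, known customer, non-negative amount) are hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical batch of insurance claims awaiting loading.
batch = [
    {"claim_id": 10, "customer_id": 1, "amount": 500.0},
    {"claim_id": 10, "customer_id": 2, "amount": 75.0},
    {"claim_id": 11, "customer_id": 99, "amount": -5.0},
]
known_customers = {1, 2, 3}  # keys assumed to exist in the customer dimension


def check_batch(rows, known_customer_ids):
    """Return a list of human-readable data quality problems in a load batch."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Integrity: the key must be present and unique within the batch.
        claim_id = row.get("claim_id")
        if claim_id is None:
            problems.append(f"row {i}: missing claim_id")
        elif claim_id in seen_ids:
            problems.append(f"row {i}: duplicate claim_id {claim_id}")
        else:
            seen_ids.add(claim_id)
        # Referential integrity: the customer must already be known.
        if row.get("customer_id") not in known_customer_ids:
            problems.append(f"row {i}: unknown customer_id {row.get('customer_id')}")
        # Quality: amounts must be present and non-negative.
        amount = row.get("amount")
        if amount is None or amount < 0:
            problems.append(f"row {i}: invalid amount {amount}")
    return problems


for problem in check_batch(batch, known_customers):
    print(problem)
```

Reporting problems instead of silently dropping rows lets the administrator decide whether to reject the batch, fix the source, or load with exceptions.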


In computing, data flow is the path data takes from source document to data entry, to processing, and finally to reports. Data changes format and sequence (within a file) as it moves from program to program.


Data Flow

• Inflow: the processes associated with the extraction, cleansing, and loading of data from the source systems into the data warehouse.

• Upflow: the processes associated with adding value to the data in the warehouse through summarization and distribution of the data.

• Downflow: the processes associated with archiving and backing up data in the warehouse.

• Outflow: the processes associated with making the data available to end users.

• Meta-flow: the processes associated with the management of the meta-data.
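Two of these flows, upflow and downflow, can be sketched over a list of sales rows. The row layout, the monthly-per-region summary, the cutoff-date archiving rule, and the function names `upflow_summarize` and `downflow_archive` are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detail rows: (sale date, region, amount).
sales = [
    (date(2023, 11, 3), "North", 100.0),
    (date(2024, 1, 5), "North", 250.0),
    (date(2024, 1, 28), "North", 50.0),
    (date(2024, 1, 20), "South", 80.0),
    (date(2024, 2, 2), "North", 40.0),
]


def upflow_summarize(rows):
    """Upflow: add value by totalling detail rows per (year, month, region)."""
    totals = defaultdict(float)
    for day, region, amount in rows:
        totals[(day.year, day.month, region)] += amount
    return dict(totals)


def downflow_archive(rows, cutoff):
    """Downflow: split rows into (kept, archived) around a cutoff date."""
    kept = [row for row in rows if row[0] >= cutoff]
    archived = [row for row in rows if row[0] < cutoff]
    return kept, archived


summary = upflow_summarize(sales)
kept, archived = downflow_archive(sales, cutoff=date(2024, 1, 1))
print(summary[(2024, 1, "North")])  # 300.0
```

The summary answers a common question without rescanning detail rows, and the archived rows can be moved to cheaper storage while remaining recoverable.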


Architectures

• Many database architectures have been implemented.

Two architectures are worth noting:

1. OLTP (OnLine Transaction Processing)

2. Data warehouse (OLAP, OnLine Analytical Processing)

• OLTP is used to store data and query it frequently and is based on normalized schemas.

• A data warehouse is used to store historical data and is based on fact tables and dimension tables.
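The fact-and-dimension design can be illustrated with a tiny star schema in SQLite, using Python's standard `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_time`) and sample rows are hypothetical; the query shows the typical warehouse pattern of joining a fact table to its dimension tables and aggregating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the context of each fact.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
-- The fact table holds measures plus foreign keys into the dimensions.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    time_key INTEGER REFERENCES dim_time(time_key),
    units INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gizmo', 'Hardware'), (3, 'Manual', 'Books');
INSERT INTO dim_time VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO fact_sales VALUES (1, 1, 10, 100.0), (2, 1, 5, 75.0), (3, 1, 2, 30.0), (1, 2, 8, 80.0);
""")

# A typical warehouse query: join the fact table to its dimensions and aggregate.
query = """
SELECT p.category, t.quarter, SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_product AS p ON f.product_key = p.product_key
JOIN dim_time AS t ON f.time_key = t.time_key
GROUP BY p.category, t.quarter
ORDER BY p.category, t.quarter
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

An OLTP schema for the same data would typically be normalized around individual transactions; the star layout instead keeps the measures in one central table so that analytical queries need only one join per dimension.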


Difference between OLTP and Data Warehouse

Feature | OLTP | OLAP
Users | clerk, IT professional | knowledge worker
Function | day-to-day operations | decision support
DB design | application-oriented | subject-oriented
Data | current, up-to-date, detailed | historical, summarized, multidimensional, integrated
Access | read/write, index/hash on primary key | lots of scans
Unit of work | short, simple transaction | complex query
Records accessed | tens | millions
Number of users | thousands | hundreds
DB size | 100 MB to GB | 100 GB to TB
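The "access" and "unit of work" rows of this table can be contrasted on a single SQLite table. This is only a sketch under obvious assumptions: the `orders` table is hypothetical, and one small in-memory SQLite database stands in for what would normally be separate OLTP and warehouse systems at very different scales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, status TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, status, amount) VALUES (?, ?, ?)",
    [(f"cust{i % 100}", "open", 10.0) for i in range(10_000)],
)
conn.commit()

# OLTP-style unit of work: a short transaction touching one row via its primary key.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (42,))

# OLAP-style unit of work: a read-only query that scans every row and aggregates.
(total_open,) = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE status = 'open'"
).fetchone()
print(total_open)
```

The update locates one row through its primary key, while the aggregate has to read every row; at warehouse scale that second pattern is why the OLAP column favors summarized data and scan-oriented access.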


• Special thanks to Google.com and other sites.

Thank You