Final presentation
By dave-nawazish-ali
Data Warehousing
“A data warehouse is a subject-oriented, integrated, time-variant, and nonupdatable collection of data in support of management’s decision-making process.”
• Subject-oriented: organized around high-level entities such as customers, patients, students, products, and time.
• Integrated: data is gathered from several internal systems of record or from sources external to the organization.
• Time-variant: a time dimension is used in data warehousing to study trends and changes.
• Nonupdatable: new data is always added to the database as a supplement rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with previous data.
• A data warehouse can consist of more than one database.
• In simple words:
“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
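The nonupdatable and time-variant properties above can be illustrated with a minimal sketch, assuming a hypothetical `AppendOnlyStore` class that keeps customer balances in memory (the class, its methods, and the field names are illustrative, not part of any product). Loads only append rows stamped with a load date, so earlier values remain available for studying trends.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One warehouse row: a value for an entity as of a load date."""
    customer_id: int
    balance: float
    load_date: date


class AppendOnlyStore:
    """Hypothetical minimal store: rows are only ever appended, never updated."""

    def __init__(self) -> None:
        self._rows: list[Snapshot] = []

    def load(self, customer_id: int, balance: float, load_date: date) -> None:
        # New data supplements old data; existing rows are never modified.
        self._rows.append(Snapshot(customer_id, balance, load_date))

    def history(self, customer_id: int) -> list[tuple[date, float]]:
        # The time dimension shows how a value changed across loads.
        return sorted(
            (r.load_date, r.balance) for r in self._rows if r.customer_id == customer_id
        )

    def as_of(self, customer_id: int, when: date) -> float | None:
        # Latest value loaded on or before `when`, or None if nothing was loaded yet.
        past = [b for d, b in self.history(customer_id) if d <= when]
        return past[-1] if past else None
```

For example, loading a balance of 100.0 on 2023-01-31 and 250.0 on 2023-02-28 for the same customer keeps both rows, and `as_of(1, date(2023, 2, 15))` returns 100.0 rather than the newer value.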
Problem: Heterogeneous Information Sources
“Heterogeneities are everywhere”
• Different interfaces
• Different data representations
• Duplicate and inconsistent information
• Combining research results from different bioinformatics repositories
[Figure: information sources: personal databases, digital libraries, scientific databases, World Wide Web]
Goal: Unified Access to Data
Integration system:
• Collects and combines information
• Provides an integrated view and a uniform user interface
• Supports sharing
[Figure: integration system connected to the World Wide Web, digital libraries, scientific databases, and personal databases]
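An integration system of this kind can be sketched in a few lines, assuming two hypothetical sources that describe publications with different field names, date formats, and capitalization. Each source gets a mapping function into one common schema, and a naive title-plus-year key removes the duplicate; a real system would need far more robust matching.

```python
from datetime import datetime

# Two hypothetical sources describing the same kind of entity differently.
library_records = [
    {"Year": "2019", "Title": "Gene Maps"},
]
web_records = [
    {"published": "2019-05-01", "name": "gene maps"},
    {"published": "2021-02-10", "name": "Protein Folding"},
]


def from_library(rec: dict) -> dict:
    # Library source: capitalized field names, year stored as a string.
    return {"title": rec["Title"].strip().lower(), "year": int(rec["Year"])}


def from_web(rec: dict) -> dict:
    # Web source: different field names, full ISO date instead of a year.
    year = datetime.strptime(rec["published"], "%Y-%m-%d").year
    return {"title": rec["name"].strip().lower(), "year": year}


def integrate(*sources):
    """Map every source into one common schema and drop duplicates."""
    unified = {}
    for records, mapper in sources:
        for rec in records:
            row = mapper(rec)
            # Title + year acts as a naive matching key across sources.
            unified.setdefault((row["title"], row["year"]), row)
    return sorted(unified.values(), key=lambda r: (r["year"], r["title"]))


view = integrate((library_records, from_library), (web_records, from_web))
```

The resulting `view` holds two records, "gene maps" (2019) and "protein folding" (2021), even though the sources supplied three.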
The Need for Data Warehousing
1. A business requires an integrated, companywide view of high quality information.
2. The information systems department must separate informational systems from operational systems (systems of record) to improve performance dramatically in managing company data.
Why a Warehouse?
• For analysis and decision support, end users require access to data captured and stored in an organization’s operational or production systems.
• This data is stored in multiple formats, on multiple platforms, in multiple data structures, and under multiple names, and it was probably created using different business rules.
Why should we consider Data Warehousing solutions?
When users are requesting access to a large amount of historical information for reporting purposes, you should strongly consider a warehouse. The users will benefit when the information is organized efficiently for this type of access.
An Example to Look at the Need for Data Warehousing
Data Warehouse Components
[Figure: data warehouse components]
• Sources: mainframe applications, midrange systems, PC applications, external sources, and an unlabeled source ("???"); source databases include IMS, VSAM, DB/2, DB/400, DB/6000, and DB2/2
• Extraction and loading tools: extract programs, data cleansers/scrubbers, translators/transformers, timing tools, data loading, file transfer
• Warehouse (combined data) subject areas: customers, policies, premiums, claims, reserves, rates
• Decision support tools: management reporting, sales/marketing, customer relations, reserve analysis, risk analysis
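The extract, cleanse, transform, and load tools shown in the figure can be sketched as a small pipeline. This is a minimal illustration, assuming a hypothetical CSV extract of insurance premiums and an in-memory dictionary standing in for the warehouse; the function names and fields are invented for the example.

```python
from __future__ import annotations

import csv
import io

# Hypothetical raw extract from an operational system.
RAW = """policy_id,customer,premium
P1, Alice ,1200
P2,Bob,
P1, Alice ,1200
P3,carol,950.50
"""


def extract(text: str) -> list[dict]:
    """Extract program: read raw rows from the source file."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows: list[dict]) -> list[dict]:
    """Scrubber: trim whitespace, drop rows without a premium, remove duplicates."""
    seen, clean = set(), []
    for r in rows:
        r = {k: v.strip() for k, v in r.items()}
        if not r["premium"] or r["policy_id"] in seen:
            continue
        seen.add(r["policy_id"])
        clean.append(r)
    return clean


def transform(rows: list[dict]) -> list[dict]:
    """Transformer: convert types and standardize names for the warehouse."""
    return [
        {"policy_id": r["policy_id"], "customer": r["customer"].title(),
         "premium": float(r["premium"])}
        for r in rows
    ]


def load(rows: list[dict], warehouse: dict) -> None:
    """Loader: append rows to the hypothetical in-memory Premiums table."""
    warehouse.setdefault("premiums", []).extend(rows)


warehouse: dict = {}
load(transform(cleanse(extract(RAW))), warehouse)
```

Running it keeps policies P1 (deduplicated) and P3 and drops P2, whose premium is missing.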
Administration and Management Tools
• A data warehouse requires tools to support the administration and management of such a complex environment.
• To handle the various types of meta-data and the day-to-day operations of the data warehouse, the administration and management tools must support the following tasks:
• monitoring data loading from multiple sources
• data quality and integrity checks
• managing and updating meta-data
• monitoring database performance to ensure efficient query response times and resource utilization
• auditing data warehouse usage to provide user chargeback information
• replicating, subsetting, and distributing data
• maintaining efficient data storage management
• archiving and backing up data
• implementing recovery following failure
• security management
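As one example of the data quality and integrity checks listed above, a load job might validate each batch of fact rows before accepting it. The sketch below assumes hypothetical `customer_id` and `amount` fields and a set of customer keys already present in the customer dimension.

```python
from __future__ import annotations


def quality_report(facts: list[dict], known_customers: set[int]) -> list[str]:
    """Return human-readable problems found in a batch of fact rows."""
    problems = []
    for i, row in enumerate(facts):
        cid = row.get("customer_id")
        amount = row.get("amount")
        # Completeness: every fact needs a key.
        if cid is None:
            problems.append(f"row {i}: missing customer_id")
        # Referential integrity: the key must exist in the dimension.
        elif cid not in known_customers:
            problems.append(f"row {i}: unknown customer_id {cid}")
        # Validity: amounts must be present and non-negative.
        if amount is None or amount < 0:
            problems.append(f"row {i}: invalid amount {amount!r}")
    return problems
```

A loader could reject the batch, or route the flagged rows to an error table, whenever the returned list is non-empty.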
In computing, the path that data takes from source document to data entry to processing to final reports is known as data flow. Data changes format and sequence (within a file) as it moves from program to program.
Data Flow
• Inflow: the processes associated with the extraction, cleansing, and loading of data from the source systems into the data warehouse.
• Upflow: the processes associated with adding value to the data in the warehouse through summarizing and distributing the data.
• Downflow: the processes associated with archiving and backing up data in the warehouse.
• Outflow: the processes associated with making the data available to end users.
• Meta-flow: the processes associated with managing the meta-data.
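Upflow and downflow can be illustrated with a minimal sketch, assuming hypothetical daily sales rows of the form `(date, product, amount)`: upflow adds value by summarizing detail into monthly totals, and downflow moves detail rows older than a cutoff into an archive.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


def upflow_monthly_totals(sales: list[tuple[date, str, float]]) -> dict[tuple[str, str], float]:
    """Upflow: summarize detail rows into (month, product) totals."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for day, product, amount in sales:
        totals[(day.strftime("%Y-%m"), product)] += amount
    return dict(totals)


def downflow_archive(sales: list[tuple[date, str, float]], cutoff: date):
    """Downflow: split detail rows into (still active, moved to archive) by date."""
    active = [s for s in sales if s[0] >= cutoff]
    archived = [s for s in sales if s[0] < cutoff]
    return active, archived
```

Keeping the monthly totals while archiving old detail lets common reports stay fast without discarding history.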
Architectures
• Many database architectures have been implemented; two are worth comparing here:
1. OLTP (online transaction processing)
2. Data warehouse (OLAP, online analytical processing)
• OLTP systems store current operational data that is read and updated frequently by many short transactions, and they are based on normalized schemas.
• A data warehouse stores historical data and is based on fact tables and dimension tables.
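To make the fact/dimension distinction concrete, here is a small star schema built with Python's standard `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical; the point is that measures live in the fact table, descriptive attributes live in the dimensions, and analytical queries join and aggregate them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The fact table holds measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    revenue     REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Hardware'), (2, 'Antivirus', 'Software');
INSERT INTO dim_date VALUES (20230115, 2023, 1), (20230210, 2023, 2);
INSERT INTO fact_sales VALUES
    (20230115, 1, 2, 2000.0),
    (20230115, 2, 5, 250.0),
    (20230210, 1, 1, 1000.0);
""")

# A typical analytical query: join the facts to their dimensions and aggregate.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
```

The query returns revenue per month and category: `(1, 'Hardware', 2000.0)`, `(1, 'Software', 250.0)`, `(2, 'Hardware', 1000.0)`.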
Difference between OLTP and Data Warehouse (OLAP)
Feature | OLTP | Data warehouse (OLAP)
Users | Clerk, IT professional | Knowledge worker
Function | Day-to-day operations | Decision support
DB design | Application-oriented | Subject-oriented
Data | Current, up-to-date, detailed | Historical, summarized, multidimensional, integrated
Access | Read/write; index/hash on primary key | Lots of scans
Unit of work | Short, simple transaction | Complex query
Records accessed | Tens | Millions
Number of users | Thousands | Hundreds
DB size | 100 MB to GB | 100 GB to TB
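The access row of the table can be observed directly in SQLite. In this sketch, assuming a hypothetical `accounts` table, an OLTP-style statement reaches one record through its primary key while an OLAP-style aggregate must scan every row; `EXPLAIN QUERY PLAN` reports a `SEARCH` for the first and a `SCAN` for the second (the exact wording varies between SQLite versions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany(
    "INSERT INTO accounts (id, balance) VALUES (?, ?)",
    [(i, i * 10.0) for i in range(1, 1001)],
)


def plan(sql: str, params=()) -> str:
    """Return SQLite's query-plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " / ".join(str(r[-1]) for r in rows)  # last column holds the detail text


# OLTP-style unit of work: a short transaction touching one record by primary key.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = ?", (42,))
oltp_plan = plan("SELECT balance FROM accounts WHERE id = ?", (42,))

# OLAP-style query: read every record to compute an aggregate.
olap_plan = plan("SELECT SUM(balance) FROM accounts")
```

With a thousand rows the difference is invisible in timing, but the plans already show the point lookup versus the full scan that dominates at warehouse scale.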
• Special thanks to Google.com and other sites.
Thank You