datawhs
Uploaded by nishank-reddy-h
-
8/9/2019 datawhs
1/25
-
Objective
In this presentation we will focus on:
Data and information
The need for information
Database and data warehousing concepts
Differences between a database and a data warehouse
Architecture of data warehousing
Applications of data warehousing
-
Data and Information
They may sound synonymous, but they are very different from each other.
Data are plain facts. When data are processed, organized, structured, or presented in a given context so as to make them useful, they are called information.
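As a small illustration of the distinction above, the sketch below treats a handful of made-up sales records as data and summarizes them per product to get information. The record layout and the helper name `summarize_sales` are assumptions for illustration only; they do not come from the presentation.

```python
from collections import defaultdict

# Raw data: plain facts with no context (hypothetical sample records).
raw_sales = [
    ("laptop", 1200.0),
    ("mouse", 25.0),
    ("laptop", 1100.0),
    ("mouse", 30.0),
]

def summarize_sales(records):
    """Turn raw (product, amount) facts into per-product totals."""
    totals = defaultdict(float)
    for product, amount in records:
        totals[product] += amount
    return dict(totals)

# Information: the same facts organized so they answer a business question.
info = summarize_sales(raw_sales)
print(info)
```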
-
Database
A database is a place where data is taken as a base and managed to provide fast and efficient access.
In general, it is a reservoir of the operational data needed for daily business.
-
Data Warehouse
A single, complete, and consistent store of data obtained from a variety of different sources and made available to end users in a way they can understand and use in a business context.
-
Similarities and differences
Both a database and a data warehouse support application data storage, access, and retrieval, but only a data warehouse is designed to support decision making and business intelligence.
A data warehouse holds both historical and current data, so comparisons for business analysis become very easy.
-
Major Distinction
A database is used for running the business.
A data warehouse is used to learn how to run the business.
-
A situation!
You are an analyst working for a computer firm.
As an analyst, you want to know:
1- Who are your customers?
2- What kinds of products do they want to buy?
3- What is the most effective distribution channel?
4- Which product promotions have the biggest impact on revenue?
5- How do I decide on segmentation?
-
What do I do?
Data is scattered across network-based databases.
Data is available, but you can't understand it.
Data needs to be collected and summarized to give it some meaning.
The data, as collected, cannot be used directly for analysis.
All you need is a data warehouse in your company.
-
Warehouses are Very Large Databases
[Bar chart: share of respondents (0% to 35%) by warehouse size (5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB), with two series: Initial and Projected 2Q96. Source: META Group, Inc.]
-
Very Large Data Bases
Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes
Petabytes -- 10^15 bytes: Geographic Information Systems
Exabytes -- 10^18 bytes: National Medical Records
Zettabytes -- 10^21 bytes: Weather images
Yottabytes -- 10^24 bytes: Intelligence Agency Videos
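The size units on this slide are powers of ten, so converting between them is simple arithmetic. The sketch below uses a hypothetical helper, `to_bytes`, to express the 24-terabyte figure quoted for Walmart in bytes and as a fraction of a petabyte.

```python
# Decimal (SI) size units expressed as powers of ten.
UNIT_EXPONENTS = {
    "terabyte": 12,
    "petabyte": 15,
    "exabyte": 18,
    "zettabyte": 21,
    "yottabyte": 24,
}

def to_bytes(amount, unit):
    """Convert an amount in the given unit to a number of bytes."""
    return amount * 10 ** UNIT_EXPONENTS[unit]

walmart_bytes = to_bytes(24, "terabyte")
walmart_petabytes = walmart_bytes / to_bytes(1, "petabyte")

print(walmart_bytes)       # 24 followed by 12 zeros
print(walmart_petabytes)   # the same size as a fraction of a petabyte
```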
-
Data Warehouse Architecture
Sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems
Loading: Extraction, Cleansing, Optimized Loader
Warehouse: Data Warehouse Engine, Metadata Repository
Access: Analyze, Query
-
Components of the Warehouse
Data Extraction and Loading
The Warehouse
Analyze and Query -- OLAP Tools
Metadata
Data Mining tools
-
Loading the Warehouse
Cleaning the data before it is loaded
-
Why is LOADING required?
Warehouse data comes from disparate, questionable sources
Outside sources with questionable quality procedures
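To make the point about cleaning concrete, here is a minimal sketch of a cleansing step that runs before rows are loaded. The field names (`customer`, `city`, `amount`) and the rules (trim whitespace, normalize case, reject rows with a missing customer or a non-numeric amount) are assumptions chosen for illustration; real cleansing rules depend on the sources involved.

```python
def clean_row(row):
    """Return a cleaned copy of a source row, or None if it must be rejected."""
    customer = (row.get("customer") or "").strip().title()
    city = (row.get("city") or "").strip().title()
    try:
        amount = float(row.get("amount"))
    except (TypeError, ValueError):
        return None  # missing or non-numeric amount: reject the row
    if not customer:
        return None  # a row without a customer name cannot be loaded
    return {"customer": customer, "city": city, "amount": amount}

# Hypothetical rows from an outside source with inconsistent quality.
source_rows = [
    {"customer": "  alice SMITH ", "city": "san francisco", "amount": "120.50"},
    {"customer": "Bob Jones", "city": "Ahmedabad", "amount": "n/a"},
    {"customer": "", "city": "Pune", "amount": "10"},
]

cleaned = [r for r in (clean_row(row) for row in source_rows) if r is not None]
rejected = len(source_rows) - len(cleaned)
print(cleaned, rejected)
```

Only the rows in `cleaned` would go on to the loader; the rejected count can be logged so that data-quality problems in a source stay visible.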
-
Data Integration Across Sources
Sources: Trust, Credit card, Savings, Loans
Same data, different name
Different data, same name
Data found here, nowhere else
Different keys, same data
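Two of the integration problems on this slide, the same data under different names and the same customer under different keys, can be sketched as a mapping step. Everything here is hypothetical: the two source systems, their column names, and the cross-reference table `KEY_MAP` that links each source key to one warehouse customer id.

```python
# Per-source column names mapped onto one warehouse schema
# ("same data, different name").
COLUMN_MAP = {
    "credit_card": {"cust_no": "source_key", "bal": "balance"},
    "savings": {"account_holder_id": "source_key", "current_balance": "balance"},
}

# Cross-reference from (source, source key) to a single warehouse customer id
# ("different keys, same data").
KEY_MAP = {
    ("credit_card", "CC-17"): 1001,
    ("savings", "S-0042"): 1001,
}

def integrate(source, row):
    """Rename source columns and attach the shared warehouse customer id."""
    renamed = {COLUMN_MAP[source][name]: value for name, value in row.items()}
    renamed["customer_id"] = KEY_MAP[(source, renamed.pop("source_key"))]
    renamed["source"] = source
    return renamed

rows = [
    integrate("credit_card", {"cust_no": "CC-17", "bal": 250.0}),
    integrate("savings", {"account_holder_id": "S-0042", "current_balance": 900.0}),
]
print(rows)
```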
-
Data Transformation Terms
Extracting
Conditioning
Scrubbing
Merging
Householding
Enrichment
Scoring
Loading
Validating
Delta Updating
-
Refresh
Propagate updates on source data to the warehouse
Issues:
when to refresh
how to refresh -- refresh techniques
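One possible refresh technique is an incremental (delta) refresh, in which only source rows changed since the last refresh are propagated. The sketch below assumes each source row carries an `updated_at` value and that the warehouse remembers a high-water mark. The row layout and the function name `refresh` are illustrative assumptions, and other techniques, such as a full reload, are equally valid answers to the "how" question.

```python
def refresh(source_rows, warehouse, last_refresh):
    """Copy rows changed after last_refresh into the warehouse.

    Returns the new high-water mark to use for the next refresh.
    """
    high_water = last_refresh
    for row in source_rows:
        if row["updated_at"] > last_refresh:
            warehouse[row["id"]] = dict(row)  # insert or overwrite by key
            high_water = max(high_water, row["updated_at"])
    return high_water

# Hypothetical source table; updated_at is a simple increasing counter here.
source = [
    {"id": 1, "amount": 100, "updated_at": 5},
    {"id": 2, "amount": 200, "updated_at": 12},
    {"id": 3, "amount": 300, "updated_at": 15},
]
warehouse = {1: {"id": 1, "amount": 100, "updated_at": 5}}

mark = refresh(source, warehouse, last_refresh=10)
print(sorted(warehouse), mark)
```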
-
De-normalization
Normalization in a data warehouse may lead to lots of small tables
This can lead to excessive I/O, since many tables have to be accessed
De-normalization is the answer, especially since updates are rare
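A minimal sketch of this trade-off, using Python's `sqlite3` module: product attributes are first stored in a separate table (normalized), then copied onto every sales row (de-normalized) so that a query reads a single table. The table names, column names, and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized layout: product attributes live in their own table.
    CREATE TABLE product (prod_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, prod_id INTEGER, dollars REAL);
    INSERT INTO product VALUES (1, 'Columbian', 'Coffee'), (2, 'Darjeeling', 'Tea');
    INSERT INTO sales VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);

    -- De-normalized layout: product attributes are copied onto every sales row,
    -- so queries read one table instead of joining two.
    CREATE TABLE sales_wide AS
        SELECT s.sale_id, p.name AS product, p.category, s.dollars
        FROM sales s JOIN product p ON p.prod_id = s.prod_id;
""")

# The same business question answered without a join.
totals = conn.execute(
    "SELECT category, SUM(dollars) FROM sales_wide GROUP BY category ORDER BY category"
).fetchall()
print(totals)
```

The cost is redundancy: the category is stored once per sale, which is acceptable mainly because warehouse rows are rarely updated.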
-
True Warehouse
[Diagram: Data Sources feed the Data Warehouse, which feeds the Data Marts]
-
A Sample Query
-- cume() returns a running (cumulative) total; join predicates are not shown on the slide
select month, dollars, cume(dollars) as run_dollars, weight, cume(weight) as run_weights
from sales, market, product, period t
where year = 1993
and product like 'Columbian%'
and city like 'San Fr%'
order by t.perkey
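The `cume()` function in this query is not standard SQL; it appears to be a vendor-specific running-total function. As a hedged sketch of the same idea, the example below computes running totals with a standard window function (`SUM(...) OVER`) in SQLite through Python's `sqlite3` module. The table, column names, and sample values are made up for illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (perkey INTEGER, month TEXT, dollars REAL, weight REAL);
    INSERT INTO monthly_sales VALUES
        (1, 'Jan', 100.0, 10.0),
        (2, 'Feb', 150.0, 12.0),
        (3, 'Mar', 50.0, 8.0);
""")

# Each window sums every row up to and including the current period.
rows = conn.execute("""
    SELECT month,
           dollars,
           SUM(dollars) OVER (ORDER BY perkey) AS run_dollars,
           weight,
           SUM(weight) OVER (ORDER BY perkey) AS run_weights
    FROM monthly_sales
    ORDER BY perkey
""").fetchall()

for row in rows:
    print(row)
```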
-
Automated processes in data warehouses
select product, dollars as jun97_sales,
  (select sum(si.dollars)
   from market mi, product pi, period ti, sales si
   where pi.product = product.product
   and ti.year = period.year
   and mi.city = market.city) as total97_sales,
  100 * dollars /
  (select sum(si.dollars)
   from market mi, product pi, period ti, sales si
   where pi.product = product.product
   and ti.year = period.year
   and mi.city = market.city) as percent_of_yr
from market, product, period, sales
where year = 1997
and month = 'June' and city like 'Ahmed%'
order by product;
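This query reports each product's June sales, its total for the year, and June as a percentage of that total. The sketch below reproduces that calculation on a single made-up table in SQLite via Python's `sqlite3` module. The schema is a simplified assumption (one `sales` table instead of the four joined tables), so it illustrates the arithmetic rather than the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, year INTEGER, month TEXT, dollars REAL);
    INSERT INTO sales VALUES
        ('Tea',    1997, 'May',  300.0),
        ('Tea',    1997, 'June', 100.0),
        ('Coffee', 1997, 'May',  150.0),
        ('Coffee', 1997, 'June',  50.0);
""")

# The correlated subqueries total the whole year for the current row's product.
rows = conn.execute("""
    SELECT s.product,
           s.dollars AS jun97_sales,
           (SELECT SUM(y.dollars) FROM sales y
             WHERE y.product = s.product AND y.year = s.year) AS total97_sales,
           100 * s.dollars /
           (SELECT SUM(y.dollars) FROM sales y
             WHERE y.product = s.product AND y.year = s.year) AS percent_of_yr
    FROM sales s
    WHERE s.year = 1997 AND s.month = 'June'
    ORDER BY s.product
""").fetchall()

for row in rows:
    print(row)
```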
-
Applications
Industry                  Application
Finance                   Credit card analysis
Insurance                 Claims, fraud analysis
Telecommunication         Call record analysis
Transport                 Logistics management
Consumer goods            Promotion analysis
Data service providers    Value-added data
Utilities                 Power usage analysis