Data warehousing
![Page 1: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/1.jpg)
Data Warehousing & Data Mining
By Mandar Kulkarni
PRN 10030141129
MBA-IT, SICSR
![Page 2: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/2.jpg)
Contents
• Data warehousing
• Understanding data warehousing
• Data warehouse architecture
• Data Mining
• Data mining techniques
![Page 3: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/3.jpg)
Warehouse?
Real time example?
![Page 4: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/4.jpg)
Data Warehousing
![Page 5: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/5.jpg)
[Figure: Samsung branch databases in Mumbai, Delhi, Chennai, and Bangalore. The sales manager needs sales per item type per branch for the first quarter.]
![Page 6: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/6.jpg)
• Now the sales manager wants to know the sales for the first quarter.
• Solution
  – Extract the information from each database, store it in a single place, and process it using operational systems.
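The consolidation step can be sketched in a few lines. This is only an illustration under stated assumptions: the branch records, the `sales` table layout, and the function names are hypothetical and not taken from the slides, and an in-memory SQLite database stands in for the single store.

```python
import sqlite3

# Hypothetical branch records: (item_type, sale_date, amount).
# In the slide's scenario each branch would keep these in its own database.
BRANCH_SALES = {
    "Mumbai": [("TV", "2011-01-15", 400.0), ("Phone", "2011-02-10", 250.0)],
    "Delhi": [("TV", "2011-03-05", 300.0), ("Phone", "2011-04-02", 500.0)],
    "Chennai": [("Phone", "2011-01-20", 150.0)],
    "Bangalore": [("TV", "2011-02-28", 350.0), ("TV", "2011-03-30", 100.0)],
}


def build_warehouse(branch_sales):
    """Copy every branch's rows into one in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (branch TEXT, item_type TEXT, sale_date TEXT, amount REAL)"
    )
    for branch, rows in branch_sales.items():
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, item, day, amount) for item, day, amount in rows],
        )
    return conn


def first_quarter_sales(conn, year):
    """Total sales per branch and item type for January through March."""
    query = """
        SELECT branch, item_type, SUM(amount)
        FROM sales
        WHERE sale_date >= ? AND sale_date < ?
        GROUP BY branch, item_type
        ORDER BY branch, item_type
    """
    return conn.execute(query, (f"{year}-01-01", f"{year}-04-01")).fetchall()


warehouse = build_warehouse(BRANCH_SALES)
q1_report = first_quarter_sales(warehouse, 2011)
for branch, item_type, total in q1_report:
    print(branch, item_type, total)
```

Once the rows sit in one table, the manager's question becomes a single grouped query instead of four separate lookups. The Delhi sale dated in April falls outside the quarter and is left out of the report.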
![Page 7: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/7.jpg)
Solution
[Figure: the Mumbai, Delhi, Chennai, and Bangalore databases feed a data warehouse. Query and analysis tools produce a report for the sales manager.]
![Page 8: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/8.jpg)
Operational Systems
• Running the business in real time
• Routine tasks
• Decision Support Systems (DSS)
  – Help in taking actions
• Used by people who deal with customers and products
• They are increasingly used by customers
![Page 9: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/9.jpg)
Data Warehouse
• A single, complete, and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.
• A process of transforming data into information and making it available to users in a timely enough manner to make a difference.
![Page 10: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/10.jpg)
Definition
• An integrated, subject-oriented, time-variant, nonvolatile database that provides support for decision making
![Page 11: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/11.jpg)
Data warehouse architecture
![Page 12: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/12.jpg)
[Figure: data warehouse architecture. Labels: Source Data (Production, Internal, Archived, External), Data Staging, Data Warehouse DBMS, Data Marts, MDDB, Metadata, Management & Control, Information Delivery (OLAP, Report/Query, Data Mining).]
![Page 13: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/13.jpg)
Components
• Source Data
• Data Staging (data extraction, cleaning, and loading)
  – Talend is the first open source ETL tool
• Data Storage
• Information Delivery (EIS)
• Management and Control
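The data staging component (extraction, cleaning, loading) can be illustrated with a small pipeline. This is a minimal sketch rather than how Talend or any specific ETL tool works: the raw rows, the field names, the `BRANCH_ALIASES` table, and the validation rules are all assumptions made for the example.

```python
# Hypothetical raw extracts; field names and values are illustrative only.
RAW_ROWS = [
    {"branch": " mumbai ", "item": "TV", "amount": "400"},
    {"branch": "Banglore", "item": "phone", "amount": "250.5"},
    {"branch": "Delhi", "item": "TV", "amount": "n/a"},  # unparseable amount
    {"branch": "", "item": "TV", "amount": "100"},       # missing branch
]

# Assumed spelling corrections applied during cleaning.
BRANCH_ALIASES = {"Banglore": "Bangalore"}


def extract(rows):
    """Extraction: yield a copy of each source row."""
    for row in rows:
        yield dict(row)


def clean(rows):
    """Cleaning: normalize text, parse amounts, drop rows that fail validation."""
    for row in rows:
        branch = row["branch"].strip().title()
        branch = BRANCH_ALIASES.get(branch, branch)
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        if not branch:
            continue  # skip rows with no branch name
        yield {"branch": branch, "item": row["item"].strip().upper(), "amount": amount}


def load(rows, store):
    """Loading: append the cleaned rows to the target store (a list here)."""
    store.extend(rows)
    return store


warehouse_rows = load(clean(extract(RAW_ROWS)), [])
for row in warehouse_rows:
    print(row)
```

Writing the three stages as generators keeps each one separate and lets rows stream through without an intermediate copy. A real staging area would normally log rejected rows instead of silently dropping them.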
![Page 14: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/14.jpg)
OLAP
• Online Analytical Processing tools
• DSS tools that use multidimensional data analysis techniques
  – Support for a DSS data store
  – Data extraction and integration filter
  – Specialized presentation interface
• Oracle OLAP 11g
![Page 15: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/15.jpg)
Multidimensional analysis
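As a rough illustration of multidimensional analysis, the sketch below treats sales facts as points in a three-dimensional space (branch, item type, quarter) and applies two common operations: a roll-up that sums over the dimensions not kept, and a slice that fixes one dimension to a single value. The fact rows, the dimension names, and the helper functions are hypothetical and do not reflect any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (branch, item_type, quarter, sales).
FACTS = [
    ("Mumbai", "TV", "Q1", 400),
    ("Mumbai", "Phone", "Q1", 250),
    ("Delhi", "TV", "Q1", 300),
    ("Delhi", "TV", "Q2", 200),
    ("Chennai", "Phone", "Q2", 150),
]
DIMENSIONS = ("branch", "item_type", "quarter")


def roll_up(facts, keep):
    """Sum sales over every dimension that is not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the facts where one dimension has a fixed value."""
    position = DIMENSIONS.index(dimension)
    return [row for row in facts if row[position] == value]


by_branch = roll_up(FACTS, ["branch"])
q1_by_item = roll_up(slice_cube(FACTS, "quarter", "Q1"), ["item_type"])
print(by_branch)
print(q1_by_item)
```

The same facts answer different questions depending on which dimensions are kept, which is the point of viewing the data as a cube rather than as a flat report.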
![Page 16: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/16.jpg)
OLAP architecture
![Page 17: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/17.jpg)
12 Rules of Data Warehouse
1. Data warehouse and operational environments are separated
2. Data is integrated
3. Contains historical data over a long period of time
4. Data is snapshot data captured at a given point in time
5. Data is subject-oriented
![Page 18: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/18.jpg)
6. Mainly read-only with periodic batch updates
7. Development life cycle has a data-driven approach versus the traditional process-driven approach
8. Data contains several levels of detail: current, old, lightly summarized, highly summarized
![Page 19: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/19.jpg)
9. Environment is characterized by read-only transactions to very large data sets
10. System that traces data sources, transformations, and storage
11. Metadata is a critical component
  – Source, transformation, integration, storage, relationships, history, etc.
12. Contains a chargeback mechanism for resource usage that enforces optimal use of data by end users
![Page 20: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/20.jpg)
OLTP v/s Data warehousing
OLTP
• Application oriented
• Used to run the business
• Detailed data
• Current, up-to-date data
• Isolated data
• Repetitive access
• Performance sensitive
• Few records accessed
• Read/update access

Data Warehousing
• Subject oriented
• Used to analyze the business
• Summarized and refined data
• Snapshot data
• Integrated data
• Ad hoc access
• Performance relaxed
• Large volume accessed at a time
• Mostly read
![Page 21: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/21.jpg)
Data Warehouse summary
• Integrated platform for OLAP and DSS
• Helps optimize business operations
• Easy access to multidimensional data
![Page 22: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/22.jpg)
Data Mining
![Page 23: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/23.jpg)
Why Data Mining?
Strategic decision making
Wealth generation
Analyzing trends
Security
![Page 24: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/24.jpg)
Data Mining
• Look for hidden patterns and trends in data that are not immediately apparent from summarizing the data
• No query…
• …but “interestingness criteria”
![Page 25: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/25.jpg)
Data Mining
Data + Interestingness criteria = Hidden patterns
![Page 26: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/26.jpg)
Data Mining
Data + Interestingness criteria = Hidden patterns
[Annotation on "Hidden patterns": type of patterns]
![Page 27: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/27.jpg)
Data Mining
Data + Interestingness criteria = Hidden patterns
[Annotations: type of data; type of interestingness criteria]
![Page 28: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/28.jpg)
Type of Data
• Tabular (Ex: Transaction data)
  – Relational
  – Multi-dimensional
• Tree (Ex: XML data)
• Graphs
• Sequence (Ex: DNA, activity logs)
• Text, Multimedia …
![Page 29: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/29.jpg)
Type of Interestingness
• Frequency
• Rarity
• Correlation
• Length of occurrence (for sequence and temporal data)
• Consistency
• Repeating / periodicity
• “Abnormal” behavior
• Other patterns of interestingness…
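Two of the criteria listed above, frequency and rarity, can be made concrete with simple thresholds on how often each value occurs. The sketch below is only one possible reading: the event log, the threshold values, and the function names are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical activity log (a sequence of event names).
EVENTS = ["login", "view", "view", "buy", "view", "login", "refund", "view"]


def relative_frequency(events):
    """Fraction of the log taken up by each distinct event."""
    counts = Counter(events)
    total = len(events)
    return {event: count / total for event, count in counts.items()}


def interesting(events, min_freq=None, max_freq=None):
    """Return the events whose relative frequency meets the chosen criterion."""
    selected = []
    for event, freq in relative_frequency(events).items():
        if min_freq is not None and freq < min_freq:
            continue
        if max_freq is not None and freq > max_freq:
            continue
        selected.append(event)
    return sorted(selected)


frequent = interesting(EVENTS, min_freq=0.25)   # frequency criterion
rare = interesting(EVENTS, max_freq=0.125)      # rarity criterion
print(frequent)
print(rare)
```

The same data yields different "interesting" results depending on the criterion, which is why the criterion, rather than a fixed query, drives the mining step.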
![Page 30: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/30.jpg)
Data Mining vs Statistical Inference
Statistics: Conceptual Model (Hypothesis) -> Statistical Reasoning -> “Proof” (Validation of Hypothesis)
![Page 31: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/31.jpg)
Data Mining vs Statistical Inference
Data mining: Data -> Mining Algorithm (Based on Interestingness) -> Pattern (model, rule, hypothesis) discovery
![Page 32: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/32.jpg)
Used for
• Data mining is used for
  – Frequent item-sets
  – Associations
  – Classifications
  – Clustering
![Page 33: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/33.jpg)
Techniques
• Algorithms
  – Apriori algorithm
  – Decision tree
• SLIQ
  – Supervised Learning in QUEST
  – IBM
• “GROUP BY”
  mysql> select sum(sal), deptno from emp group by deptno;
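To make the Apriori technique more concrete, here is a compact sketch of its level-wise search for frequent item-sets. It follows the usual join-and-prune outline but is simplified for readability and not tuned for large data. The transactions, the `min_support` value, and the function name are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    items = {frozenset([item]) for t in transactions for item in t}
    current = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = dict(current)
    size = 2
    while current:
        # Join step: combine frequent itemsets into candidates of the next size.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every subset one item smaller must itself be frequent.
        candidates = {
            c
            for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        size += 1
    return frequent


frequent_itemsets = apriori(TRANSACTIONS, min_support=3)
for itemset, support in sorted(frequent_itemsets.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

With a minimum support of three transactions, every single item and every pair qualifies, while the three-item set appears in only two transactions and is rejected. The prune step relies on the fact that an item-set cannot be frequent unless all of its subsets are.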
![Page 34: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/34.jpg)
Data Mining Summary
• Helps analyze patterns and so supports taking action, both in real time and for the future.
• Analyzes trends and clusters in business operations.
![Page 35: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/35.jpg)
References
• http://www.datawarehousing.com/
• http://www.dw-institute.com/
• http://www.almaden.ibm.com/cs/quest/index.html
![Page 36: Data warehousing](https://reader036.fdocuments.in/reader036/viewer/2022081603/55850972d8b42ac60a8b468c/html5/thumbnails/36.jpg)
Thank you
Any Questions?