Principles of Data Warehousing
![Page 2: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/2.jpg)
Outline
Data Warehouse
Metadata
ETL
Data Marts
OLAP
Multidimensional Data
![Page 3: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/3.jpg)
A Manager’s Questions …
Who are our lowest or highest margin customers?
Who are my customers and what products are they buying?
Which customers are most likely to go to the competition?
What promotions have the biggest impact on revenue?
What is the most effective distribution channel?
What impact will new products/services have on revenue and margins?
![Page 4: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/4.jpg)
Tourists, Farmers and Explorers
Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data.
Farmers: Harvest information from known access paths.
Tourists: Browse information harvested by farmers.
![Page 5: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/5.jpg)
History & Evolution
60’s: Batch Reports
• Hard to find and analyze information
• Inflexible and expensive; every new request requires reprogramming
70’s: Terminal-Based DSS and EIS
• Still inflexible, not integrated with desktop tools
80’s: Desktop Data Access and Analysis Tools
• Query tools, spreadsheets, GUIs
• Easier to use, but only access operational databases
90’s: Data Warehousing
• OLAP engines and tools
![Page 6: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/6.jpg)
Data Everywhere
I cannot find the data I need.
• Data are scattered over the network.
• Many versions
I cannot get the data I need.
• May need experts to get the data.
I cannot understand the data I found.
• Poorly documented
• Domain knowledge
I cannot use the data I found.
• Quality
• Transformation
![Page 7: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/7.jpg)
What is a data warehouse?
“A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way that they can understand and use in a business context.”
![Page 8: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/8.jpg)
What is data warehousing?
Data warehousing: techniques for assembling and managing data from various sources for the purpose of answering business questions and making decisions.
A data warehouse is a collection of data that is used primarily in organizational decision making.
A data warehouse is
• Subject-oriented
• Integrated
• Time-varying
• Non-volatile
[Figure: Data, Information, Knowledge]
![Page 9: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/9.jpg)
Data Warehouse Architecture
[Figure: data warehouse architecture. Labels: Relational Databases, Legacy Data, Purchased Data, ERP Systems; Extraction, Cleansing; Optimized Loader; Data Warehouse Engine; Metadata Repository; Analyze, Query.]
![Page 10: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/10.jpg)
Data Warehouse is …
Subject-Oriented
The data warehouse is organized around subjects of the enterprise (e.g., customers, products, sales) rather than application areas (e.g., customer invoicing, stock control, product sales).
This is reflected in the need to store decision-support data instead of application-oriented or operational data.
Integrated
The data warehouse integrates corporate application-oriented data from different sources, which often include inconsistent data.
The integrated data sources must be made consistent to present a unified view of the data to the users.
![Page 11: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/11.jpg)
Data Warehouse is …
Time-Variant
Data warehouses are time variant in the sense that they maintain both historical and (nearly) current data.
Historical information is of high importance to decision makers, who often want to understand trends and relationships between data.
Non-Volatile
After the data are loaded into the data warehouse, there are no changes, inserts, or deletes performed against the historical data.
This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
![Page 12: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/12.jpg)
Operational Systems
Operational Systems
• Run the business in real time.
• Based on up-to-the-second data.
• Optimized to handle large numbers of simple read/write transactions.
• Optimized for fast response to predefined transactions.
• Used by people who deal with customers, products.
Database systems have been used traditionally for OLTP.
• Online Transaction Processing
• Clerical data processing tasks
• Detailed, up-to-date data
• Structured repetitive tasks
Examples of Operational Data
• Customer Files
• Account Balance, Call Record
• Point of Sale Data, Production Record
![Page 13: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/13.jpg)
Data Warehousing vs. OLTP
Workload
Data warehouses are designed to accommodate ad hoc queries. A data warehouse should be optimized to perform well for a wide variety of possible query operations.
OLTP systems support only predefined operations and might be specifically tuned or designed to support only these operations.
Data Modifications
A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. The users of a data warehouse do not directly update the data warehouse.
In OLTP systems, users routinely issue individual data modification statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.
![Page 14: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/14.jpg)
Data Warehousing vs. OLTP
Schema Design
Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance.
OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and to guarantee data consistency.
Typical Operations
A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month."
A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."
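The contrast between the two typical operations can be sketched with Python's built-in sqlite3 module. This is only an illustration: the orders table, its columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical orders table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, order_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "2024-05", 120.0),
        (2, "bob", "2024-05", 80.0),
        (3, "alice", "2024-06", 50.0),
        (4, "carol", "2024-06", 200.0),
    ],
)

# Warehouse-style query: reads every row of the month and aggregates them.
total_sales = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE order_month = ?", ("2024-05",)
).fetchone()[0]

# OLTP-style operation: fetches a single record for one customer.
current_order = conn.execute(
    "SELECT order_id, amount FROM orders "
    "WHERE customer = ? ORDER BY order_id DESC LIMIT 1",
    ("alice",),
).fetchone()

print(total_sales, current_order)
```

On a real warehouse the first query would touch thousands or millions of rows, while the second would be served by an index lookup on a handful of records.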
![Page 15: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/15.jpg)
Data Warehousing vs. OLTP
Historical Data
Data warehouses usually store months or years of data to support historical analysis.
OLTP systems usually store data from only a few weeks or months to meet the requirements of the current transaction.
Number of Users
Data Warehouses: hundreds of users.
OLTP Systems: tens of thousands of users.
Database Size
Data Warehouses: 10 GB - 1 TB
OLTP Systems: 100 MB - 10 GB
![Page 16: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/16.jpg)
In summary …
Data warehousing helps optimize the business.
OLTP systems actually run the business.
![Page 17: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/17.jpg)
Data Marts
A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject, that may be distributed to support business needs.
Departmental Data Warehouse
A data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need.
The smaller-scale data mart is typically easier to build than the enterprise-wide warehouse; can be quickly implemented; and offers tremendous, fast payback for the users.
The downside comes when several department-focused data marts are implemented with no forethought for a future data warehouse that serves the entire enterprise.
![Page 18: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/18.jpg)
Independent Data Marts
![Page 19: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/19.jpg)
Dependent Data Marts
![Page 20: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/20.jpg)
Data Granularity
Granularity is the extent to which a system is broken down into small parts, either the system itself or its description or observation.
Granularity is a key factor to consider in the design of data warehouses.
It determines the amount of data to be stored in the data warehouse.
Operational Databases
• Transaction oriented
• Detailed records
• Lowest level of granularity
• Example: the details of the phone call made by Tom at 2:40pm yesterday
Data Warehouses
• Decision making
• Summarized data
• High levels of granularity
• Example: the number of phone calls made by Tom last month
![Page 21: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/21.jpg)
Data Granularity
![Page 22: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/22.jpg)
Data Granularity
High Levels of Granularity
• Reduce storage costs.
• Reduce CPU usage.
• Cannot answer certain queries.
  • Did Tom call Mary last week?
• A tradeoff between the volume and the usage of data.
Dual Levels of Granularity
• Store summarized data on disks.
  • Covers 95% of decision-making queries.
  • Data access is cheap and convenient.
• Store detailed data on tapes.
  • Covers 5% of decision-making queries.
  • Many records need to be involved to process a query.
  • Data access is expensive and complicated.
Many levels of granularity may be necessary in practice.
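The tradeoff can be illustrated with a small Python sketch. The call records below are hypothetical; the point is only that a monthly summary answers the counting question but has discarded the callee, so "Did Tom call Mary?" can be answered only from the detailed records.

```python
from collections import Counter

# Detailed records (lowest level of granularity): one row per call.
# The names and dates are made up for illustration.
calls = [
    {"caller": "Tom", "callee": "Mary", "date": "2024-05-03"},
    {"caller": "Tom", "callee": "Ann", "date": "2024-05-10"},
    {"caller": "Tom", "callee": "Mary", "date": "2024-05-21"},
]


def summarize_by_month(records):
    """Summarize calls to (caller, month) counts; callee and day are dropped."""
    return Counter((r["caller"], r["date"][:7]) for r in records)


monthly = summarize_by_month(calls)

# The summary can answer "How many calls did Tom make in May?"
tom_may_calls = monthly[("Tom", "2024-05")]

# Only the detailed data can answer "Did Tom call Mary?"
tom_called_mary = any(
    r["caller"] == "Tom" and r["callee"] == "Mary" for r in calls
)

print(tom_may_calls, tom_called_mary)
```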
![Page 23: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/23.jpg)
Data Partition
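[Figure: vertical partitioning of an account table; the result is a smaller table and less I/O.]
Original table: Acct. No, Name, Balance, Date Opened, Interest Rate, Address
Frequently accessed: Acct. No, Balance
Rarely accessed: Acct. No, Name, Date Opened, Interest Rate, Address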
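A minimal sketch of this kind of partitioning, assuming rows are held as Python dictionaries. The column names mirror the figure; the sample accounts and the helper vertical_partition are hypothetical.

```python
# Hypothetical account rows using the columns named in the figure.
ACCOUNTS = [
    {"acct_no": 101, "name": "Ann", "balance": 250.0,
     "date_opened": "2019-03-01", "interest_rate": 0.02, "address": "1 Main St"},
    {"acct_no": 102, "name": "Bo", "balance": 75.5,
     "date_opened": "2021-07-15", "interest_rate": 0.015, "address": "9 Oak Ave"},
]

FREQUENT_COLUMNS = ("acct_no", "balance")


def vertical_partition(rows, frequent_columns, key="acct_no"):
    """Split each row into a narrow, frequently accessed part and the rest.

    The key column is kept in both parts so the rows can be re-joined.
    """
    frequent = [{c: row[c] for c in frequent_columns} for row in rows]
    rare = [
        {c: v for c, v in row.items() if c == key or c not in frequent_columns}
        for row in rows
    ]
    return frequent, rare


frequent, rare = vertical_partition(ACCOUNTS, FREQUENT_COLUMNS)
print(frequent[0])
print(sorted(rare[0]))
```

Balance lookups then read only the narrow table, which is where the smaller table and reduced I/O come from.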
![Page 24: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/24.jpg)
Data Quality
Data warehouses are based on existing data sources.
Data quality matters!
Creating a data warehouse is not a straightforward process.
Warehouse data are from disparate and questionable sources.
Legacy systems are no longer documented.
Corporate wide standards are not well implemented.
Advanced techniques and tools are needed to do the job.
![Page 25: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/25.jpg)
10 Minutes …
![Page 26: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/26.jpg)
Extract, Transform & Load
Extract, Transform & Load (ETL)
• The interface between external sources and data warehouses
• ETL may take around 70% of the total workload.
• Can be implemented manually in any programming language.
• Commercial ETL tools are widely available.
Extract
• To consolidate data from different source systems.
  • Flat Files
  • Relational Databases
  • Customized Applications
  • Point of Sale Devices
  • Web Pages
• To locate the sources for each data item in the data warehouse.
  • Not all data are to be extracted.
![Page 27: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/27.jpg)
Extract, Transform & Load
Transform
• To apply a series of rules or functions to the extracted data to derive the data for loading into the end target.
• Typical Functions
  • Formatting
  • Encoding
  • Aggregating
  • Splitting
  • Deriving
  • Converting
  • Integrating
Load
• To load the extracted, cleaned and validated data into the end target.
  • Online vs. Offline Loads
  • Incremental vs. Full Loads
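The three stages can be sketched as a tiny hand-written pipeline, in the spirit of "can be implemented manually in any programming language". Everything here is an assumption made for illustration: the CSV layout, the store_sales target table, and the function names.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; the columns are illustrative only.
RAW_CSV = """store,sale_date,amount
nyc,2024-05-01, 12.50
sfo,2024-05-01,7.00
nyc,2024-05-02,3.50
"""


def extract(text):
    """Extract: read rows from a flat file into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: format store codes, convert amounts, aggregate per store."""
    totals = {}
    for row in rows:
        store = row["store"].strip().upper()  # formatting
        amount = float(row["amount"])  # converting text to a number
        totals[store] = totals.get(store, 0.0) + amount  # aggregating
    return sorted(totals.items())


def load(conn, records):
    """Load: a full load of the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS store_sales (store TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO store_sales VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT store, total FROM store_sales ORDER BY store"
).fetchall()
print(loaded)
```

An incremental load would instead add only the rows that arrived since the previous run; this sketch performs a full load each time.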
![Page 28: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/28.jpg)
ETL --- Challenges
[Figure: source systems (Trust, Credit Card, Savings, Loans) with three kinds of conflict: same data, different name; different data, same name; inconsistent name or data.]
![Page 29: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/29.jpg)
ETL --- Challenges
[Figure: external sources feeding the data warehouse with inconsistent conventions.]
Field: appl A - balance; appl B - bal; appl C - currbal; appl D - balcurr
Unit: appl A - pipeline - cm; appl B - pipeline - in; appl C - pipeline - feet; appl D - pipeline - yds
Encoding: appl A - m,f; appl B - 1,0; appl C - x,y; appl D - male, female
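A hedged sketch of how a transform step might reconcile these conventions. The application labels, field names, and units come from the figure; which source code maps to which gender (for example, whether "1" or "x" means male) is not stated in the slides and is purely an assumption here, as is the helper harmonize.

```python
# Per-application conventions from the figure. The direction of the 1/0 and
# x/y mappings is an ASSUMPTION made for illustration only.
GENDER_MAPS = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},  # assumed mapping
    "C": {"x": "m", "y": "f"},  # assumed mapping
    "D": {"male": "m", "female": "f"},
}

# Pipeline length units per application, expressed as centimetres per unit.
CM_PER_UNIT = {"A": 1.0, "B": 2.54, "C": 30.48, "D": 91.44}  # cm, in, feet, yds

# Each application names the balance field differently.
BALANCE_FIELD = {"A": "balance", "B": "bal", "C": "currbal", "D": "balcurr"}


def harmonize(app, record):
    """Map one source record onto a single warehouse convention."""
    return {
        "gender": GENDER_MAPS[app][str(record["gender"]).lower()],
        "pipeline_cm": round(record["pipeline"] * CM_PER_UNIT[app], 2),
        "balance": record[BALANCE_FIELD[app]],
    }


from_b = harmonize("B", {"gender": 1, "pipeline": 10, "bal": 99.0})
from_d = harmonize("D", {"gender": "Female", "pipeline": 2, "balcurr": 5.0})
print(from_b)
print(from_d)
```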
![Page 30: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/30.jpg)
ETL --- Challenges
Same person, different spellings
• 吕: LV, LUI, LYU
Multiple ways to denote company name
• Global Systems, GSPL, Global Pty. LTD.
Use of different names for the same object/concept
• Holland vs. Netherlands
Inconsistent data values
• Age, Marital Status …
Required fields left blank
• Missing Values
Invalid product codes collected at point of sale
• Manual entry leads to mistakes.
• Different conventions: using “-1” or “99999” to indicate an error
![Page 31: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/31.jpg)
Metadata
Metadata is information about data.
Metadata is used to facilitate the understanding, use, and management of data.
Metadata can document data about data attributes & structure.
Metadata may include descriptive information about the context, quality and condition, or characteristics of the data.
Metadata for a Book
• Title, Author, Subject, ISBN, Number of Pages …
Metadata for a data warehouse
• The data defining warehouse objects
• A roadmap telling users what is in there and how to find it
• Far more sophisticated than a data dictionary
![Page 32: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/32.jpg)
Metadata Repository
Data definition and mapping metadata
• The meaning of each attribute and where the data come from
Data structure metadata
• The structure of the tables (the data type of each column, primary/foreign keys)
Source system metadata
• The data structure of all the source systems feeding into the warehouse
ETL process metadata
• The description of each data flow (source, target, transformation, schedule)
Data quality metadata
• Data quality rules, where they apply, their risk level and actions
Audit metadata
• The results of all processes (ETL, security log, indexing) in the warehouse
Usage metadata
• Records of which reports and cubes are used, by whom, and when
![Page 33: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/33.jpg)
Data Models in Data Warehouses
In OLTP systems, data are stored in two-dimensional tables.
Data warehouses are subject-oriented.
• Subjects: Profits, Sales …
• Data need to be reorganized to better reflect the subjects.
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
A data cube allows data to be modeled and viewed in multiple dimensions.
Fact tables contain measures of interest (such as dollars sold) and keys to each of the related dimension tables.
Dimension tables provide the context of the measures, such as item (item name, brand), product, location, or time (day, week, month, quarter, year).
![Page 34: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/34.jpg)
From Tables to Data Cubes
ID  Product  Country  Date  Sales
1   TV       US       1Qtr  100
2   PC       Canada   4Qtr  500
3   CAR      US       2Qtr  30
4   PC       UK       3Qtr  200
5   CAR      UK       1Qtr  20
6   CAR      UK       2Qtr  15
7   TV       Canada   4Qtr  80
![Page 35: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/35.jpg)
From Tables to Data Cubes
[Figure: a data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, CAR, sum), and Country (U.S.A., Canada, U.K., sum). The highlighted cell is the total annual sales of TV in U.S.A.]
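As a rough sketch, the cube can be computed from the seven-row sales table on the preceding slide by adding every row into each cell it contributes to, including the "sum" cells along each dimension. The function build_cube and the ALL marker are illustrative names, not part of the slides.

```python
from collections import defaultdict

# Rows of the sales table: (product, country, date, sales).
SALES = [
    ("TV", "US", "1Qtr", 100),
    ("PC", "Canada", "4Qtr", 500),
    ("CAR", "US", "2Qtr", 30),
    ("PC", "UK", "3Qtr", 200),
    ("CAR", "UK", "1Qtr", 20),
    ("CAR", "UK", "2Qtr", 15),
    ("TV", "Canada", "4Qtr", 80),
]

ALL = "sum"  # marker for "aggregated over this dimension"


def build_cube(rows):
    """Accumulate each row into all 2**3 cells it belongs to."""
    cube = defaultdict(int)
    for product, country, date, sales in rows:
        for p in (product, ALL):
            for c in (country, ALL):
                for d in (date, ALL):
                    cube[(p, c, d)] += sales
    return cube


cube = build_cube(SALES)

tv_us_annual = cube[("TV", "US", ALL)]  # total annual sales of TV in the U.S.
car_uk_annual = cube[("CAR", "UK", ALL)]
grand_total = cube[(ALL, ALL, ALL)]
print(tv_us_annual, car_uk_annual, grand_total)
```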
![Page 36: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/36.jpg)
Cube: A Lattice of Cuboids
0-D cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D cuboid: time, item, location, supplier
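The lattice is simply every subset of the dimension set, grouped by size. A minimal sketch using itertools.combinations (the helper name cuboid_lattice is an illustrative choice):

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")


def cuboid_lattice(dimensions):
    """Group every subset of the dimensions by its size (0-D up to n-D)."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }


lattice = cuboid_lattice(DIMENSIONS)
total_cuboids = sum(len(cuboids) for cuboids in lattice.values())

for k, cuboids in lattice.items():
    print(f"{k}-D:", cuboids)
print("total:", total_cuboids)
```

With n dimensions and no concept hierarchies there are 2^n cuboids, so the four dimensions here give 16.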
![Page 37: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/37.jpg)
Data Warehouse Schemas
Star Schema
• A fact table in the middle connected to a set of dimension tables
Snowflake Schema
• A refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
Fact Constellations
• Multiple fact tables sharing dimension tables, viewed as a collection of stars, therefore called a galaxy schema or fact constellation
![Page 38: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/38.jpg)
The Star Schema
Sales Fact Table
• Keys: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales
Dimension tables
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, province_or_state, country
![Page 39: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/39.jpg)
The Star Schema: An Example
customer
custId  name   address    city
53      joe    10 main    sfo
81      fred   12 main    sfo
111     sally  80 willow  la
product
prodId  name  price
p1      bolt  10
p2      nut   5
store
storeId  city
c1       nyc
c2       sfo
c3       la
sale
orderId  date    custId  prodId  storeId  qty  amt
o100     1/7/97  53      p1      c1       1    12
o102     2/7/97  53      p2      c1       2    11
105      3/8/97  111     p1      c3       5    50
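To make the example concrete, the four tables can be loaded into an in-memory SQLite database and queried with a star join, where the fact table (sale) is joined to its dimension tables and then aggregated. The rows are taken from the slide; the column types and the particular query are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT);
    CREATE TABLE product (prodId TEXT PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE store (storeId TEXT PRIMARY KEY, city TEXT);
    -- Fact table: one row per sale, with keys into each dimension table.
    CREATE TABLE sale (
        orderId TEXT PRIMARY KEY,
        date TEXT,
        custId INTEGER REFERENCES customer(custId),
        prodId TEXT REFERENCES product(prodId),
        storeId TEXT REFERENCES store(storeId),
        qty INTEGER,
        amt INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(53, "joe", "10 main", "sfo"), (81, "fred", "12 main", "sfo"),
     (111, "sally", "80 willow", "la")],
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)", [("p1", "bolt", 10), ("p2", "nut", 5)]
)
conn.executemany(
    "INSERT INTO store VALUES (?, ?)", [("c1", "nyc"), ("c2", "sfo"), ("c3", "la")]
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?, ?, ?, ?)",
    [("o100", "1/7/97", 53, "p1", "c1", 1, 12),
     ("o102", "2/7/97", 53, "p2", "c1", 2, 11),
     ("105", "3/8/97", 111, "p1", "c3", 5, 50)],
)

# Star join: total quantity and amount per store city and product.
result = conn.execute(
    """
    SELECT store.city, product.name, SUM(sale.qty), SUM(sale.amt)
    FROM sale
    JOIN store ON sale.storeId = store.storeId
    JOIN product ON sale.prodId = product.prodId
    GROUP BY store.city, product.name
    ORDER BY store.city, product.name
    """
).fetchall()
print(result)
```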
![Page 40: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/40.jpg)
The Snowflake Schema
Sales Fact Table
• Keys: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales
Dimension tables
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_key
• supplier: supplier_key, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city_key
• city: city_key, city, province_or_state, country
![Page 41: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/41.jpg)
The Galaxy Schema
Sales Fact Table
• Keys: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table
• Keys: time_key, item_key, shipper_key, from_location, to_location
• Measures: dollars_cost, units_shipped
Dimension tables
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, province_or_state, country
• shipper: shipper_key, shipper_name, location_key, shipper_type
![Page 42: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/42.jpg)
Concept Hierarchy
[Figure: concept hierarchy for the Location dimension. Levels: all, region, country, city, office. Example values: region: Europe, North_America; country: Germany, Spain, Canada; city: Frankfurt, Vancouver, Toronto; office: L. Chan, M. Wind.]
![Page 43: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/43.jpg)
Set-Grouping Hierarchy
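[Figure: set-grouping hierarchy. The range [$0 - $1000] is divided into the groups inexpensive, moderate, and expensive; the one sub-range label that survives is [$0 - $150].]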
![Page 44: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/44.jpg)
View of Hierarchies
![Page 45: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/45.jpg)
Bitmap Index
Index on a particular column.
Each value in the column corresponds to a bit vector.
The length of each bit vector: the number of records in the base table.
Not suitable for high-cardinality domains.
Base Table
Cust  Region   Type
C1    Asia     Retail
C2    Europe   Dealer
C3    Asia     Dealer
C4    America  Retail
C5    Europe   Dealer
Index on Region
RecID  Asia  Europe  America
1      1     0       0
2      0     1       0
3      1     0       0
4      0     0       1
5      0     1       0
Index on Type
RecID  Retail  Dealer
1      1       0
2      0       1
3      0       1
4      1       0
5      0       1
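A minimal sketch of the same index in Python, using one integer per distinct value as the bit vector (bit i is set when record i + 1 holds that value). The helper names are illustrative; real systems use compressed bitmaps, which this sketch does not attempt.

```python
# Base table from the slide: (Cust, Region, Type).
BASE_TABLE = [
    ("C1", "Asia", "Retail"),
    ("C2", "Europe", "Dealer"),
    ("C3", "Asia", "Dealer"),
    ("C4", "America", "Retail"),
    ("C5", "Europe", "Dealer"),
]


def build_bitmap_index(rows, column):
    """Return {value: bitmap}; bit i is set if row i has that value."""
    index = {}
    for position, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << position)
    return index


def matching_customers(rows, bitmap):
    """Return the Cust ids of the rows whose bit is set in the bitmap."""
    return [
        row[0] for position, row in enumerate(rows) if (bitmap >> position) & 1
    ]


region_index = build_bitmap_index(BASE_TABLE, 1)
type_index = build_bitmap_index(BASE_TABLE, 2)

# "Region = Asia AND Type = Dealer" becomes a single bitwise AND.
asia_dealers = matching_customers(
    BASE_TABLE, region_index["Asia"] & type_index["Dealer"]
)
print(asia_dealers)
```

Each distinct value needs its own bit vector, which is why the approach does not suit columns with many distinct values.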
![Page 46: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/46.jpg)
OLAP
Online Analytical Processing
Fast Analysis of Shared Multidimensional Information (FASMI)
Slice and Dice: Project and Select
Roll up (drill-up): summarize data
• By climbing up a hierarchy or by dimension reduction
Drill down (roll down): reverse of roll-up
• From higher level summary to lower level summary or detailed data, or introducing new dimensions
Pivot (rotate): Reorient the cube
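These operations can be sketched on a small list of fact rows. The data, the city-to-country mapping, and the function names below are all hypothetical; an OLAP server would run the same operations against precomputed cuboids rather than raw rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, city, quarter, sales).
FACTS = [
    ("TV", "Toronto", "1Qtr", 10),
    ("TV", "Vancouver", "1Qtr", 20),
    ("PC", "Toronto", "2Qtr", 30),
    ("PC", "Frankfurt", "1Qtr", 40),
]

# One step of a location concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"Toronto": "Canada", "Vancouver": "Canada", "Frankfurt": "Germany"}


def slice_cube(facts, product):
    """Slice: select a single value on one dimension."""
    return [f for f in facts if f[0] == product]


def dice_cube(facts, products, quarters):
    """Dice: select ranges of values on two or more dimensions."""
    return [f for f in facts if f[0] in products and f[2] in quarters]


def roll_up_to_country(facts):
    """Roll up by climbing the location hierarchy from city to country."""
    totals = defaultdict(int)
    for product, city, quarter, sales in facts:
        totals[(product, CITY_TO_COUNTRY[city], quarter)] += sales
    return dict(totals)


def roll_up_drop_quarter(facts):
    """Roll up by dimension reduction: aggregate the time dimension away."""
    totals = defaultdict(int)
    for product, city, _quarter, sales in facts:
        totals[(product, city)] += sales
    return dict(totals)


tv_slice = slice_cube(FACTS, "TV")
first_quarter_dice = dice_cube(FACTS, {"TV", "PC"}, {"1Qtr"})
by_country = roll_up_to_country(FACTS)
by_product_city = roll_up_drop_quarter(FACTS)
print(by_country)
```

Drill-down is the reverse direction: moving from by_country back to the city-level rows in FACTS, which is only possible if the detailed data have been kept.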
![Page 47: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/47.jpg)
Browsing a Data Cube
![Page 48: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/48.jpg)
Slicing and dicing
[Figure: a cube with dimensions Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special), and Regions (India, Far East, Europe); the Telecomm slice is highlighted.]
![Page 49: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/49.jpg)
Roll-Up & Drill-Down
[Figure: roll-up and drill-down along the hierarchy Sales Channel, Region, Country, State, Location Address, Sales Representative. Roll-up moves toward a higher level of aggregation; drill-down moves toward low-level details.]
![Page 51: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/51.jpg)
Pivot
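[Figure: pivoting a cube with dimensions Date (3/1, 3/2, 3/3, 3/4), Region (NY, LA, SF), and Product (Juice, Cola, Milk, Cream); the visible cell values are 10, 47, 30, and 12.]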
![Page 52: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/52.jpg)
OLAP Server Architectures
Relational OLAP (ROLAP)
• Use relational DBMS to store and manage warehouse data.
• ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level as required.
• Greater scalability
Multidimensional OLAP (MOLAP)
• Fast query performance due to optimized storage and indexing
• Automated computation of higher level aggregates of the data
• Very compact for low dimension data sets
• Array model provides natural indexing
Hybrid OLAP (HOLAP)
• User flexibility
• Low level: relational
• High level: array
![Page 53: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/53.jpg)
Warehouse Products
Computer Associates -- CA-Ingres
Hewlett-Packard -- Allbase/SQL
Informix -- Informix, Informix XPS
Microsoft -- SQL Server
Oracle -- Oracle 7, Oracle Parallel Server
Red Brick -- Red Brick Warehouse
SAS Institute -- SAS
Software AG -- ADABAS
Sybase -- SQL Server, IQ, MPP
![Page 54: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/54.jpg)
Data Warehouse Vendors
![Page 55: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/55.jpg)
Data Warehouse Vendors
![Page 56: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/56.jpg)
Review
What is a data warehouse?
What is data warehousing?
What is the difference between OLTP and data warehousing?
What does ETL stand for?
What is the meaning of Metadata?
What is the star schema?
What is the snowflake schema?
What is an OLAP cube?
What are the most common OLAP operations?
![Page 57: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/57.jpg)
Next Week’s Class Talk
Volunteers are required for next week’s class talk.
Topic: Business Intelligence
Length: 20 minutes plus question time
Suggested Points of Interest
• Aim & Scope
  • Techniques involved
• Market
  • Vendors & Products
• Typical applications
  • Supermarkets, Airlines, Financial Institutions …
• Prospect of employment
  • Major BI companies
• The future of BI
  • Development trends
![Page 58: Principles of Data Warehousing](https://reader035.fdocuments.in/reader035/viewer/2022081520/5681687f550346895ddef49a/html5/thumbnails/58.jpg)
Project Option--- Data Warehousing
Aim
• To gain hands-on experience with data warehousing.
• To get familiar with popular data warehousing software.
• To build up teamwork and interpersonal skills.
Deliverables
• Reports
• Oral Presentation or Poster
Due
• Reports must be submitted before Week 14.
• Oral presentations and posters are scheduled for Week 15.
Software
• PowerOLAP
• InstantOLAP
• Pentaho