Theory, Practice & Methodology of Relational Database Design and Programming
Copyright © Ellis Cohen 2002-2006
Introduction to Data Warehouse Design
These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
For more information on how you may use them, please see http://www.openlineconsult.com/db
© Ellis Cohen, 2003-2006 2
Topics
• Overview
• Star Schema: Fact & Dimension Tables
• The Star Schema & Denormalization
• The Data Cube
• ETL: Extraction, Transformation & Loading
Overview
Data Warehousing & Data Mining
Data Warehousing
• Techniques for representing & querying large amounts of relatively static data
• Potentially stored in Multi-Dimensional Databases
• On-line Analysis & Decision Support
Data Mining
• Automated analysis: discovering (potentially) unexpected patterns in large amounts of data
Operational vs Analytical DBs
Operational Database
• Data needed and updated constantly to directly support business operations
• Focus on OLTP (on-line transaction processing): transactional access & modification of a relatively small # of data points at a time
Analytical Database: Data Warehouse & Data Mart
• Copious amounts of relatively static data, culled & integrated across the enterprise, cleansed & summarized, maintained historically, used for decision support and business intelligence (BI)
• Focus on OLAP (on-line analytical processing): querying large amounts of data, scheduled modifications
Operational vs Analytical DBs

| | Operational | Warehouse |
|---|---|---|
| Usage | Transactional (OLTP) | Analytical (OLAP) |
| Organized for | Modifications | Queries |
| Modifications | Continual | Periodic |
| Queries | Narrow-scope, low-complexity | Broad-scope, high-complexity |
| Database | Relational | Relational/Dimensional |
| Data | Normalized | Denormalized, Aggregated & Derived |
Central Data Warehouse
(from Oracle 9i Data Warehousing Guide)
Warehouse Questions
How many red Bally shoes did we sell by region in the third quarter of each of the last 5 years?
What are the top 25 selling products by category and region for this past quarter?
What percent of the market do we own for each product we make?
Which of our customers' zip codes were responsible for the top 10% of total sales over the last year?
Star Schema: Fact & Dimension Tables
Star Schema
Data Warehouses are organized using Star Schema models
A Star Schema has a central fact table, with a composite primary key, which references multiple Dimension tables
DailySales (Fact): storid, prodid, date, price, units
• storid, prodid: foreign keys referencing the Stores and Products dimension tables
• price, units: Measures (what each fact measures)
Stores (Dimension): storid, …
Products (Dimension): prodid, …
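The structure on this slide can be sketched as table definitions. This is a minimal illustration, not the author's schema: it assumes SQLite (through Python's `sqlite3` module) as a stand-in database, fills in the elided dimension columns with `stornam` and `prodnam` from later slides, and uses made-up sample rows. The helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Stores   (storid INTEGER PRIMARY KEY, stornam TEXT);
CREATE TABLE Products (prodid INTEGER PRIMARY KEY, prodnam TEXT);
CREATE TABLE DailySales (
    storid INTEGER NOT NULL REFERENCES Stores(storid),
    prodid INTEGER NOT NULL REFERENCES Products(prodid),
    date   TEXT    NOT NULL,
    price  REAL,     -- measure
    units  INTEGER,  -- measure
    PRIMARY KEY (storid, prodid, date)   -- composite key: one fact per store/product/date
);
INSERT INTO Stores     VALUES (1, 'Pittsburgh');
INSERT INTO Products   VALUES (10, 'Beanie Baby');
INSERT INTO DailySales VALUES (1, 10, '2002-03-01', 5.0, 3);
""")

def try_insert(row):
    """Return True if the fact row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO DailySales VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert((1, 10, "2002-03-01", 5.0, 1))  # same composite key again
orphan_ok = try_insert((99, 10, "2002-03-01", 5.0, 1))    # store 99 is not in Stores
new_day_ok = try_insert((1, 10, "2002-03-02", 5.0, 2))    # new date, so a new fact
```

The composite key rejects a second fact for the same store, product and date, and the foreign keys reject facts that point at a dimension value that does not exist.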
Subjects (Facts) & Dimensions
Instead of thinking about entities & relationships, design a data warehouse by thinking about
Subjects (represented by fact tables)
• Sales, Distribution, Purchases
Dimensions (represented by dimension tables)
• How to uniquely identify the facts about each subject
– Sales: Products, Stores, Dates (maybe also Employees, Customers: depends on what you want to analyze)
– Distribution: Warehouses, Products, Stores, Dates (maybe Employees & Trucks)
– Purchases: Products, Vendors, Dates (maybe also Employees)
Fact & Dimension Tables
Fact Tables
• Composite primary key
– identifies the dimensions
– uniquely identifies each fact (or measurement)
• Additional attributes: measures
– what is measured about each fact
Dimension Tables
• Primary key
– Surrogate key uniquely identifies each dimension value
• Additional attributes
– Properties of each dimension value
Dimensions & Granularity
Dimensions have different levels of granularity
• Stores → Districts → Regions
• Products → ProductTypes → SubCategories → Categories
• Products → ProductTypes → Manufacturers
Snowflake Schema (with Normalized Dimensions)
DailySales (Fact): storid, prodid, date, price, units
Stores (Dimension): storid, stornam, city, state, distid
Districts: distid, distnam, distarea, regid
Regions: regid, regnam
Products (Dimension): prodid, color, size, prodtyp
ProductTypes: prodtyp, prodnam, prodescr, subcatid, manfid
SubCategories: subcatid, subnam, subdescr, catid
Categories: catid, catnam, catdescr
Manufacturers: manfid, manfnam
Typical Warehouse Query
How many red Bally shoes did we sell in each region in 2002?
    SELECT r.regnam as region, sum(f.units) as sumunits
    FROM DailySales f
      NATURAL JOIN Stores
      NATURAL JOIN Districts
      NATURAL JOIN Regions r
      NATURAL JOIN Products p
      NATURAL JOIN ProductTypes
      NATURAL JOIN SubCategories s
      NATURAL JOIN Manufacturers m
    WHERE to_char(f.date,'YYYY') = '2002'
      AND p.color = 'red'
      AND m.manfnam = 'Bally'
      AND s.subnam = 'Shoe'
    GROUP BY r.regnam
The Star Schema & Denormalization
Snowflake Schema is Normalized
A Snowflake Schema has normalized dimension tables
• Each dimension is represented by multiple sub-dimension tables at different levels of granularity (Product, ProductType, Category, etc.)
• Each sub-dimension table has attributes appropriate to its level of granularity
– Product: color, size
– ProductType: prodnam, prodescr
– etc.
Denormalization
Normalized:
Products (Dimension): prodid, color, size, prodtyp
ProductTypes: prodtyp, prodnam, prodescr, subcatid, manfid
SubCategories: subcatid, subnam, subdescr, catid
Categories: catid, catnam, catdescr
Manufacturers: manfid, manfnam
Denormalized:
Products (Dimension): prodid, color, size, prodtyp, prodnam, prodescr, manfid, manfnam, subcatid, subnam, subdescr, catid, catnam, catdescr
Why is there redundancy here?
Star Schema is Denormalized
The Star Schema has denormalized dimension tables
• Each dimension is represented by joining together its sub-dimension tables to form a single dimension table
• The dimension table has attributes at different levels of granularity
• The dimension tables contain lots of redundancy, but queries use far fewer joins
• Does not dramatically impact space: dimension tables are usually < 1% of the size of the fact table (but some descriptions may need to be stored separately)
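The flattening described above can be sketched by joining the sub-dimension tables once and storing the result. This is a minimal illustration that assumes SQLite through Python's `sqlite3` module; the table name `ProductsDim` and the sample rows ('Oxford', 'Footwear') are hypothetical, and the description columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories    (catid INTEGER PRIMARY KEY, catnam TEXT);
CREATE TABLE SubCategories (subcatid INTEGER PRIMARY KEY, subnam TEXT, catid INTEGER);
CREATE TABLE Manufacturers (manfid INTEGER PRIMARY KEY, manfnam TEXT);
CREATE TABLE ProductTypes  (prodtyp INTEGER PRIMARY KEY, prodnam TEXT,
                            subcatid INTEGER, manfid INTEGER);
CREATE TABLE Products      (prodid INTEGER PRIMARY KEY, color TEXT, size TEXT,
                            prodtyp INTEGER);

INSERT INTO Categories    VALUES (1, 'Footwear');
INSERT INTO SubCategories VALUES (1, 'Shoe', 1);
INSERT INTO Manufacturers VALUES (1, 'Bally');
INSERT INTO ProductTypes  VALUES (1, 'Oxford', 1, 1);
INSERT INTO Products      VALUES (1, 'red', '9', 1), (2, 'black', '9', 1), (3, 'red', '10', 1);

-- Flatten the sub-dimension tables into one wide dimension table.
CREATE TABLE ProductsDim AS
SELECT p.prodid, p.color, p.size, p.prodtyp, t.prodnam,
       m.manfid, m.manfnam, s.subcatid, s.subnam, c.catid, c.catnam
FROM Products p
JOIN ProductTypes  t ON t.prodtyp  = p.prodtyp
JOIN Manufacturers m ON m.manfid   = t.manfid
JOIN SubCategories s ON s.subcatid = t.subcatid
JOIN Categories    c ON c.catid    = s.catid;
""")

rows = conn.execute(
    "SELECT prodid, manfnam, subnam, catnam FROM ProductsDim ORDER BY prodid"
).fetchall()

# 'Bally' is stored once in Manufacturers but once per product in ProductsDim.
bally_copies = conn.execute(
    "SELECT count(*) FROM ProductsDim WHERE manfnam = 'Bally'"
).fetchone()[0]
```

Every product row now repeats its manufacturer, subcategory and category names, which is the redundancy the slide refers to; in exchange, a query needs a single join from the fact table to reach any of them.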
Star Schema (Fully Denormalized Dimensions)
DailySales (Fact): storid, prodid, date, price, units
• Why should date be replaced by a dateid?
Stores (Dimension): storid, stornam, city, state, distid, distnam, distarea, regid, regnam
Products (Dimension): prodid, color, size, prodtyp, prodnam, prodescr, manfid, manfnam, subcatid, subnam, subdescr, catid, catnam, catdescr
• Maybe catdescr is not included here if it is a GIF or a 4000-byte description
Query with Denormalized Schema
How many red Bally shoes did we sell in each region in 2002?
    SELECT s.regnam as region, sum(f.units) as sumunits
    FROM DailySales f
      NATURAL JOIN Stores s
      NATURAL JOIN Products p
    WHERE to_char(f.date,'YYYY') = '2002'
      AND p.color = 'red'
      AND p.manfnam = 'Bally'
      AND p.subnam = 'Shoe'
    GROUP BY s.regnam
Costly: to_char(f.date,'YYYY') = '2002'
Typical Date Dimension Attributes
It is common and almost always more efficient to treat Dates as a dimension with a number of attributes

| Field | Example Value |
|---|---|
| Year | 2005 |
| Month | Feb |
| Quarter | 1 |
| DayOfMonth | 12 |
| DayOfYear | 43 |
| WeekOfYear | 7 |
| DayOfWeek | Sat |

Requires Month + Year to identify a month within a year. Might want to add a single MonthYr field to represent the pair.
Note: Quarter is less granular than Month. Also, DayOfYear, WeekOfYear & DayOfWeek can be derived from the other fields.
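As a rough sketch of how such attributes might be derived when populating a Dates dimension, the function below computes them from a calendar date. The function name `date_dimension_row` is hypothetical, and the week numbering is an assumption: week 1 starts on Jan 1 (as Oracle's `WW` format does), which reproduces the slide's example value of 7; ISO week numbering would give a different result.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # date.weekday(): Monday is 0

def date_dimension_row(d: date) -> dict:
    """Derive the date-dimension attributes for one calendar date."""
    day_of_year = d.timetuple().tm_yday
    quarter = (d.month - 1) // 3 + 1
    week_of_year = (day_of_year - 1) // 7 + 1  # assumed convention: week 1 starts on Jan 1
    month = MONTHS[d.month - 1]
    return {
        "Year": d.year,
        "Month": month,
        "Quarter": quarter,
        "DayOfMonth": d.day,
        "DayOfYear": day_of_year,
        "WeekOfYear": week_of_year,
        "DayOfWeek": DAYS[d.weekday()],
        "MonthYr": f"{month}{d.year}",  # single field for the (Month, Year) pair
    }

row = date_dimension_row(date(2005, 2, 12))
```

For Feb 12, 2005 this yields the example values in the table, plus `MonthYr` as `Feb2005`.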
Extended Date Dimension Hierarchy
Date (e.g. Feb 12, 2005)
DayOfWeek (e.g. Sat)
WeekYr (e.g. 2005Wk7)
MonthYr (e.g. Feb2005)
QuarterYr (e.g. 2005Q1)
Year (e.g. 2005)
Quarter (e.g. 1)
Month (e.g. Feb)
WeekOfYear (e.g. 7)
DayOfYear (e.g. 43)
DayOfMonth (e.g. 12)
Star Schema with Date Dimension
In general, represent dates by a Dates dimension table
DailySales (Fact): storid, prodid, dateid, price, units
Stores (Dimension): storid, stornam, city, state, distid, distnam, distarea, regid, regnam
Products (Dimension): prodid, color, size, prodtyp, prodnam, prodescr, manfid, manfnam, subcatid, subnam, subdescr, catid, catnam, catdescr
Dates (Dimension): dateid, date, dayofweek, dayofmonth, dayofyear, weekyr, weekofyear, monthyr, month, quarteryr, quarter, year
Query using Dates Dimension
How many red Bally shoes did we sell in each region in 2002?
    SELECT s.regnam as region, sum(f.units) as sumunits
    FROM DailySales f
      NATURAL JOIN Stores s
      NATURAL JOIN Products p
      NATURAL JOIN Dates d
    WHERE d.year = 2002
      AND p.color = 'red'
      AND p.manfnam = 'Bally'
      AND p.subnam = 'Shoe'
    GROUP BY s.regnam
Needs an extra join, but the query is simpler and executes faster if Dates is indexed by year
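To see the query run end to end, here is a small sketch against a cut-down star schema. It assumes SQLite through Python's `sqlite3` module rather than the Oracle dialect used on the slides, keeps only the columns the query touches, and uses made-up rows; the index name `Dates_year` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stores   (storid INTEGER PRIMARY KEY, stornam TEXT, regnam TEXT);
CREATE TABLE Products (prodid INTEGER PRIMARY KEY, color TEXT, manfnam TEXT, subnam TEXT);
CREATE TABLE Dates    (dateid INTEGER PRIMARY KEY, date TEXT, year INTEGER);
CREATE INDEX Dates_year ON Dates(year);
CREATE TABLE DailySales (storid INTEGER, prodid INTEGER, dateid INTEGER,
                         price REAL, units INTEGER,
                         PRIMARY KEY (storid, prodid, dateid));

INSERT INTO Stores   VALUES (1, 'Pittsburgh', 'East'), (2, 'NYC', 'East'), (3, 'Denver', 'West');
INSERT INTO Products VALUES (1, 'red', 'Bally', 'Shoe'), (2, 'black', 'Bally', 'Shoe');
INSERT INTO Dates    VALUES (1, '2001-12-31', 2001), (2, '2002-01-01', 2002),
                            (3, '2002-06-15', 2002);
INSERT INTO DailySales VALUES
  (1, 1, 1, 100.0, 7),   -- sold in 2001: filtered out
  (1, 1, 2, 100.0, 2),
  (2, 1, 3, 100.0, 3),
  (3, 1, 2, 100.0, 4),
  (3, 2, 2, 100.0, 9);   -- black shoes: filtered out
""")

query = """
SELECT s.regnam AS region, sum(f.units) AS sumunits
FROM DailySales f
  NATURAL JOIN Stores s
  NATURAL JOIN Products p
  NATURAL JOIN Dates d
WHERE d.year = 2002 AND p.color = 'red'
  AND p.manfnam = 'Bally' AND p.subnam = 'Shoe'
GROUP BY s.regnam
ORDER BY s.regnam
"""
result = conn.execute(query).fetchall()
```

The natural joins work here because each dimension shares exactly one column name (its key) with the fact table; a shared non-key column name would silently become an extra join condition.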
The Data Cube
Data Cube Representation
[Figure: Sales Cube. Axes: Products dimension, Stores dimension (e.g. Pgh, NYC), Dates dimension. Labeled cells: Sales of Beanie Babies in Pittsburgh Store Today; Sales of Beanie Babies in Pittsburgh Store Yesterday. Labeled region: All Sales (of all products over time) in NYC Store.]
Data Cube Characteristics
Each axis represents a dimension
– Elements along axis are at lowest granularity for that dimension
Measures are the data within the cells at intersections of the cube
– Information about the topic of the cube
– e.g. units & price for each sales fact (i.e. sales in a store of a product on a date)
Data Cube Views
Slice
• View data relative to a point in one or more dimensions
– View sales today (for each store & each product category)
– View Bally shoe sales at the NYC store (for each date)
Dice
• View data relative to (sets of) ranges in one or more dimensions
– View sales for the last 4 days (for each store & each product category)
– View sales for each type of shoe at all the NY and NJ stores for each of the last 10 quarters
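The two views can be sketched over a toy cube held in a Python dictionary. This is only an illustration of the idea, not how an OLAP engine implements it: the cube contents and the function names `slice_cube` and `dice_cube` are hypothetical.

```python
# Hypothetical sales cube: (store, product, date) -> units sold.
cube = {
    ("NYC", "Bally shoe", "2005-02-11"): 4,
    ("NYC", "Bally shoe", "2005-02-12"): 6,
    ("NYC", "Beanie Baby", "2005-02-12"): 9,
    ("Pgh", "Beanie Baby", "2005-02-11"): 5,
    ("Pgh", "Beanie Baby", "2005-02-12"): 7,
}

def slice_cube(cube, store=None, product=None, date=None):
    """Slice: fix a single point on one or more dimensions."""
    return {k: v for k, v in cube.items()
            if (store is None or k[0] == store)
            and (product is None or k[1] == product)
            and (date is None or k[2] == date)}

def dice_cube(cube, stores=None, products=None, date_range=None):
    """Dice: restrict one or more dimensions to a set or range of values."""
    lo, hi = date_range if date_range else (None, None)
    return {k: v for k, v in cube.items()
            if (stores is None or k[0] in stores)
            and (products is None or k[1] in products)
            and (date_range is None or lo <= k[2] <= hi)}

today = slice_cube(cube, date="2005-02-12")                       # sales today
nyc_bally = slice_cube(cube, store="NYC", product="Bally shoe")   # one store, one product
diced = dice_cube(cube, stores={"NYC", "Pgh"},
                  date_range=("2005-02-11", "2005-02-11"))        # set of stores, date range
```

Dates are stored as ISO strings so that a plain string comparison orders them correctly for the range test.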
MDDB: MultiDimensional DataBase
• Knows about Fact & Dimension Tables
• Uses direct (n dimensional) hypercube representation to provide fast access to fact elements in query
• Supports sparse representations
– The Pittsburgh store doesn't sell lingerie
– The Cape Cod store is not open in the winter
– Baked Beanie Babies are only sold in the NE region
• Uses specialized query language
– e.g. MDX (used by Microsoft OLAP Server) with basic data types: cube, slice, dice
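To make the point about sparseness concrete, the sketch below stores only the cells that actually hold a fact and treats every other cell as empty. A dictionary keyed by coordinates is an assumed stand-in for whatever compressed array layout a real MDDB uses, and all names and figures are made up.

```python
stores = ["Pgh", "NYC", "Cape Cod"]
products = ["Beanie Baby", "Lingerie"]
seasons = ["winter", "summer"]

# Sparse representation: only cells that hold a fact are stored.
# Pgh sells no lingerie; Cape Cod is closed in the winter.
cube = {
    ("Pgh", "Beanie Baby", "winter"): 40,
    ("Pgh", "Beanie Baby", "summer"): 55,
    ("NYC", "Beanie Baby", "winter"): 70,
    ("NYC", "Beanie Baby", "summer"): 65,
    ("NYC", "Lingerie", "winter"): 20,
    ("NYC", "Lingerie", "summer"): 25,
    ("Cape Cod", "Beanie Baby", "summer"): 30,
    ("Cape Cod", "Lingerie", "summer"): 10,
}

def units(store, product, season):
    """Direct lookup by coordinates; a missing cell means no sales."""
    return cube.get((store, product, season), 0)

dense_cells = len(stores) * len(products) * len(seasons)  # cells in a full hypercube
stored_cells = len(cube)                                  # cells actually kept
```

Here a dense hypercube would reserve 12 cells while only 8 carry data; with realistic dimension sizes the gap is usually far larger.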
ETL: Extraction, Transformation & Loading
ETL: Extraction, Transformation & Loading
ETL accounts for 80% of the total cost of building a warehouse
Extraction → Transformation → Loading
Extraction
Sources
• Multiple DBs
• Flat Files
• External Data Sources
– e.g. Census, Geographic, Weather, Financial, Unemployment Data
– Standard DB/Spreadsheet format or semi-structured data from the web
Frequency
• Periodic (hourly, daily, weekly, …)
• Triggered
– Single event
– #, sequence, pattern of events
Mechanisms
• Snapshots / Materialized Views / Replication
• Database Triggers
• Process Logs
• Query Sources (full vs incremental)
Transformation
Cleaning
• Scrubbing
• Filtering
• Conformance
Integration
• Renaming
• Fusion & Merging
• Determine Surrogate Keys
• Timestamping
• Summarization
Schema Organization
• Dimension Tables
• Pre-Aggregation via Materialized Views
• Derivation
(Transformation) Cleaning
Scrubbing
• Use domain-specific knowledge
• e.g. SS#, phone number, zipcode
Filtering
• Check for inconsistent data
• Use data validation rules
Conformance
• Map similarly typed data to a standard representation
• Convert
– units (inch => cm, $ => euro)
– scale (mm => cm)
– formats (string => integer, string with/without $)
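A few of these cleaning steps can be sketched as small functions. The function names are hypothetical, the ZIP rule (five digits, optionally followed by a four-digit extension) is an assumed US-only validation rule, and currency conversion is left out because it needs an exchange rate that the slides do not supply.

```python
import re

INCH_TO_CM = 2.54

def conform_length_cm(value: float, unit: str) -> float:
    """Conformance: express a length in centimeters (unit and scale conversion)."""
    factors = {"cm": 1.0, "mm": 0.1, "inch": INCH_TO_CM}
    return value * factors[unit]

def conform_amount(text: str) -> float:
    """Conformance: parse an amount written with or without '$' and thousands commas."""
    return float(text.replace("$", "").replace(",", "").strip())

ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

def scrub_zipcode(text: str):
    """Scrubbing: return the 5-digit ZIP code, or None if the value is not plausible."""
    text = text.strip()
    return text[:5] if ZIP_RE.match(text) else None

length = conform_length_cm(2, "inch")
amount = conform_amount("$1,250.50")
good_zip = scrub_zipcode(" 15213-3890 ")
bad_zip = scrub_zipcode("1521")
```

Returning `None` for an implausible value leaves the decision (reject the row, repair it, or route it to review) to the filtering step.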
(Transformation) Integration
Renaming
• Resolve name conflicts
Fusion
• e.g. merge
– properties in city db
– properties in developer lists
Determine Surrogate Keys
• Do not use keys from operational data as primary key in warehouse data
Timestamping
• Add timestamps to fact data where missing to enable historical queries
Reorganization & Evolution
• Support Data Reorganization & Schema Evolution
Summarization
• Summarize original operational data and combine into less detailed tables
Integration (Data Reorganization)
What do we do when attributes change?
Suppose districts are reorganized and a store is now part of a different district
Consistently changing the mapping of store to district
– Allows new and old data to be compared reasonably by district
– But causes incorrect comparisons by district among older data alone
Solutions
1. Keep fields for both old and new mapping; in fact, potentially a separate field for each reorganization
2. Add an effective date to the store dimension. Have multiple rows for the same store, each with a different effective date
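Solution 2 might look like the following sketch, which assumes SQLite through Python's `sqlite3` module. The surrogate key column `storkey`, the column `effdate`, the helper `store_key_on` and the sample data are all hypothetical; the slides only state that each store gets multiple rows with different effective dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stores (
    storkey INTEGER PRIMARY KEY,  -- surrogate key: one per version of a store
    storid  INTEGER,              -- identifies the store itself
    distnam TEXT,
    effdate TEXT                  -- date from which this row applies
);
CREATE TABLE DailySales (storkey INTEGER, date TEXT, units INTEGER);

-- Store 7 moved from district North to district Central on 2004-01-01.
INSERT INTO Stores VALUES (1, 7, 'North', '2000-01-01'), (2, 7, 'Central', '2004-01-01');
INSERT INTO DailySales VALUES (1, '2003-06-01', 10), (2, '2004-06-01', 20);
""")

def store_key_on(storid, day):
    """During loading: pick the store row that was in effect on the sale date."""
    row = conn.execute(
        "SELECT storkey FROM Stores WHERE storid = ? AND effdate <= ? "
        "ORDER BY effdate DESC LIMIT 1", (storid, day)).fetchone()
    return row[0] if row else None

# Each fact is attributed to the district its store belonged to at the time.
by_district = conn.execute("""
    SELECT s.distnam, sum(f.units)
    FROM DailySales f JOIN Stores s USING (storkey)
    GROUP BY s.distnam ORDER BY s.distnam
""").fetchall()
```

Because facts reference the versioned row, comparisons among older data stay correct, while grouping by `storid` still lets old and new data be combined per store.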
(Integration) Summarization
Operational tables:
CustomerTransaction: transid, custid, empid, posid, time
ItemPurchase: transid, lineno, prodid, price, units
PointOfSaleTerminals: posid, postyp, storid, loc
Summarized fact table:
DailySales (Fact): storid, prodid, date, price, units
Might build different fact tables for different purposes:
• e.g. ones involving Customers
• ones involving Store Locations
Tradeoff: Smaller Fact Tables vs. Missed Relationships
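The summarization from operational tables into DailySales could be sketched as one aggregating query. It assumes SQLite through Python's `sqlite3` module and made-up rows, and it assumes that `price` in DailySales means the units-weighted average price for the day, which the slides do not specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PointOfSaleTerminals (posid INTEGER PRIMARY KEY, postyp TEXT,
                                   storid INTEGER, loc TEXT);
CREATE TABLE CustomerTransaction  (transid INTEGER PRIMARY KEY, custid INTEGER,
                                   empid INTEGER, posid INTEGER, time TEXT);
CREATE TABLE ItemPurchase         (transid INTEGER, lineno INTEGER, prodid INTEGER,
                                   price REAL, units INTEGER,
                                   PRIMARY KEY (transid, lineno));

INSERT INTO PointOfSaleTerminals VALUES (1, 'register', 100, 'front'),
                                        (2, 'kiosk', 100, 'back');
INSERT INTO CustomerTransaction VALUES
  (1, 11, 21, 1, '2005-02-12 09:15:00'),
  (2, 12, 21, 2, '2005-02-12 17:40:00'),
  (3, 11, 22, 1, '2005-02-13 10:05:00');
INSERT INTO ItemPurchase VALUES
  (1, 1, 500, 4.0, 2),
  (2, 1, 500, 4.0, 3),
  (2, 2, 501, 9.0, 1),
  (3, 1, 500, 4.0, 1);

-- Roll individual line items up to one row per store, product and day.
CREATE TABLE DailySales AS
SELECT t.storid, i.prodid, date(c.time) AS date,
       sum(i.price * i.units) / sum(i.units) AS price,  -- assumed: average unit price
       sum(i.units) AS units
FROM ItemPurchase i
JOIN CustomerTransaction  c ON c.transid = i.transid
JOIN PointOfSaleTerminals t ON t.posid   = c.posid
GROUP BY t.storid, i.prodid, date(c.time);
""")

daily = conn.execute(
    "SELECT storid, prodid, date, price, units FROM DailySales ORDER BY date, prodid"
).fetchall()
```

The customer, employee and terminal columns do not survive the roll-up, which is the "missed relationships" side of the tradeoff: this fact table can no longer answer questions about who bought what or where in the store.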
Loading
Alternatives
– Incremental vs Full Refresh: most data is incrementally added to the warehouse
– Off-line vs on-line
– Frequency
• Nightly
• Weekly
• Monthly
– All-at-once vs Staged
What indices to create or drop?
What statistics to collect (& use)?
Constellation Schema
Data warehouses often are designed as constellations
• Multiple fact tables
• Shared/related dimension tables
Examples
– Sales: store, product, date
– Distribution: distributor, store, product, carrier, period
– Advertising: store, medium, product, period
Query across same or related dimensions
– Compare advertising and sales by store within various periods
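A comparison across two fact tables might be written as below. This sketch assumes SQLite through Python's `sqlite3` module, hypothetical measure columns `revenue` and `cost`, and a simplification in which Sales already carries a `period` column instead of a date. Each fact table is aggregated to the shared grain (store, period) before the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales       (storid INTEGER, prodid INTEGER, period TEXT, revenue REAL);
CREATE TABLE Advertising (storid INTEGER, medium TEXT, prodid INTEGER,
                          period TEXT, cost REAL);

INSERT INTO Sales VALUES (1, 10, '2005Q1', 500.0), (1, 11, '2005Q1', 300.0),
                         (2, 10, '2005Q1', 400.0);
INSERT INTO Advertising VALUES (1, 'radio', 10, '2005Q1', 50.0),
                               (1, 'print', 10, '2005Q1', 30.0),
                               (2, 'radio', 10, '2005Q1', 40.0);
""")

query = """
SELECT s.storid, s.period, s.revenue, a.cost
FROM (SELECT storid, period, sum(revenue) AS revenue
      FROM Sales GROUP BY storid, period) s
JOIN (SELECT storid, period, sum(cost) AS cost
      FROM Advertising GROUP BY storid, period) a
  ON a.storid = s.storid AND a.period = s.period
ORDER BY s.storid
"""
comparison = conn.execute(query).fetchall()
```

Joining the raw fact tables directly on store and period would pair every sales row with every advertising row and inflate both sums, so the aggregation happens first.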
Data Marts
Store different fact tables (or different groups of fact tables) in separate data marts
Data Mart Architectures
Subset of Data Warehouse
Meets needs of subgroup of users
• Top-down:
– Extracted from Data Warehouse
– Problem: early availability
• Bottom-up:
– Built directly from staging area
– Can be combined to form warehouse
– Problem: Conformance. ETL tool must provide metadata
• Hybrid:
– Some data marts built directly from staging area
– Others extracted from Data Warehouse
Metadata Management
Identify & define each attribute
– Source(s)
– Transformation(s) applied
– How aggregated
– Description of what it represents
– Relationships to other attributes
– History