© 2007 Robert T. Monroe, Carnegie Mellon University, ©2006 - 2008 Robert T. Monroe, 45-875 BI Tools and Techniques
Administrivia – HW #2
Online Analytical Processing (OLAP)
BI Tools and Techniques
Robert Monroe
April 8, 2008
Key Takeaways
Core OLAP Concepts
What Are OLAP Tools?
• OLAP tools provide a mechanism for interactive analysis and exploration of dimensional data
– Interactive: users need to be able to easily specify queries
– Analysis: it should be possible to perform (and reuse) complex analyses of the dimensional data
– Exploration: answering one question with an OLAP tool frequently raises numerous subsequent questions
• A good OLAP tool allows the user to quickly pose follow-on queries
– Dimensional: OLAP tools operate on dimensional data – data structured as facts and dimensions
OLAP’s Role In Decision Making
Source: O’Brien, Management Information Systems, 6th ed.
OLAP excels at exploring complex, structured questions
OLAP Sweet Spot
Quick OLAP Tools Demo
• Contour Components OLAP cube browser
– Open http://olaplib.contourcomponents.com/ in IE 6.0 or higher
– Approve the installation of any ActiveX controls that the site requests
– Use the Samples > Government > Regional Employee Turnover menu in the upper left of the screen to open a sample OLAP cube
• Demo requires IE 6.0 or later and an ActiveX install
– Installation for class is optional
• For the first demo we will browse regional employee turnover data
Why Not Just Write SQL Queries?
• Performance
• Complexity
• Exploration
• Presentation
• Difficulty in dealing with hierarchies
• Difficult or impossible to specify some desired queries
Why Not Just Use Spreadsheets?
• Complexity (with > 2 dimensions)
• Presentation is tied to representation
• Does not scale to large data sets or many dimensions
– Storage and representation are ill-suited to the task
• Inability to deal with hierarchies
OLAP’s Place In A Business Intelligence Solution
Diagram stages: Reconcile Data, Derive Data, OLAP Cube, OLAP Tools, Analyze
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Dimensional Modeling with Hypercubes: Basic Concepts
Representing Dimensional Databases as Cubes
• OLAP tools represent dimensional data as cubes
– Cubes are also sometimes referred to as hypercubes
• Dimension tables are represented as cube dimensions
• Facts are represented using measures
– Measures can be thought of as the values stored in individual cells of the cube
– Measures consist of two parts:
• A numerical value that represents the basic fact
• A formula for combining multiple measures into a single measure
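A minimal Python sketch of this two-part definition. The `Measure` class and the sample cell values are hypothetical illustrations, not the API of any OLAP product: each measure pairs a named numeric fact with the function used to combine many cells into one.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Measure:
    """Hypothetical measure: a named numeric fact plus its combining formula."""
    name: str
    aggregate: Callable[[Sequence[float]], float]


# Two measures over the same kind of cell values, combined with different formulas.
gross_sales = Measure("gross_sales", sum)
peak_daily_sales = Measure("peak_daily_sales", max)

daily_cells = [1200.0, 950.0, 1430.0]  # illustrative cell values, not real data

print(gross_sales.aggregate(daily_cells))       # 3580.0
print(peak_daily_sales.aggregate(daily_cells))  # 1430.0
```

Keeping the formula next to the value is what lets a tool compute any coarser cell without being told how for each query.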
Quick Review: Dimensional Modeling Example
Fact table provides statistics for sales broken down by product, period and store dimensions
Dimension tables provide details on stores, products, and time periods
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Quick Review: Dimensional Example With Data
Figure: Sales (fact) with Product, Period, and Store (dimensions)
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
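The star structure can be sketched in plain Python, assuming small hypothetical dimension tables keyed by surrogate keys and a fact table that holds only those keys plus the numeric fact. Resolving a fact row's keys against the dimensions is the join a report or OLAP tool performs behind the scenes.

```python
# Hypothetical dimension tables: surrogate key -> descriptive attribute.
products = {1: "Widget", 2: "Gadget"}
periods = {10: "2004-Q1", 11: "2004-Q2"}
stores = {100: "Chicago", 101: "Pittsburgh"}

# Fact table: foreign keys to each dimension plus the numeric fact (units sold).
sales_facts = [
    {"product_key": 1, "period_key": 10, "store_key": 100, "units": 40},
    {"product_key": 2, "period_key": 10, "store_key": 101, "units": 15},
    {"product_key": 1, "period_key": 11, "store_key": 101, "units": 22},
]


def describe(fact):
    """Resolve a fact row's foreign keys against the dimension tables."""
    return (
        products[fact["product_key"]],
        periods[fact["period_key"]],
        stores[fact["store_key"]],
        fact["units"],
    )


for fact in sales_facts:
    print(describe(fact))
# ('Widget', '2004-Q1', 'Chicago', 40)
# ('Gadget', '2004-Q1', 'Pittsburgh', 15)
# ('Widget', '2004-Q2', 'Pittsburgh', 22)
```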
Multiple Fact Tables
• It is frequently useful to store more than one type of fact in a single multidimensional database (star schema)
• This can be handled by using multiple fact tables that share dimensions
• Example: modeling products sold and products purchased
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Factless Fact Tables – Tracking Events
• “Factless” fact tables store only foreign keys, no facts
• Factless fact tables allow the tracking of what types of events happened, and under what circumstances they happened
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
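A factless fact table can be sketched as a list of key tuples. The attendance scenario, keys, and dates below are hypothetical; the point is that questions are answered by counting rows rather than by summing a stored fact.

```python
from collections import Counter

# Hypothetical factless fact table: each row records only that an event occurred,
# as a tuple of foreign keys (student_key, course_key, date_key). No numeric fact.
attendance = [
    ("S1", "C101", "2008-04-01"),
    ("S2", "C101", "2008-04-01"),
    ("S1", "C101", "2008-04-03"),
    ("S1", "C205", "2008-04-03"),
]

# Questions are answered by counting rows, e.g. attendance events per course.
events_per_course = Counter(course for _student, course, _date in attendance)
print(events_per_course["C101"])  # 3
print(events_per_course["C205"])  # 1
```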
Conformed Dimensions
• When dimensions are shared across multiple fact tables, they must be conformed dimensions
• Conformed dimensions
– One or more dimension tables associated with two or more fact tables, for which the dimension tables have the same business meaning and primary key with each fact table
• Conformed dimensions allow users to:
– Query across multiple fact tables
– Improve consistency of meaning and structure for derived and retrieved information
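A hedged sketch of why conformity matters, reusing the products sold and purchased example: because both hypothetical fact tables use the same product keys with the same meaning, each can be aggregated separately and the results lined up by key (often called "drilling across"). The keys and quantities are invented for illustration.

```python
from collections import defaultdict

# Conformed product dimension shared by both fact tables (hypothetical rows).
product_dim = {1: "Widget", 2: "Gadget"}

# Two fact tables keyed by the same product keys: (product_key, quantity).
units_sold = [(1, 40), (2, 15), (1, 22)]
units_purchased = [(1, 70), (2, 10)]


def total_by_product(facts):
    """Aggregate one fact table to the grain of the shared dimension."""
    totals = defaultdict(int)
    for product_key, qty in facts:
        totals[product_key] += qty
    return totals


sold = total_by_product(units_sold)
purchased = total_by_product(units_purchased)

# Drill across: because the keys conform, the two results line up by product.
for key, name in product_dim.items():
    print(name, sold[key], purchased[key], purchased[key] - sold[key])
# Widget 62 70 8
# Gadget 15 10 -5
```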
Tabular Representation of Measures and Dimensions
• Simple example of viewing OLAP data in a grid:
– Row headings (Store) represent dimension members
– Columns represent different measures
Store Sales Data for 2004
Store        Gross Sales   Quota        Profits    Sales vs. Quota
Chicago      $3,250,000    $2,750,000   $624,352   +$500,000
New York     $4,500,000    $3,550,000   $100,000   +$950,000
Pittsburgh   $1,600,000    $1,700,000   $250,000   -$100,000
(Store column = dimension; remaining columns = measures)
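The Sales vs. Quota column is a calculated measure. Assuming it is simply gross sales minus quota (which matches all three rows of the table), it can be derived from the other two measures; the function name below is a hypothetical choice.

```python
# Rows of the 2004 grid: store -> (gross sales, quota), in dollars.
stores_2004 = {
    "Chicago": (3_250_000, 2_750_000),
    "New York": (4_500_000, 3_550_000),
    "Pittsburgh": (1_600_000, 1_700_000),
}


def sales_vs_quota(gross, quota):
    """Calculated measure: positive means the store beat its quota."""
    return gross - quota


for store, (gross, quota) in stores_2004.items():
    # '+,' formats with an explicit sign and thousands separators.
    print(f"{store}: {sales_vs_quota(gross, quota):+,}")
# Chicago: +500,000
# New York: +950,000
# Pittsburgh: -100,000
```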
Tabular Representation of Measures and Dimensions
• Example 2: Store sales by year and store location
– Column and row headings represent dimension values in this case
– Cells represent measures; the name of the table describes the measure
Store Sales Data 2004-2007
Store        2004         2005         2006         2007
Chicago      $3,250,000   $3,500,000   $3,000,000   $3,900,000
New York     $4,500,000   $4,350,000   $5,100,000   $5,450,000
Pittsburgh   $1,600,000   $1,700,000   $1,800,000   $1,650,000
(Store rows and year columns = dimensions; cells = measures)
Cube Representation of Measures and Dimensions
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Dimension Hierarchies
• Dimension tables are represented as cube dimensions
– Cube dimensions use levels to represent hierarchies
– Each sub-level subdivides the parent level with finer granularity
• Dimensions can be of fixed or variable (jagged) height
• Examples
– Dimension: Time Period
• Levels: Year :: Quarter :: Month :: Week :: Day
– Dimension: Organization
• Levels: Company :: Division :: Department :: Employee
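A minimal sketch of rolling daily facts up the Time Period hierarchy using Python's `datetime.date`. The sales figures are invented, and the Week level is skipped because calendar weeks do not nest cleanly inside months.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts at the lowest level of the Time Period dimension.
daily_sales = {
    date(2007, 1, 15): 100,
    date(2007, 2, 3): 150,
    date(2007, 4, 20): 80,
    date(2008, 1, 9): 120,
}


def level_key(day, level):
    """Map a day to its ancestor member at a coarser level of the hierarchy."""
    if level == "year":
        return (day.year,)
    if level == "quarter":
        return (day.year, (day.month - 1) // 3 + 1)
    if level == "month":
        return (day.year, day.month)
    raise ValueError(f"unknown level: {level}")


def roll_up(facts, level):
    """Sum base facts into their ancestor members at the requested level."""
    totals = defaultdict(int)
    for day, amount in facts.items():
        totals[level_key(day, level)] += amount
    return dict(totals)


print(roll_up(daily_sales, "quarter"))  # {(2007, 1): 250, (2007, 2): 80, (2008, 1): 120}
print(roll_up(daily_sales, "year"))     # {(2007,): 330, (2008,): 120}
```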
Measures
• Measures represent the interesting data at the intersection of different dimensions
• There is a space for a measure at every intersection of every level of every dimension
– Base facts are stored at the intersections of the lowest levels of all dimensions (either simple or calculated measures)
– Aggregate or computed values are stored at intersections where at least one dimension is above its lowest level (aggregate values must be calculated measures)
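To make "a space for a measure at every intersection" concrete, the sketch below aggregates base cells to every combination of two dimensions, using a placeholder member `ALL` (an assumption of this sketch) for a dimension that has been rolled up completely. This is similar in spirit to the `CUBE` grouping some SQL dialects provide. Values are the 2004-2005 Chicago and Pittsburgh figures from the earlier table, in $ thousands.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("store", "year")
ALL = "ALL"  # placeholder member for a fully rolled-up dimension

# Base cells at the lowest level: (store, year) -> sales in $ thousands.
base = {
    ("Chicago", 2004): 3250, ("Chicago", 2005): 3500,
    ("Pittsburgh", 2004): 1600, ("Pittsburgh", 2005): 1700,
}


def all_intersections(facts):
    """Aggregate base cells to every combination of kept vs. rolled-up dimensions."""
    cube = {}
    for n_kept in range(len(DIMS) + 1):
        for kept in combinations(range(len(DIMS)), n_kept):
            totals = defaultdict(int)
            for coords, value in facts.items():
                key = tuple(c if i in kept else ALL for i, c in enumerate(coords))
                totals[key] += value
            cube.update(totals)
    return cube


cube = all_intersections(base)
print(len(cube))               # 9 cells: 4 base + 2 + 2 + 1 grand total
print(cube[("Chicago", ALL)])  # 6750
print(cube[(ALL, 2004)])       # 4850
print(cube[(ALL, ALL)])        # 10050
```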
Three Categories Of Measures
• Additive measures can be meaningfully combined along any dimension
– Example: total sales by product, location, or time
• Semi-additive measures cannot be combined along one or more dimensions
– Example: inventory levels can be summed across stores, but summing them across time is meaningless
• Non-additive measures cannot be combined along any dimension
– Example: weighted averages without weight information
• Exercise:
– Identify three measures of interest for a cube that tracks sales data
– Be sure to identify the numeric value tracked and the aggregation function
Definition source: Pedersen and Jensen, Multidimensional Database Technology, IEEE Computer 12/01
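A small sketch contrasting the categories. The inventory snapshots and price data are hypothetical: inventory is summed across stores but not across time (the closing snapshot is used instead), and an average price is recomputed from its additive parts rather than averaged directly.

```python
# Hypothetical month-end inventory snapshots: (store, month) -> units on hand.
inventory = {
    ("Chicago", "Jan"): 500, ("Chicago", "Feb"): 300,
    ("Pittsburgh", "Jan"): 200, ("Pittsburgh", "Feb"): 250,
}
MONTHS = ["Jan", "Feb"]

# Semi-additive: summing across stores is meaningful (units on hand in February)...
feb_total = sum(units for (store, month), units in inventory.items() if month == "Feb")


def closing_inventory(store):
    """...but summing across time is not, so use the last snapshot in the period."""
    return inventory[(store, MONTHS[-1])]


# Non-additive: averaging per-store average prices is misleading; recompute the
# ratio from its additive parts (revenue and units) instead.
sales = [("Chicago", 900.0, 100), ("Pittsburgh", 300.0, 20)]  # (store, revenue, units)
naive_avg_price = sum(rev / units for _, rev, units in sales) / len(sales)
true_avg_price = sum(rev for _, rev, _ in sales) / sum(units for _, _, units in sales)

print(feb_total, closing_inventory("Chicago"))  # 550 300
print(naive_avg_price, true_avg_price)          # 12.0 10.0
```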
Why OLAP Performs So Well
• Pre-computing aggregates and other values at cube-build time enables very rapid responses to many common queries
• Designers can specify additional formulas/values to precompute when the cube is built
• The standardized structure and dimensional model let the query engine make many assumptions about how best to answer queries and take advantage of pre-computed values
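The pre-computation idea in miniature, assuming hypothetical detail rows: a build step scans the data once and stores per-store totals, so later queries become dictionary lookups instead of repeated scans. Real OLAP engines choose which aggregates to materialize far more carefully; this only illustrates the trade-off.

```python
from collections import defaultdict

# Hypothetical detail rows: (store, year, sales in $ thousands).
detail = [
    ("Chicago", 2004, 3250), ("Chicago", 2005, 3500),
    ("Pittsburgh", 2004, 1600), ("Pittsburgh", 2005, 1700),
]


def build_store_totals(rows):
    """Build step: pay the cost of scanning the detail rows once, up front."""
    totals = defaultdict(int)
    for store, _year, amount in rows:
        totals[store] += amount
    return dict(totals)


def query_by_scan(rows, store):
    """Without pre-computation, every query rescans all detail rows."""
    return sum(amount for s, _year, amount in rows if s == store)


store_totals = build_store_totals(detail)
# At query time the pre-computed answer is a single dictionary lookup.
print(store_totals["Chicago"])           # 6750
print(query_by_scan(detail, "Chicago"))  # 6750
```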
Dimensions Examples
• What dimensions are available in the regional employee turnover example?
– Are there any important dimensions missing that you might want to use for an analysis if you were a government official trying to improve the employment outlook in your region?
• The worldwide population cube has an example of a hierarchical dimension
– Which one is hierarchical?
– Is it a fixed or jagged dimension?
– What are the measures in this cube?
Analytics
• Analytics are specific analyses that can be performed on an OLAP cube
– Simple pre-defined analytics (sums, counts, percentages)
– Complex pre-canned analytics defined as part of the cube model/build
– Ad-hoc exploration
• Examples:
– Actual sales vs. quota by sales region
– Supplier count by commodity category by division
– Deviation from contracted pricing by supplier, commodity category, and division over the previous 3 years
– Examples of analytics related to sourcing or procurement?
Analytics Examples
• Revenue cube analytics
• Automobile traffic analytics
• Marketing dynamics cube (multiple slices preset)
Drilling Down
The drilling-down operation breaks the data currently displayed into greater detail, for example by moving to a lower level of a dimension hierarchy.
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
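Drilling down can be sketched as replacing one member's total with its children at the next-lower level of a hierarchy. The quarterly figures and function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical quarterly sales facts: (year, quarter) -> sales.
quarterly = {(2007, 1): 250, (2007, 2): 80, (2007, 3): 175, (2007, 4): 310, (2008, 1): 120}


def by_year(facts):
    """The summary view: one total per year."""
    totals = defaultdict(int)
    for (year, _quarter), amount in facts.items():
        totals[year] += amount
    return dict(totals)


def drill_down(facts, year):
    """Replace one year's total with its children at the next-lower level (quarters)."""
    return {q: amount for (y, q), amount in facts.items() if y == year}


print(by_year(quarterly))           # {2007: 815, 2008: 120}
print(drill_down(quarterly, 2007))  # {1: 250, 2: 80, 3: 175, 4: 310}
```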
Slicing
• The slicing operation selects specific values for one or more dimensions of a cube and renders the measures for the remaining dimensions in a two-dimensional table
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
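A minimal slicing sketch over a hypothetical three-dimensional (product, store, year) cube held in a dictionary: fixing the product dimension leaves a store-by-year table. Showing empty cells as 0 is a display choice made here, not a rule.

```python
# Hypothetical 3-D cube: (product, store, year) -> units sold.
cube = {
    ("Widget", "Chicago", 2004): 40, ("Widget", "Pittsburgh", 2004): 15,
    ("Widget", "Chicago", 2005): 22, ("Gadget", "Chicago", 2004): 9,
}


def slice_cube(cells, product):
    """Fix the product dimension; the result is a 2-D (store x year) table."""
    return {(store, year): v for (p, store, year), v in cells.items() if p == product}


widget_slice = slice_cube(cube, "Widget")
stores = sorted({store for store, _ in widget_slice})
years = sorted({year for _, year in widget_slice})
for store in stores:
    # Empty cells are displayed as 0 in this sketch.
    print(store, [widget_slice.get((store, y), 0) for y in years])
# Chicago [40, 22]
# Pittsburgh [15, 0]
```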
Filtering
• Filtering reduces the elements included in a calculation
• Filtering can cross multiple slices
• Example: filter the previous results to show only February, April, and May
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
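Filtering can be sketched as restricting which members feed a later calculation. The monthly values are invented; the month selection mirrors the example above.

```python
# Hypothetical monthly sales for one product/store slice.
monthly = {"Jan": 10, "Feb": 12, "Mar": 9, "Apr": 14, "May": 11, "Jun": 8}


def filter_members(cells, keep):
    """Keep only the chosen members; later calculations see nothing else."""
    return {month: v for month, v in cells.items() if month in keep}


selected = filter_members(monthly, {"Feb", "Apr", "May"})
print(selected)                # {'Feb': 12, 'Apr': 14, 'May': 11}
print(sum(selected.values()))  # 37
```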
In-Class Exercise
• Open the Contour Cubes Automobile Traffic sample
• Which intersection and day in London has the most overutilization of the roads?
• Which intersection has the worst overutilization of roads across all of the days?
• Which intersection has the highest overall hourly traffic flow?
Pivoting Data
• OLAP tools generally let you pivot dimensions
– This involves switching which dimensions are displayed horizontally and which are displayed vertically
• This can be useful when exploring and trying to visualize data
Store Sales Data ’97 – ’00 ($ Millions)
Store     1997    1998    1999    2000
Chicago   $3.25   $3.5    $3.0    $3.9
NY        $4.5    $4.35   $5.1    $5.45
Pgh       $1.6    $1.7    $1.8    $1.65
Pivot
Annual Sales, By Store ’97 – ’00 ($ Millions)
Year   Chicago   NY      Pgh
1997   $3.25     $4.5    $1.6
1998   $3.5      $4.35   $1.7
1999   $3.0      $5.1    $1.8
2000   $3.9      $5.45   $1.65
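Pivoting changes only the layout, not the data. A minimal sketch that swaps the row and column dimensions of the store-by-year grid, held as nested dictionaries; the `pivot` function is a hypothetical helper, not a library call.

```python
# Store-by-year grid from the table ($ millions).
by_store = {
    "Chicago": {1997: 3.25, 1998: 3.5, 1999: 3.0, 2000: 3.9},
    "NY": {1997: 4.5, 1998: 4.35, 1999: 5.1, 2000: 5.45},
    "Pgh": {1997: 1.6, 1998: 1.7, 1999: 1.8, 2000: 1.65},
}


def pivot(grid):
    """Swap row and column dimensions; no cell values change."""
    pivoted = {}
    for row, cols in grid.items():
        for col, value in cols.items():
            pivoted.setdefault(col, {})[row] = value
    return pivoted


by_year = pivot(by_store)
print(by_year[1999])  # {'Chicago': 3.0, 'NY': 5.1, 'Pgh': 1.8}
```

Pivoting twice returns the original grid, which is a quick way to confirm that no data was altered.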
Modeling Hierarchies
• Dimension tables frequently model hierarchies
• Example:
– Customers dimension stores data about your customers
– You may sell to several divisions of a single company
– You want to be able to analyze sales to the individual divisions and also capture “rolled-up” values for the parent company
Divisions of ABC Automotive
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Modeling Hierarchies With Denormalized Tables (I)
• Hierarchical dimensions are frequently represented with denormalized tables
• Simplifies and speeds up queries at the cost of introducing update anomalies
• This example represents a ‘jagged’ or ‘arbitrary’ hierarchy
Customer_Dimension
Parent_Company   Customer_Key   Name                    Address         Type
<null>           C000001        ABC Automotive          100 1st St.     Dealer
C000001          C000002        ABC Auto Sales          110 1st St.     Sales
C000001          C000003        ABC Repair              130 1st St.     Service
C000002          C000004        ABC Auto New Sales      110 1st St.     Sales
C000002          C000005        ABC Auto Used Sales     110 1st St.     Sales
<null>           C000006        Bubba’s House O’ Cars   5432 Maple Ln   Dealer
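Because the hierarchy is jagged, a roll-up cannot assume a fixed number of levels; one approach is to walk each customer's `Parent_Company` chain to the top. The parent links come from the Customer_Dimension table, while the sales amounts are hypothetical.

```python
from collections import defaultdict

# Parent links from Customer_Dimension: Customer_Key -> Parent_Company (None = top level).
parent = {
    "C000001": None, "C000002": "C000001", "C000003": "C000001",
    "C000004": "C000002", "C000005": "C000002", "C000006": None,
}
# Hypothetical sales facts keyed by customer (amounts invented for illustration).
sales = {"C000003": 200, "C000004": 500, "C000005": 300, "C000006": 90}


def rolled_up_sales(parent_of, facts):
    """Credit each sale to the customer and to every ancestor, however deep the path."""
    totals = defaultdict(int)
    for customer, amount in facts.items():
        node = customer
        while node is not None:
            totals[node] += amount
            node = parent_of[node]
    return dict(totals)


totals = rolled_up_sales(parent, sales)
print(totals["C000002"])  # 800  (ABC Auto Sales = new + used)
print(totals["C000001"])  # 1000 (all of ABC Automotive)
```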
Modeling Hierarchies With Denormalized Tables (II)
• Similar example but with a well-defined hierarchy depth
– Same number of levels for all entries in the dimension table
– Simpler structure, but this approach requires a fixed-height hierarchy
– CityID serves as the primary key for the whole table
City_Geography_Dimension
CityID   CityName       StateID   StateName      TimeZone
45       Little Rock    2         Arkansas       Central
263      Denver         15        Colorado       Mountain
423      Aspen          15        Colorado       Mountain
522      Pittsburgh     36        Pennsylvania   Eastern
771      Philadelphia   36        Pennsylvania   Eastern
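With a fixed-height, denormalized table every row already carries all of its ancestors, so rolling up to state or time zone is a single group-by on one column. The city rows come from City_Geography_Dimension; the sales amounts are hypothetical.

```python
from collections import defaultdict

NAME, STATE, TIME_ZONE = 0, 1, 2  # column positions in the tuples below

# Rows of City_Geography_Dimension: CityID -> (CityName, StateName, TimeZone).
city_dim = {
    45: ("Little Rock", "Arkansas", "Central"),
    263: ("Denver", "Colorado", "Mountain"),
    423: ("Aspen", "Colorado", "Mountain"),
    522: ("Pittsburgh", "Pennsylvania", "Eastern"),
    771: ("Philadelphia", "Pennsylvania", "Eastern"),
}
# Hypothetical fact rows keyed by CityID (amounts invented for illustration).
sales_by_city = {45: 10, 263: 30, 423: 5, 522: 25, 771: 40}


def roll_up(column):
    """Every row carries all levels, so a roll-up is a one-column group-by."""
    totals = defaultdict(int)
    for city_id, amount in sales_by_city.items():
        totals[city_dim[city_id][column]] += amount
    return dict(totals)


print(roll_up(STATE))      # {'Arkansas': 10, 'Colorado': 35, 'Pennsylvania': 65}
print(roll_up(TIME_ZONE))  # {'Central': 10, 'Mountain': 35, 'Eastern': 65}
```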
Wrap Up
Key Takeaways
7th Inning Stretch