© 2007 Robert T. Monroe Carnegie Mellon University ©2006 - 2008 Robert T. Monroe 45-875 BI Tools and Techniques Administrivia – HW #2


Administrivia – HW #2


Online Analytical Processing (OLAP)

BI Tools and Techniques

Robert Monroe

April 8, 2008


Key Takeaways


Core OLAP Concepts


What Are OLAP Tools?

• OLAP tools provide a mechanism for interactive analysis and exploration of dimensional data

– Interactive: users need to be able to easily specify queries

– Analysis: it should be possible to perform (and reuse) complex analyses of the dimensional data

– Exploration: answering one question with an OLAP tool frequently raises numerous subsequent questions

• A good OLAP tool allows the user to quickly pose follow-on queries

– Dimensional: OLAP tools operate on dimensional data – data structured as facts and dimensions


OLAP’s Role In Decision Making

Source: O’Brien, Management Information Systems, 6th ed.

OLAP excels at exploring complex, structured questions

OLAP Sweet-Spot


Quick OLAP Tools Demo

• Contour Components OLAP cube browser
– Open http://olaplib.contourcomponents.com/ in IE 6.0 or higher
– Accept the installation of any ActiveX controls that the site requests
– Use the Samples > Government > Regional Employee Turnover menu in the upper left of the screen to open a sample OLAP cube

• The demo requires IE 6.0 or later and an ActiveX install
– Installation is optional for class

• For the first demo we will browse regional employee turnover data


Why Not Just Write SQL Queries?

• Performance
• Complexity
• Exploration
• Presentation
• Difficulty in dealing with hierarchies
• Some desired queries are difficult or impossible to specify


Why Not Just Use Spreadsheets?

• Complexity (with > 2 dimensions)
• Presentation is tied to representation
• Does not scale to large data sets or many dimensions

– Storage and representation are ill-suited to the task

• Inability to deal with hierarchies


OLAP’s Place In A Business Intelligence Solution

[Diagram: Reconcile Data; Derive Data; OLAP Cube; OLAP Tools; Analyze]

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.


Dimensional Modeling with Hypercubes: Basic Concepts


Representing Dimensional Databases as Cubes

• OLAP tools represent dimensional data as cubes
– Cubes are also sometimes referred to as hypercubes

• Dimension tables are represented as cube dimensions

• Facts are represented using measures
– Measures can be thought of as the values stored in individual cells of the cube
– Measures consist of two parts:
• A numerical value that represents the basic fact
• A formula for combining multiple measures into a single measure
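The two-part definition of a measure can be sketched in Python, assuming a measure is modeled as a name plus an aggregation function. The `Measure` class, the measure names, and the cell values are hypothetical illustrations, not any OLAP product's API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Measure:
    """Hypothetical measure: a name plus the formula that combines cell values."""
    name: str
    combine: Callable[[Sequence[float]], float]


# Two hypothetical measures over the same numeric fact (sale amount).
total_sales = Measure("Total Sales", sum)
largest_sale = Measure("Largest Sale", max)

# Made-up numeric values stored in three lowest-level cells.
cell_values = [1200.0, 850.0, 430.0]

print(total_sales.name, total_sales.combine(cell_values))    # Total Sales 2480.0
print(largest_sale.name, largest_sale.combine(cell_values))  # Largest Sale 1200.0
```

The same numeric values yield different aggregate cells depending on which combining formula the measure carries.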


Quick Review: Dimensional Modeling Example

The fact table provides sales statistics broken down by the product, period, and store dimensions

The dimension tables provide details on stores, products, and time periods

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
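A minimal star schema along these lines can be sketched with Python's built-in `sqlite3` module. The table names, column names, and rows are hypothetical stand-ins, not the exact schema from the Hoffer et al. diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical star schema: three dimension tables and one fact table.
conn.executescript("""
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales (
    product_key  INTEGER REFERENCES product(product_key),
    period_key   INTEGER REFERENCES period(period_key),
    store_key    INTEGER REFERENCES store(store_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO period  VALUES (1, 2004, 1), (2, 2004, 2);
INSERT INTO store   VALUES (1, 'Chicago'), (2, 'Pittsburgh');
INSERT INTO sales VALUES
    (1, 1, 1, 10, 100.0), (2, 1, 1, 5, 75.0),
    (1, 2, 2, 8, 80.0),   (2, 2, 1, 3, 45.0);
""")

# Join the fact table to a dimension table and summarize dollars by city.
rows = conn.execute("""
    SELECT s.city, SUM(f.dollars_sold)
    FROM sales f JOIN store s ON f.store_key = s.store_key
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
print(rows)  # [('Chicago', 220.0), ('Pittsburgh', 80.0)]
```

Each fact row holds only foreign keys plus numeric facts; descriptive attributes such as the city name live in the dimension tables.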


Quick Review: Dimensional Example With Data

[Diagram: Product (dimension), Period (dimension), Store (dimension), Sales (fact)]

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.


Multiple Fact Tables

• It is frequently useful to store more than one type of fact in a single multidimensional database (star schema)

• This can be handled by using multiple fact tables that share dimensions

• Example: modeling products sold and products purchased

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.


Factless Fact Tables – Tracking Events

• “Factless” fact tables store only foreign keys, no numeric facts

• Factless fact tables track which types of events happened and under what circumstances they happened

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
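Because a factless fact table has no numeric column to sum, analysis is done by counting rows. The sketch below uses Python's `sqlite3` and assumes a hypothetical attendance table (student, course, date keys); the table and data are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical factless fact table: each row records that a student attended
# a course on a given date. It contains only foreign keys.
conn.executescript("""
CREATE TABLE attendance (student_key INTEGER, course_key INTEGER, date_key INTEGER);
INSERT INTO attendance VALUES
    (1, 10, 20080401), (2, 10, 20080401),
    (1, 11, 20080402), (3, 10, 20080403);
""")

# Count events per course: how many attendance events occurred?
counts = conn.execute(
    "SELECT course_key, COUNT(*) FROM attendance GROUP BY course_key ORDER BY course_key"
).fetchall()
print(counts)  # [(10, 3), (11, 1)]
```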


Conformed Dimensions

• When dimensions are shared across multiple fact tables they must be conformed dimensions

• Conformed dimensions
– One or more dimension tables associated with two or more fact tables, for which the dimension tables have the same business meaning and primary key with each fact table

• Conformed dimensions allow users to:
– Query across multiple fact tables
– Improve consistency of meaning and structure for derived and retrieved information
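A sketch of querying across two fact tables, assuming hypothetical "units sold" and "units purchased" fact tables that share a conformed product dimension (same keys, same meaning). All names and numbers are illustrative.

```python
from collections import defaultdict

# Conformed product dimension shared by both fact tables (hypothetical data).
product = {1: "Widget", 2: "Gadget"}

# Two fact tables keyed by the same product_key: (product_key, units).
units_sold = [(1, 10), (2, 4), (1, 6)]
units_purchased = [(1, 20), (2, 5)]


def totals_by_product(facts):
    """Sum units per product description using the shared dimension."""
    out = defaultdict(int)
    for product_key, units in facts:
        out[product[product_key]] += units
    return dict(out)


sold = totals_by_product(units_sold)
bought = totals_by_product(units_purchased)

# Because both fact tables use the same keys and meaning,
# their results can be lined up row by row.
for name in sorted(product.values()):
    print(name, sold.get(name, 0), bought.get(name, 0))
# Gadget 4 5
# Widget 16 20
```

If the two fact tables used different product keys or definitions, this row-by-row alignment would silently compare unrelated things, which is the problem conformed dimensions avoid.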


Tabular Representation of Measures and Dimensions

• Simple example of viewing OLAP data in a grid:
– Row headings (Store) represent dimension members
– Columns represent different measures

Store Sales Data for 2004

Store | Gross Sales | Quota | Profits | Sales vs. Quota
Chicago | $3,250,000 | $2,750,000 | $624,352 | +$500,000
New York | $4,500,000 | $3,550,000 | $100,000 | +$950,000
Pittsburgh | $1,600,000 | $1,700,000 | $250,000 | -$100,000

(Callouts: the columns are Measures; the Store rows are the Dimension.)
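The "Sales vs. Quota" column is consistent with a calculated measure defined as gross sales minus quota. A short Python sketch, using the 2004 figures from the table and assuming that formula:

```python
# Gross sales and quota per store, from the 2004 table.
stores = {
    "Chicago": (3_250_000, 2_750_000),
    "New York": (4_500_000, 3_550_000),
    "Pittsburgh": (1_600_000, 1_700_000),
}

# Assumed calculated measure: gross sales minus quota.
sales_vs_quota = {name: sales - quota for name, (sales, quota) in stores.items()}

for name, diff in sales_vs_quota.items():
    print(f"{name}: {diff:+,}")
# Chicago: +500,000
# New York: +950,000
# Pittsburgh: -100,000
```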


Tabular Representation of Measures and Dimensions

• Example 2: Store sales by year and store location
– In this case, both the column and row headings represent dimension values
– Cells hold measure values; the table's name describes the measure

Store Sales Data 2004-2007

Store | 2004 | 2005 | 2006 | 2007
Chicago | $3,250,000 | $3,500,000 | $3,000,000 | $3,900,000
New York | $4,500,000 | $4,350,000 | $5,100,000 | $5,450,000
Pittsburgh | $1,600,000 | $1,700,000 | $1,800,000 | $1,650,000

(Callouts: the row and column headings are Dimensions; the cells are Measures.)
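A grid like this can be assembled from flat fact rows. The Python sketch below uses the 2004-2007 figures from the table and assumes the facts arrive as (store, year, sales) tuples; the tuple layout is an illustrative assumption.

```python
# Fact rows: (store, year, gross sales), from the 2004-2007 table.
facts = [
    ("Chicago", 2004, 3_250_000), ("Chicago", 2005, 3_500_000),
    ("Chicago", 2006, 3_000_000), ("Chicago", 2007, 3_900_000),
    ("New York", 2004, 4_500_000), ("New York", 2005, 4_350_000),
    ("New York", 2006, 5_100_000), ("New York", 2007, 5_450_000),
    ("Pittsburgh", 2004, 1_600_000), ("Pittsburgh", 2005, 1_700_000),
    ("Pittsburgh", 2006, 1_800_000), ("Pittsburgh", 2007, 1_650_000),
]

# Rows come from the Store dimension, columns from the Year dimension,
# and each cell holds the sales measure for that (store, year) pair.
years = sorted({year for _, year, _ in facts})
grid = {}
for store, year, sales in facts:
    grid.setdefault(store, {})[year] = sales

print("Store", *years, sep="\t")
for store, row in grid.items():
    print(store, *(row[y] for y in years), sep="\t")
```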


Cube Representation of Measures and Dimensions

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.


Dimension Hierarchies

• Dimension tables are represented as cube dimensions
– Cube dimensions use levels to represent hierarchies
– Each sub-level subdivides its parent level at a finer granularity

• Dimension hierarchies can be of fixed or variable (jagged) height

• Examples
– Dimension: Time Period
• Levels: Year :: Quarter :: Month :: Week :: Day
– Dimension: Organization
• Levels: Company :: Division :: Department :: Employee
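Rolling facts up a level hierarchy can be sketched in Python. This sketch assumes hypothetical daily sales and uses a simplified Year :: Quarter :: Month :: Day hierarchy; it skips the Week level because calendar weeks do not nest cleanly inside months.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts at the lowest (Day) level.
daily_sales = {
    date(2008, 1, 15): 100,
    date(2008, 2, 3): 50,
    date(2008, 4, 20): 70,
    date(2008, 4, 21): 30,
}


def level_key(d, level):
    """Map a day to its member at the requested level of the Time hierarchy."""
    quarter = (d.month - 1) // 3 + 1
    if level == "Year":
        return (d.year,)
    if level == "Quarter":
        return (d.year, quarter)
    if level == "Month":
        return (d.year, quarter, d.month)
    return (d.year, quarter, d.month, d.day)  # "Day"


def roll_up(facts, level):
    """Aggregate daily facts to a coarser level by summing."""
    totals = defaultdict(int)
    for d, amount in facts.items():
        totals[level_key(d, level)] += amount
    return dict(totals)


print(roll_up(daily_sales, "Quarter"))  # {(2008, 1): 150, (2008, 2): 100}
print(roll_up(daily_sales, "Year"))     # {(2008,): 250}
```

Each member key includes its ancestors, so every finer-grained member maps to exactly one parent at each coarser level.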


Measures

• Measures represent the interesting data at the intersection of different dimensions

• There is a space for a measure at every intersection of every level of every dimension

– Base facts are stored at the intersections of the lowest levels of all dimensions (as either simple or calculated measures)

– Aggregate or computed values are stored at intersections where at least one dimension is above its lowest level (aggregate values must be calculated measures)
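The "space for a measure at every intersection" idea can be sketched by precomputing every aggregate cell from the base facts. This Python sketch assumes each dimension has only two levels (its base members plus an "all" level written as `*`) and uses hypothetical sales data; real cubes with deeper hierarchies have more intersections.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # stands for "all members" of a dimension (its top-level aggregate)

# Hypothetical base facts at the lowest level: (store, product, quarter) -> sales
base = {
    ("Chicago", "Widget", "Q1"): 10,
    ("Chicago", "Gadget", "Q1"): 5,
    ("Pittsburgh", "Widget", "Q2"): 7,
}


def build_cube(facts):
    """Return sales for every intersection, including aggregate cells."""
    cube = defaultdict(int)
    for coords, value in facts.items():
        # Each base fact contributes to every cell obtained by replacing
        # any subset of its coordinates with ALL.
        for mask in product([False, True], repeat=len(coords)):
            cell = tuple(ALL if roll else c for c, roll in zip(coords, mask))
            cube[cell] += value
    return dict(cube)


cube = build_cube(base)
print(cube[("Chicago", ALL, ALL)])  # 15
print(cube[(ALL, "Widget", ALL)])   # 17
print(cube[(ALL, ALL, ALL)])        # 22
```

Computing these cells once at build time is what lets later queries read an aggregate directly instead of rescanning the base facts.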


Three Categories Of Measures

• Additive measures can be meaningfully combined along any dimension
– Example: total sales by product, location, or time

• Semi-additive measures cannot be combined along one or more dimensions
– Example: summing inventory levels across time is not meaningful

• Non-additive measures cannot be combined along any dimension
– Example: weighted averages without weight information

• Exercise:
– Identify three measures of interest for a cube that tracks sales data
– Be sure to identify the numeric value tracked and the aggregation function

Definition source: Pedersen and Jensen, Multidimensional Database Technology, IEEE Computer 12/01
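The semi-additive and non-additive examples can be made concrete in Python. All figures are hypothetical, and using the period-end value for inventory over time is one common choice, assumed here for illustration.

```python
# Hypothetical month-end inventory (units) for two stores over three months.
inventory = {
    ("Chicago", "Jan"): 100, ("Chicago", "Feb"): 80, ("Chicago", "Mar"): 90,
    ("Pittsburgh", "Jan"): 40, ("Pittsburgh", "Feb"): 60, ("Pittsburgh", "Mar"): 50,
}
months = ["Jan", "Feb", "Mar"]

# Semi-additive: summing across stores is meaningful (total stock on hand in March)...
march_total = sum(v for (_store, m), v in inventory.items() if m == "Mar")
# ...but summing across time is not; use the period-end value instead.
chicago_q1 = inventory[("Chicago", months[-1])]  # 90, not 100 + 80 + 90

# Non-additive: average prices cannot be combined by averaging them
# unless the weights (units sold) are kept.
store_stats = {"Chicago": (10.0, 100), "Pittsburgh": (20.0, 10)}  # (avg price, units)
naive = sum(p for p, _ in store_stats.values()) / len(store_stats)
weighted = (sum(p * u for p, u in store_stats.values())
            / sum(u for _, u in store_stats.values()))

print(march_total, chicago_q1, naive, round(weighted, 2))  # 140 90 15.0 10.91
```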


Why OLAP Performs So Well

• Pre-computing aggregates and other values at cube-build time enables very rapid responses to many common queries

• Additional formulas/values can be specified for precomputation when the cube is built

• The standardized structure and dimensional model let the query engine make many assumptions about how best to answer queries and take advantage of pre-computed values


Dimension Examples

• What dimensions are available in the regional employee turnover example?
– Are any important dimensions missing that you might want for an analysis if you were a government official trying to improve the employment outlook in your region?

• The worldwide population cube has an example of a hierarchical dimension
– Which dimension is hierarchical?
– Is it a fixed or jagged dimension?
– What are the measures in this cube?


Analytics

• Analytics are specific analyses that can be performed on an OLAP cube
– Simple predefined analytics (sums, counts, percentages)
– Complex pre-canned analytics defined as part of the cube model/build
– Ad-hoc exploration

• Examples:
– Actual sales vs. quota by sales region
– Supplier count by commodity category by division
– Deviation from contracted pricing by supplier, commodity category, and division over the previous 3 years
– Examples of analytics related to sourcing or procurement?


Analytics Examples

• Revenue cube analytics
• Automobile traffic analytics
• Marketing dynamics cube (multiple slices preset)


Drilling Down

The drill-down operation shows the currently displayed data at a finer level of detail, moving down a dimension's hierarchy.

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
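Drilling down can be sketched in Python as moving from year totals to the quarters that make up one year. The sales figures and the Year :: Quarter hierarchy are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts at the Quarter level: (year, quarter) -> sales
sales = {(2007, 1): 40, (2007, 2): 55, (2007, 3): 35, (2007, 4): 60,
         (2008, 1): 45, (2008, 2): 50}


def by_year(facts):
    """Current view: totals at the Year level."""
    totals = defaultdict(int)
    for (year, _quarter), amount in facts.items():
        totals[year] += amount
    return dict(totals)


def drill_down(facts, year):
    """Drill down one level: show the quarters that make up one year."""
    return {q: amt for (y, q), amt in facts.items() if y == year}


print(by_year(sales))           # {2007: 190, 2008: 95}
print(drill_down(sales, 2007))  # {1: 40, 2: 55, 3: 35, 4: 60}
```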


Slicing

• The slicing operation fixes specific values for one or more dimensions of a cube and renders the measures for the remaining dimensions in a two-dimensional table

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
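A slice can be sketched in Python by fixing one dimension of a three-dimensional cube and returning the remaining two as a table. The cube contents and dimension names are hypothetical.

```python
# Hypothetical 3-D cube: (product, month, store) -> units sold
cube = {
    ("Shoes", "Jan", "Chicago"): 12, ("Shoes", "Feb", "Chicago"): 9,
    ("Shoes", "Jan", "Pittsburgh"): 4, ("Hats", "Jan", "Chicago"): 7,
    ("Hats", "Feb", "Pittsburgh"): 3,
}


def slice_cube(cube, store):
    """Fix the Store dimension to one member; return a product x month table."""
    table = {}
    for (prod, month, s), units in cube.items():
        if s == store:
            table.setdefault(prod, {})[month] = units
    return table


print(slice_cube(cube, "Chicago"))
# {'Shoes': {'Jan': 12, 'Feb': 9}, 'Hats': {'Jan': 7}}
```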


Filtering

• Filtering restricts which members are included in a calculation

• Filtering can cross multiple slices

• Example: filter the previous results to show only February, April, and May

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
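A filter on the Month dimension can be sketched in Python and applied across several product slices at once. The sales figures are hypothetical; the month selection mirrors the February, April, May example.

```python
# Hypothetical monthly sales, one slice per product.
sales = {
    "Shoes": {"Jan": 20, "Feb": 15, "Mar": 30, "Apr": 10, "May": 25},
    "Hats": {"Jan": 5, "Feb": 8, "Mar": 2, "Apr": 6, "May": 4},
}
keep = {"Feb", "Apr", "May"}  # filter on the Month dimension

# The same month filter is applied to every product slice.
filtered = {prod: {m: v for m, v in months.items() if m in keep}
            for prod, months in sales.items()}
totals = {prod: sum(months.values()) for prod, months in filtered.items()}
print(totals)  # {'Shoes': 50, 'Hats': 18}
```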


In-Class Exercise

• Open the Contour Cubes Automobile Traffic sample

• Which intersection and day in London has the most overutilization of the roads?

• Which intersection has the worst overutilization of roads across all of the days?

• Which intersection has the highest overall hourly traffic flow?


Pivoting Data

• OLAP tools generally let you pivot dimensions
– Pivoting switches which dimensions are displayed horizontally and which are displayed vertically

• This can be useful when exploring and trying to visualize data

Store Sales Data ’97 – ’00 ($ Millions)

Store | 1997 | 1998 | 1999 | 2000
Chicago | $3.25 | $3.5 | $3.0 | $3.9
NY | $4.5 | $4.35 | $5.1 | $5.45
Pgh | $1.6 | $1.7 | $1.8 | $1.65

Pivot

Annual Sales, By Store ’97 – ’00 ($ Millions)

Year | Chicago | NY | Pgh
1997 | $3.25 | $4.5 | $1.6
1998 | $3.5 | $4.35 | $1.7
1999 | $3.0 | $5.1 | $1.8
2000 | $3.9 | $5.45 | $1.65
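Pivoting amounts to swapping the row and column dimensions of the grid. A Python sketch using the store sales figures from the first table, assuming the grid is held as a nested dict keyed by row member, then column member:

```python
# Sales in $ millions, keyed by store then year (from the first table).
by_store = {
    "Chicago": {1997: 3.25, 1998: 3.5, 1999: 3.0, 2000: 3.9},
    "NY": {1997: 4.5, 1998: 4.35, 1999: 5.1, 2000: 5.45},
    "Pgh": {1997: 1.6, 1998: 1.7, 1999: 1.8, 2000: 1.65},
}


def pivot(grid):
    """Swap row and column dimensions: grid[r][c] becomes result[c][r]."""
    result = {}
    for row, cols in grid.items():
        for col, value in cols.items():
            result.setdefault(col, {})[row] = value
    return result


by_year = pivot(by_store)
print(by_year[1999])  # {'Chicago': 3.0, 'NY': 5.1, 'Pgh': 1.8}
```

Only the presentation changes; every cell value stays attached to the same (store, year) pair, so pivoting twice returns the original grid.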


Modeling Hierarchies

• Dimension tables frequently model hierarchies

• Example:
– The Customers dimension stores data about your customers
– You may sell to several divisions of a single company
– You want to be able to analyze sales to the individual divisions and also capture “rolled-up” values for the parent company

Divisions of ABC Automotive

Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.


Modeling Hierarchies With Denormalized Tables (I)

• Hierarchical dimensions are frequently represented with denormalized tables

• This simplifies and speeds up queries at the cost of introducing update anomalies

• This example represents a ‘jagged’ or ‘arbitrary’ hierarchy

Customer_Dimension

Parent_Company | Customer_Key | Name | Address | Type
<null> | C000001 | ABC Automotive | 100 1st St. | Dealer
C000001 | C000002 | ABC Auto Sales | 110 1st St. | Sales
C000001 | C000003 | ABC Repair | 130 1st St. | Service
C000002 | C000004 | ABC Auto New Sales | 110 1st St. | Sales
C000002 | C000005 | ABC Auto Used Sales | 110 1st St. | Sales
<null> | C000006 | Bubba’s House O’ Cars | 5432 Maple Ln | Dealer
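With a parent-pointer (jagged) hierarchy, rolled-up values are computed by recursively summing each customer's descendants, however deep the tree goes. This Python sketch takes the parent links from the Customer_Dimension table; the sales amounts are hypothetical.

```python
# Parent links from the Customer_Dimension table (None = top level).
parent = {
    "C000001": None, "C000002": "C000001", "C000003": "C000001",
    "C000004": "C000002", "C000005": "C000002", "C000006": None,
}
# Hypothetical sales recorded against individual customer keys.
sales = {"C000003": 200, "C000004": 500, "C000005": 300, "C000006": 150}


def rolled_up(key):
    """Sales for a customer plus all of its descendants, at any depth."""
    children = [c for c, p in parent.items() if p == key]
    return sales.get(key, 0) + sum(rolled_up(c) for c in children)


print(rolled_up("C000002"))  # 800  (ABC Auto Sales: new + used)
print(rolled_up("C000001"))  # 1000 (all of ABC Automotive)
```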


Modeling Hierarchies With Denormalized Tables (II)

• Similar example, but with a well-defined hierarchy depth
– Same number of levels for all entries in the dimension table
– Simpler structure
– This approach requires a hierarchy of fixed height
– CityID serves as the primary key for the whole table

City_Geography_Dimension

CityID | CityName | StateID | StateName | TimeZone
45 | Little Rock | 2 | Arkansas | Central
263 | Denver | 15 | Colorado | Mountain
423 | Aspen | 15 | Colorado | Mountain
522 | Pittsburgh | 36 | Pennsylvania | Eastern
771 | Philadelphia | 36 | Pennsylvania | Eastern
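Because every row of a fixed-height denormalized table carries all of its ancestor levels, rolling up to a higher level is a plain GROUP BY with no joins. A Python `sqlite3` sketch loading the City_Geography_Dimension rows (counting cities per state is just an illustrative query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The denormalized City_Geography_Dimension table.
conn.execute("""CREATE TABLE City_Geography_Dimension (
    CityID INTEGER PRIMARY KEY, CityName TEXT,
    StateID INTEGER, StateName TEXT, TimeZone TEXT)""")
conn.executemany(
    "INSERT INTO City_Geography_Dimension VALUES (?, ?, ?, ?, ?)",
    [(45, "Little Rock", 2, "Arkansas", "Central"),
     (263, "Denver", 15, "Colorado", "Mountain"),
     (423, "Aspen", 15, "Colorado", "Mountain"),
     (522, "Pittsburgh", 36, "Pennsylvania", "Eastern"),
     (771, "Philadelphia", 36, "Pennsylvania", "Eastern")],
)

# Each row already carries its State level, so rolling up needs no joins.
rows = conn.execute("""SELECT StateName, COUNT(*) FROM City_Geography_Dimension
                       GROUP BY StateName ORDER BY StateName""").fetchall()
print(rows)  # [('Arkansas', 1), ('Colorado', 2), ('Pennsylvania', 2)]
```

The trade-off is visible in the data: "Colorado" and "Pennsylvania" are repeated on every city row, so renaming a state means updating several rows consistently.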


Wrap Up


Key Takeaways


7th Inning Stretch