Effective Data Modeling

description

A quick, practical guide to top issues, hot topics, and best practices in modeling your data for analytics.

Transcript of Effective Data Modeling

Page 1: Effective Data Modeling

© 2001-2010 Neudesic, LLC. All rights reserved.

ITA BI Roundtable

Dimensional Modeling: Organizing your Data for Analytics

Jeff Block, Managing Consultant

[email protected]

(847) 924-1317

Page 2: Effective Data Modeling

Welcome to the ITA Business Intelligence Roundtable

Page 3: Effective Data Modeling

Who am I?

Jeff Block, Neudesic

BI Roundtable Chairman

Page 4: Effective Data Modeling

What are we talking about?

Today’s Agenda

Brief Introduction

Who’s in the room?

Presentation: Organizing your Data for Analytics

Discussion / Networking

Coming up Next Month

Page 6: Effective Data Modeling

What kind of session is this?

• 2nd Tuesday of every month; 8-10 AM

– Here at the ITA TechNexus unless there’s a good reason to change venues

• Sometimes a presentation

– My ideas, your ideas, case studies, best practices, panel discussions, new developments, etc.

• Sometimes an outside speaker

– Love to have some of you step up to the plate

• Always discussion

– Collaboration is the whole point of this group

• Always networking

– Meet people who will be valuable connections

Why are we here?

Page 7: Effective Data Modeling

Topics and Target Audience

• Business and technology leaders

– Not going to spend much time deep in the technical weeds

• Those who want to do the following in the BI space

– Learn from each other

– Collaborate on solutions

– Network

• ITA members and their friends and their friends and …

Page 8: Effective Data Modeling

In Scope

• Business Intelligence

– Vision and strategy

• Planning and implementing BI initiatives

– High-level architecture

– Best practices / Anti-patterns

– Case studies

– Etc.

• What about data warehousing?

– It’s in! (part of BI, in my world)

Page 9: Effective Data Modeling

What is Business Intelligence?

Business Intelligence is the art and science of turning corporate data into practical, accessible, actionable knowledge assets, and leveraging them to make empirically-based strategic or operational decisions which increase an organization’s capacity to fulfill its mission.

To this end, BI requires:

• A disciplined, well-governed culture

• A specialized, analytic engine

• A well-designed data architecture

Page 10: Effective Data Modeling

Classic BI Architecture

[Figure: classic BI architecture diagram with the components BI Presentation Components, OLAP Services, Data Marts (four shown), Data Warehouse, ETL (two stages), and Source Systems]

Our focus is the stuff in this picture and the practices and processes that get it there effectively.

Page 11: Effective Data Modeling

Out of Scope

• Other random stuff

– No matter how cool Aunt Ruth’s cat is, she’s out of scope

• Building the tech together

• Arguing over low-level details

• Generally, if we talk about

– Project management / SDLC

– Architecture and design

– Business processes

– Etc.

then it will be in the context of BI / DW / EDM

Page 12: Effective Data Modeling

Some Quick Feedback

How does this line up with your expectations?

Page 13: Effective Data Modeling

A Few Logistics

• Grab on the way in...

– A nametag

– You too can have a spiffy nametag; just pre-register.

• Let me know you’re here

– Toss a card in the fish bowl

– No spam policy

– No card? No problem. Sign the list.

• Join our LinkedIn group

– http://www.linkedin.com/groups?gid=1801350

– Don’t worry, we’ll send you an invite

• Restrooms, etc…

Page 15: Effective Data Modeling

Brief Introductions

Who’s in the room?

Please share with the group…

• Name

• Company

• Role

• What you want to get out of this session

Page 17: Effective Data Modeling

Where are you?

When you talk about dimensional modeling, I …

[Scale from 1 to 5, with labels running from “Think you’re talking about Star Trek” through “Know enough to be dangerous” to “Could model Aunt Ruth’s cat”]

Page 18: Effective Data Modeling

What is Dimensional Modeling?

Dimensional modeling is the art and science of modeling data for the purposes of fast, efficient and intuitive retrieval (typically from a data warehouse) for use in online analytic processing.

Why a different model?

Page 20: Effective Data Modeling

What is Dimensional Modeling?

Dimensional modeling is the art and science of modeling data for the purposes of fast, efficient and intuitive retrieval (typically from a data warehouse) for use in online analytic processing.

• Completely different data modeling approach than most of us are used to

• Two strategic goals:

– Fast, efficient data retrieval

– Intuitive interface to the data

Page 21: Effective Data Modeling

Why a different model?

Different Goals

Storage of Historic Records

Predictability of Requirements

Page 22: Effective Data Modeling

Different Goals

• Transactional systems

– An effective interface between a business process and a user

– Effective execution of a single business transaction

• OLAP systems

– An effective interface between a corporate decision-maker and analytic data

– Effective analysis of a set of business transactions

• Note the absence of “efficient storage” goals. Why?

Page 23: Effective Data Modeling

Storage of Historic Records

• Transactional systems

– No need to know history

– Optimized for the current transaction

• OLAP systems

– Business should be able to arbitrarily define the longevity of data

– Optimized for consistent historic and predictive analysis

• Why no history in operational systems?

Page 24: Effective Data Modeling

Predictability of Requirements

• Transactional systems

– Very predictable usage requirements

– Every interaction follows the same transactional process

• OLAP systems

– Very unpredictable usage requirements

– Ad-hoc / business-configured queries

– Every interaction potentially follows a completely different pattern than the previous interaction

• Why are OLAP queries so unpredictable?

Page 25: Effective Data Modeling

How Data is Modeled

• The dimensional model stores data in “star schemas”

• Two core elements: “facts” and “dimensions”

• Facts

– Core data of a business event

– The “verb” in the sentence describing the event

– Also called a “measure”

• Dimensions

– Context in which the event (measurement) occurred

– The “nouns” in the sentence
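
The verb/noun split above can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the class names (`Customer`, `Product`, `OrderFact`), the keys, and the sample values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical dimension records: the "nouns" that give an event its context.
@dataclass(frozen=True)
class Customer:
    customer_key: int
    name: str

@dataclass(frozen=True)
class Product:
    product_key: int
    name: str

# Hypothetical fact record: the "verb" (an order). It holds keys into the
# dimensions plus the numeric measure that was taken.
@dataclass(frozen=True)
class OrderFact:
    customer_key: int
    product_key: int
    quantity: int

customers = {1: Customer(1, "Acme Corp")}
products = {10: Product(10, "widget")}
fact = OrderFact(customer_key=1, product_key=10, quantity=3)

def describe(f: OrderFact) -> str:
    """Rebuild the sentence for one fact row by looking up its dimensions."""
    customer = customers[f.customer_key].name
    product = products[f.product_key].name
    return f"{customer} ordered {f.quantity} x {product}"

sentence = describe(fact)
print(sentence)
```

The fact row itself carries almost no descriptive text; everything readable comes from the dimension lookups, which is what lets the same fact be described from many angles.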

Page 26: Effective Data Modeling

Two Kinds of “Facts”

• Measuring a business event

– A customer ordered a widget

– A new book was published

– A relationship was established

– A lead was converted

• Taking a snapshot of reality

– Inventory looks like this at this time

– Membership looks like this on this date

– Current workflow is at this stage at this time
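
A small sketch may help separate the two kinds of facts. The field names and numbers below are invented for illustration. It assumes, as is common practice, that event measures can be summed over time while snapshot measures are read at a point in time rather than summed across dates.

```python
# Event facts: one row per business event (hypothetical widget orders).
order_events = [
    {"date": "2010-02-01", "sku": "W-1", "quantity": 5},
    {"date": "2010-02-02", "sku": "W-1", "quantity": 3},
]

# Snapshot facts: one row per observation of reality at a moment in time.
inventory_snapshots = [
    {"date": "2010-02-01", "sku": "W-1", "on_hand": 95},
    {"date": "2010-02-02", "sku": "W-1", "on_hand": 92},
]

# Event measures add up across time.
total_ordered = sum(e["quantity"] for e in order_events)

# Snapshot measures are read at a point in time (here, the latest date);
# adding on_hand across dates would double count the same stock.
latest = max(inventory_snapshots, key=lambda s: s["date"])
latest_on_hand = latest["on_hand"]

print(total_ordered, latest_on_hand)
```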

Page 27: Effective Data Modeling

Seeing the Model in the Data

• An example of a business event

– “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM”

• Implies a dimensional model with…

You tell me

Huddle up, and list the facts and dimensions in this event

Page 28: Effective Data Modeling

Seeing the Model in the Data

• Implies a dimensional model with

– One fact

› “customer purchased items”

› Two rows written to the fact table; one for each item purchased

– Several dimensions

› Customer “Sally”

› Inventory items “milk” and “eggs” with specific SKUs

› A particular “Wal-Mart” store with a specific identifier

› A particular clerk, identified as “Clerk 12”

› Date “Tuesday”

› Time “3:28PM”
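
The breakdown above can be sketched as code: one purchase event becomes one fact row per item, and every row carries the same context. The dictionary layout and the `quantity` measure of 1 per item are assumptions made for the example, since the sentence does not state quantities.

```python
# The business event from the slide, captured as a simple record.
sale = {
    "customer": "Sally",
    "items": ["milk", "eggs"],
    "clerk": "Clerk 12",
    "store": "Wal-Mart",
    "date": "Tuesday",
    "time": "3:28PM",
}

def to_fact_rows(event):
    """Emit one fact row per purchased item, each sharing the event's context."""
    context = {key: value for key, value in event.items() if key != "items"}
    # quantity=1 is an assumed measure; the sentence does not give one.
    return [{**context, "item": item, "quantity": 1} for item in event["items"]]

rows = to_fact_rows(sale)
for row in rows:
    print(row)
```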

“Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM”

Page 29: Effective Data Modeling

How to Use the Model

• “Pivot” the context on the measurement taken

– Offers various perspectives on the data

• Aggregate many measures to produce an analytic report

• If aggregated…

– Hundreds, thousands, millions of times

• What questions could you ask these data?

“Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM”

Page 30: Effective Data Modeling

How to Use the Model

• A few questions I thought of…

– In what regions of the country do we sell the most dairy products in the first quarter?

– Which three clerks sold the most impulse items in each Super Wal-Mart in the Midwest this year?

– What is the correlation between the sale of milk and eggs in summer vs. winter months?

– Who are our most loyal customers?

– At what time of day do we typically not sell any dairy products?

– Does staying open later on the weekends result in more dairy product sales?
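
As a rough sketch of how the first question might be answered, the snippet below filters fact rows through dimension attributes and sums the measure by region. All keys, regions, categories, and quantities are made up for illustration, and treating eggs as "dairy" is an assumption of this example only.

```python
from collections import defaultdict

# Hypothetical dimension tables, keyed by surrogate key.
stores = {1: {"region": "Midwest"}, 2: {"region": "South"}}
items = {
    10: {"name": "milk", "category": "dairy"},
    11: {"name": "eggs", "category": "dairy"},    # category is illustrative
    12: {"name": "gum", "category": "impulse"},
}
dates = {20100105: {"quarter": 1}, 20100620: {"quarter": 2}}

# Hypothetical fact rows: keys into each dimension plus one measure.
facts = [
    {"store_key": 1, "item_key": 10, "date_key": 20100105, "quantity": 4},
    {"store_key": 1, "item_key": 11, "date_key": 20100105, "quantity": 2},
    {"store_key": 2, "item_key": 10, "date_key": 20100105, "quantity": 3},
    {"store_key": 2, "item_key": 12, "date_key": 20100105, "quantity": 9},
    {"store_key": 2, "item_key": 10, "date_key": 20100620, "quantity": 7},
]

def dairy_units_by_region(quarter):
    """Sum dairy quantities per store region for the given quarter."""
    totals = defaultdict(int)
    for f in facts:
        is_dairy = items[f["item_key"]]["category"] == "dairy"
        in_quarter = dates[f["date_key"]]["quarter"] == quarter
        if is_dairy and in_quarter:
            totals[stores[f["store_key"]]["region"]] += f["quantity"]
    return dict(totals)

q1 = dairy_units_by_region(1)
top_region = max(q1, key=q1.get)
print(q1, top_region)
```

The other questions follow the same shape: filter on some dimension attributes, group by others, and aggregate the measure.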

“Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM”

Page 31: Effective Data Modeling

How to Build the Model

An Example

[Figure: star schema with the fact “Customer Purchased Item” at the center, surrounded by the dimensions Customers, Items, Times, Dates, Store, and Clerks]

See why they call it a “star schema”?

“Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM”
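
One way to picture the star is as tables in a relational database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, SKU values, and the `quantity` measure are all assumptions for illustration, not the presenter's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the points of the star.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_item     (item_key     INTEGER PRIMARY KEY, name TEXT, sku TEXT);
CREATE TABLE dim_store    (store_key    INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_clerk    (clerk_key    INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, day_name TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, time_of_day TEXT);

-- Fact table: the center of the star, one foreign key per dimension
-- plus the measure (quantity is an assumed measure).
CREATE TABLE fact_purchased_item (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    clerk_key    INTEGER REFERENCES dim_clerk(clerk_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    quantity     INTEGER
);

INSERT INTO dim_customer VALUES (1, 'Sally');
INSERT INTO dim_item VALUES (1, 'milk', 'SKU-MILK');  -- hypothetical SKU
INSERT INTO dim_item VALUES (2, 'eggs', 'SKU-EGGS');  -- hypothetical SKU
INSERT INTO dim_store VALUES (1, 'Wal-Mart');
INSERT INTO dim_clerk VALUES (12, 'Clerk 12');
INSERT INTO dim_date VALUES (1, 'Tuesday');
INSERT INTO dim_time VALUES (1, '3:28PM');

-- Two fact rows for the one event: one per item purchased.
INSERT INTO fact_purchased_item VALUES (1, 1, 1, 12, 1, 1, 1);
INSERT INTO fact_purchased_item VALUES (1, 2, 1, 12, 1, 1, 1);
""")

# Join the fact table out to a few dimensions to read the event back.
rows = conn.execute("""
    SELECT c.name, i.name, k.name
    FROM fact_purchased_item AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_item     AS i ON i.item_key     = f.item_key
    JOIN dim_clerk    AS k ON k.clerk_key    = f.clerk_key
    ORDER BY i.name
""").fetchall()
print(rows)
```

Every query has the same shape: start at the fact table and join outward one hop to whichever dimensions the question needs.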

Page 32: Effective Data Modeling

What’s an “Analytic Cube”?

• Purchase measure is the pivot point

• Joins 2 or more dimensions (context)

[Figure: the Purchase measure at the intersection of the Product, Date, and Store dimensions]

Page 33: Effective Data Modeling

What’s an “Analytic Cube”?

• Extrapolate to a cube

– Several measures sharing a set of dimensions

• Pivot cube on any point to get different analytic views of the data

• Really N-dimensional, but we mere mortals can’t visualize that

– So it’s a cube

[Figure: a cube with the Product, Store, and Date dimensions as its axes and Purchase as the measure]
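
A toy version of the cube idea, assuming just three axes and one measure: each cell is addressed by its coordinates, and "pivoting" means choosing which axes to keep and summing over the rest. The axis names, store names, and quantities are hypothetical, and a real OLAP engine does far more than this.

```python
from collections import defaultdict

AXES = ("product", "store", "date")

# Each cell of the cube: coordinates on every axis -> purchase quantity.
cells = {
    ("milk", "Store A", "2010-02-01"): 3,
    ("milk", "Store B", "2010-02-01"): 5,
    ("eggs", "Store A", "2010-02-02"): 2,
}

def pivot(cube, *keep):
    """Keep the named axes and sum the measure over all the others."""
    positions = [AXES.index(axis) for axis in keep]
    view = defaultdict(int)
    for coords, quantity in cube.items():
        view[tuple(coords[p] for p in positions)] += quantity
    return dict(view)

by_product = pivot(cells, "product")
by_store_and_date = pivot(cells, "store", "date")
print(by_product)
print(by_store_and_date)
```

Adding a fourth or fifth axis only lengthens the coordinate tuples, which is the sense in which the "cube" is really N-dimensional.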

Page 34: Effective Data Modeling

Selecting Appropriate Grain for Facts

• The “grain” of a fact table is the most granular level of information that can be retrieved from the table.

• Shoot for “Atomic” grain facts

– Irreducibly complex; cannot be subdivided

– Dimensionally unconstrained

› Rolls up in any and all possible ways

› BI requires cutting through details in precise ways

– Required for drilling into reports

› One of the core strengths of BI

– Required for ad-hoc querying

– Can always create other fact tables or business views with aggregations
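
The case for atomic grain can be shown with a few invented rows: atomic facts roll up to any coarser view, but a table stored only at a coarser grain has already lost the detail needed to drill back down. The row layout and values below are assumptions for the example.

```python
from collections import Counter

# Atomic grain: one row per item per purchase (hypothetical data).
atomic = [
    {"date": "2010-02-02", "store": "Store A", "item": "milk", "quantity": 1},
    {"date": "2010-02-02", "store": "Store A", "item": "eggs", "quantity": 1},
    {"date": "2010-02-02", "store": "Store B", "item": "milk", "quantity": 2},
]

def roll_up(rows, *dims):
    """Aggregate the quantity measure by the chosen dimensions."""
    totals = Counter()
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["quantity"]
    return dict(totals)

# Atomic rows support both views.
daily_by_store = roll_up(atomic, "date", "store")
by_item = roll_up(atomic, "item")

# A fact table stored only at the daily-by-store grain has no item column,
# so the by-item question can no longer be answered from it.
coarse_rows = [
    {"date": d, "store": s, "quantity": q} for (d, s), q in daily_by_store.items()
]
can_drill_to_item = all("item" in row for row in coarse_rows)
print(daily_by_store, by_item, can_drill_to_item)
```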

Page 35: Effective Data Modeling

Kimball’s Dimensional Design Process

• Step 1: Select business process to model

– Natural business activity performed

– Not a department or business function

• Step 2: Declare grain of the business process

– Level of detail associated with fact measurement

– Define exactly what a fact table row represents

– Atomic data is typically best

• Step 3: Choose dimensions applying to each fact table row

– Context in which we’re taking measurements

– Answer: “How do businesspeople describe the data that results from the business process?”

– List dimensions, then all attributes per dimension

Page 36: Effective Data Modeling

Kimball’s Dimensional Design Process

• Step 4: Identify numeric measures to populate fact tables

– Numeric fact info which will populate the rows of the fact table

– Answer: “What are we measuring?”

– Measure only in the determined grain

– Different grain requires different fact table
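
The four steps can be written down as a simple checklist structure before any tables are built. The `FactTableDesign` class below is a hypothetical helper, not part of Kimball's method, and the attribute and measure names filled in for the retail example are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FactTableDesign:
    """One record per fact table, filled in step by step."""
    business_process: str = ""                                       # Step 1
    grain: str = ""                                                  # Step 2
    dimensions: Dict[str, List[str]] = field(default_factory=dict)   # Step 3
    measures: List[str] = field(default_factory=list)                # Step 4

    def missing_steps(self) -> List[int]:
        """Return the step numbers that have not been answered yet."""
        answers = [self.business_process, self.grain, self.dimensions, self.measures]
        return [step for step, answer in enumerate(answers, start=1) if not answer]

design = FactTableDesign(
    business_process="Retail purchase at the register",
    grain="One row per item per purchase",
)
print(design.missing_steps())  # steps 3 and 4 are still open

# Step 3: dimensions, each with its attributes (names are illustrative).
design.dimensions = {
    "customer": ["name"],
    "item": ["name", "sku"],
    "store": ["name", "region"],
    "clerk": ["name"],
    "date": ["day_name", "quarter"],
    "time": ["time_of_day"],
}
# Step 4: numeric measures at the declared grain (assumed for the example).
design.measures = ["quantity", "extended_price"]
print(design.missing_steps())
```

A measure that only makes sense at a different grain, such as a daily store total, would go into a separate `FactTableDesign` rather than being added here.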

Page 37: Effective Data Modeling

Dimensional Conformity

• The power of the enterprise data warehouse is making a “single source of the truth” available to the business

– Only possible with conformed dimensions

– Kimball’s “enterprise bus” model favors this

• Dimensions are nouns

– “Product”, “Customer”, “Store”, “Person”, etc.

– If more than one definition of a noun, sentences start to have conflicting meanings

• Only one definition of a dimension means it’s “conformed”
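
A minimal sketch of why conformity matters, assuming two hypothetical fact tables (sales and returns) that both point at one shared product dimension. Because both use the same definition of "product", their results line up by category and can be combined. The data and names are invented.

```python
# One shared (conformed) product dimension used by every fact table.
product_dim = {
    10: {"name": "milk", "category": "dairy"},
    11: {"name": "eggs", "category": "dairy"},
    12: {"name": "gum", "category": "impulse"},
}

# Two separate business processes, each with its own fact table.
sales_facts = [
    {"product_key": 10, "units": 8},
    {"product_key": 11, "units": 4},
    {"product_key": 12, "units": 10},
]
returns_facts = [
    {"product_key": 10, "units": 1},
    {"product_key": 12, "units": 2},
]

def units_by_category(facts):
    """Roll a fact table up to product category via the shared dimension."""
    totals = {}
    for fact in facts:
        category = product_dim[fact["product_key"]]["category"]
        totals[category] = totals.get(category, 0) + fact["units"]
    return totals

sold = units_by_category(sales_facts)
returned = units_by_category(returns_facts)

# Combining the two results is only meaningful because "category" means
# the same thing in both roll-ups.
return_rate = {cat: returned.get(cat, 0) / sold[cat] for cat in sold}
print(sold, returned, return_rate)
```

If each department kept its own product list with its own categories, the two dictionaries would not share keys, and the division in the last step would compare things that are not the same.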

Page 38: Effective Data Modeling

Beautiful if you have it…

• Cross-functional view of data

• Whole organization working in concert

– Trend analysis

– Predictive analysis

– Drilling down into the true root cause of problems

– Accurate and complete financial pictures

Page 39: Effective Data Modeling

Anarchy if you don’t…

• Missed opportunities from siloed data

• Nearly redundant departmental databases

– Nearly redundant data development

– Nearly redundant administration

– Nearly redundant storage

– Nearly redundant system development

– A lot of wasted time, energy and money

• Even more waste comes from trying to reconcile slightly different versions of the truth

Page 40: Effective Data Modeling

But you can Restore Order

• Three requirements

1. Political clout

2. Financial means

3. Willingness / ability to challenge the status quo

• Pick a silo where you can drive a stake in the ground

– I call it “bedrock data”

• Expand out from there

– Analyze and graft other silos onto the bedrock

– DO NOT start ANY initiative that creates a new center of data

Page 41: Effective Data Modeling

Other (Advanced?) Topics

• Snowflakes

• Slowly Changing Dimensions

• Denormalized Dimensions

• Factless Fact Tables

• Degenerate Dimensions

• Master Data Management

• Much more

Interested in a follow-up?

Page 43: Effective Data Modeling

Discussion Time

Page 45: Effective Data Modeling

Coming Up…

• March 9, 2010; 8-10 AM at the ITA

– Topic: What Thomas Edison would do with your data

– Speaker: Sarah Miller Caldicott

› Great grandniece of Thomas Edison

› Co-author: “Innovate Like Edison”

› Founder: The Power Patterns of Innovation

• April 13, 2010; 8-10:30 AM at the ITA

– Topic: Grudge Match II – Another Smackdown

– Proposed featured BI product vendors:

› Microsoft

› Oracle

› Infobright

› MicroStrategy

Page 46: Effective Data Modeling