Effective Data Modeling
Transcript of Effective Data Modeling
© 2001-2010 Neudesic, LLC. All rights reserved.
ITA BI Roundtable
Dimensional Modeling: Organizing your Data for Analytics
Jeff Block, Managing Consultant
(847) 924-1317
Welcome to the
ITA
Business Intelligence
Roundtable
Jeff Block, Neudesic
BI Roundtable Chairman
Who am I?
What are we talking about?
Today’s Agenda
Brief Introduction
Who’s in the room?
Presentation:
Organizing your Data for Analytics
Discussion / Networking
Coming up Next Month
What kind of session is this?
• 2nd Tuesday of every month; 8-10 AM
– Here at the ITA TechNexus unless there’s a good reason to change
venues
• Sometimes a presentation
– My ideas, your ideas, case studies, best practices, panel discussions,
new developments, etc
• Sometimes an outside speaker
– Love to have some of you step up to the plate
• Always discussion
– Collaboration is the whole point of this group
• Always networking
– Meet people who will be valuable connections
Why are we here?
Topics and Target Audience
• Business and technology leaders
– Not going to spend much time deep in the technical weeds
• Those who want to
– Learn from each other
– Collaborate on solutions
– Network
in the BI space
• ITA members and their friends and their friends and …
In Scope
• Business Intelligence
– Vision and strategy
• Planning and implementing BI initiatives
– High-level architecture
– Best practices / Anti-patterns
– Case studies
– Etc
• What about data warehousing?
– It’s in! (part of BI, in my world)
What is Business Intelligence?
Business Intelligence is the art and science of turning
corporate data into practical, accessible, actionable
knowledge assets, and leveraging them to make
empirically-based strategic or operational decisions which
increase an organization’s capacity to fulfill its mission.
To this end, BI requires:
A disciplined, well-governed culture
A specialized, analytic engine
A well-designed data architecture
Classic BI Architecture
[Diagram: BI Presentation Components, OLAP Services, Data Marts, Data Warehouse, ETL, Source Systems]
Our focus is the stuff in this picture and the practices and
processes that get it there effectively.
Out of Scope
• Other random stuff
– No matter how cool Aunt Ruth’s cat is, she’s out of scope
• Building the tech together
• Arguing over low-level details
• Generally, if we talk about
– Project management / SDLC
– Architecture and design
– Business processes
– Etc
then it will be in the context of BI / DW / EDM
Some Quick Feedback
How does this line up with
your expectations?
A Few Logistics
• Grab on the way in...
– A nametag
– You too can have a spiffy nametag; just pre-register.
• Let me know you’re here
– Toss a card in the fish bowl
– No spam policy
– No card? No problem. Sign the list.
• Join our LinkedIn group
– http://www.linkedin.com/groups?gid=1801350
– Don’t worry, we’ll send you an invite
• Restrooms, etc…
Brief Introductions
Please share with the group…
• Name
• Company
• Role
• What you want to get
out of this session?
Where are you?
When you talk about dimensional modeling, I …
(Scale of 1 to 5, with labels along it: “Think you’re talking about Star Trek”, “Know enough to be dangerous”, “Could model Aunt Ruth’s cat”)
What is Dimensional Modeling?
Dimensional modeling is the art and science of modeling
data for the purposes of fast, efficient and intuitive
retrieval (typically from a data warehouse) for use in
online analytic processing.
• Completely different data modeling approach
– Than most of us are used to
• Two strategic goals:
– Fast, efficient data retrieval
– Intuitive interface to the data
Why a different model?
Different Goals
Storage of Historic Records
Predictability of Requirements
Different Goals
• Transactional systems
– An effective interface between a business process and a user
– Effective execution of a single business transaction
• OLAP systems
– An effective interface between a corporate decision-maker and
analytic data
– Effective analysis of a set of business transactions
• Note the absence of “efficient storage” goals. Why?
Storage of Historic Records
• Transactional systems
– No need to know history
– Optimized for the current transaction
• OLAP systems
– Business should be able to arbitrarily define the longevity of
data
– Optimized for consistent historic and predictive analysis
• Why no history in operational systems?
Predictability of Requirements
• Transactional systems
– Very predictable usage requirements
– Every interaction follows the same transactional process
• OLAP systems
– Very unpredictable usage requirements
– Ad-hoc / business-configured queries
– Every interaction potentially follows a completely different
pattern than the previous interaction
• Why are OLAP queries so unpredictable?
How Data is Modeled
• The dimensional model stores data in “star schemas”
• Two core elements: “facts” and “dimensions”
• Facts
– Core data of a business event
– The “verb” in the sentence describing the event
– Also called a “measure”
• Dimensions
– Context in which the event (measurement) occurred
– The “nouns” in the sentence
Two Kinds of “Facts”
• Measuring a business event
– A customer ordered a widget
– A new book was published
– A relationship was established
– A lead was converted
• Taking a snapshot of reality
– Inventory looks like this at this time
– Membership looks like this on this date
– Current workflow is at this stage at this time
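A rough sketch, not from the original slides, of how the two kinds of fact rows differ in shape. Event facts record one row per business event; snapshot facts record one row per subject per point in time. The SKU, dates, quantities, and the `snapshot` helper are all made up for illustration, and in practice snapshots are often captured directly rather than derived like this.

```python
from collections import defaultdict

# Hypothetical event facts: one row per business event.
# Quantities are signed: receipts add stock, sales remove it.
inventory_events = [
    {"date": "2010-02-01", "sku": "MILK", "quantity": +10},
    {"date": "2010-02-01", "sku": "MILK", "quantity": -3},
    {"date": "2010-02-02", "sku": "MILK", "quantity": -4},
]

def snapshot(events, as_of):
    """Return on-hand quantity per SKU as of a date (inclusive)."""
    on_hand = defaultdict(int)
    for e in events:
        if e["date"] <= as_of:  # ISO dates compare correctly as strings
            on_hand[e["sku"]] += e["quantity"]
    return dict(on_hand)

# Hypothetical snapshot facts: "inventory looks like this at this time".
print(snapshot(inventory_events, "2010-02-01"))
print(snapshot(inventory_events, "2010-02-02"))
```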
Seeing the Model in the Data
• An example of a business event
– “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on
Tuesday at 3:28PM”
• Implies a dimensional model with…
You tell me
Huddle up, and list the facts and
dimensions in this event
Seeing the Model in the Data
• Implies a dimensional model with
– One fact
› “customer purchased items”
› Two lines written to fact table; one for each item purchased
– Several dimensions
› Customer “Sally”
› Inventory items “milk” and “eggs” with specific SKUs
› A particular “Wal-Mart” store with a specific identifier
› A particular clerk, identified as “Clerk 12”
› Date “Tuesday”
› Time “3:28PM”
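One way to picture the breakdown above, as a minimal sketch: each dimension is a small lookup table, and the single purchase event becomes two fact rows, one per item. The keys, SKUs, store number, and the `quantity` measure are assumptions added for illustration; the sentence itself does not state them.

```python
# All keys and SKUs below are made up for illustration.
customers = {101: "Sally"}
items = {"SKU-MILK": "milk", "SKU-EGGS": "eggs"}
stores = {55: "Wal-Mart (store 55)"}
clerks = {12: "Clerk 12"}

# One purchase event, two fact rows: one per item purchased.
# "quantity" is an assumed measure; the sentence gives no amounts.
fact_purchases = [
    {"customer": 101, "item": "SKU-MILK", "store": 55, "clerk": 12,
     "date": "Tuesday", "time": "3:28PM", "quantity": 1},
    {"customer": 101, "item": "SKU-EGGS", "store": 55, "clerk": 12,
     "date": "Tuesday", "time": "3:28PM", "quantity": 1},
]

# Rebuild a sentence per fact row by looking up each dimension key.
for row in fact_purchases:
    print(customers[row["customer"]], "purchased", items[row["item"]],
          "from", clerks[row["clerk"]], "at", stores[row["store"]],
          "on", row["date"], "at", row["time"])
```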
How to Use the Model
• “Pivot” the context on the measurement taken
– Offers various perspectives on the data
• Aggregate many measures to produce an analytic report
• If aggregated…
– Hundreds, thousands, millions of times
• What questions could you ask of these data?
How to Use the Model
• A few questions I thought of…
– In what regions of the country do we sell the most dairy
products in the first quarter?
– Which three clerks sold the most impulse items in each Super
Wal-Mart in the mid-west this year?
– What is the correlation between the sale of milk and eggs in
summer vs. winter months?
– Who are our most loyal customers?
– At what time of day do we typically not sell any dairy products?
– Does staying open later on the weekends result in more dairy
product sales?
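As an illustration of how one of these questions might be answered from fact rows, here is a small sketch for "Who are our most loyal customers?". It assumes, purely for the example, that loyalty means the number of distinct days on which a customer made a purchase. The customer names and dates are invented.

```python
from collections import defaultdict

# Hypothetical fact rows; "loyalty" is assumed here to mean
# the number of distinct days on which a customer purchased.
fact_purchases = [
    {"customer": "Sally", "item": "milk", "date": "2010-02-02"},
    {"customer": "Sally", "item": "eggs", "date": "2010-02-02"},
    {"customer": "Sally", "item": "milk", "date": "2010-02-09"},
    {"customer": "Bob",   "item": "milk", "date": "2010-02-03"},
]

# Collect the distinct purchase dates for each customer.
visit_days = defaultdict(set)
for row in fact_purchases:
    visit_days[row["customer"]].add(row["date"])

# Rank customers by number of distinct purchase days, most loyal first.
ranking = sorted(visit_days, key=lambda c: len(visit_days[c]), reverse=True)
print(ranking)
```

A different definition of loyalty (total spend, visit frequency per month) would change only the grouping and ranking step, not the fact rows.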
How to Build the Model: An Example
[Diagram: star schema with the fact “Customer Purchased Item” at the center, surrounded by the dimensions Customers, Items, Times, Dates, Store, and Clerks]
See why they call it a “star schema”?
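The star in the diagram could be sketched as relational tables along these lines, using Python's built-in sqlite3 module. The table names, column names, and key values are assumptions chosen for the example, not part of the original deck; a real warehouse would carry many more attributes per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_item     (item_key     INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store    (store_key    INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_clerk    (clerk_key    INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, day_name TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, time_of_day TEXT);

-- The fact table sits at the center and references every dimension.
CREATE TABLE fact_purchased_item (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    clerk_key    INTEGER REFERENCES dim_clerk(clerk_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    quantity     INTEGER  -- assumed measure
);

INSERT INTO dim_customer VALUES (1, 'Sally');
INSERT INTO dim_item     VALUES (1, 'milk');
INSERT INTO dim_item     VALUES (2, 'eggs');
INSERT INTO dim_store    VALUES (1, 'Wal-Mart');
INSERT INTO dim_clerk    VALUES (12, 'Clerk 12');
INSERT INTO dim_date     VALUES (1, 'Tuesday');
INSERT INTO dim_time     VALUES (1528, '3:28PM');
INSERT INTO fact_purchased_item VALUES (1, 1, 1, 12, 1, 1528, 1);
INSERT INTO fact_purchased_item VALUES (1, 2, 1, 12, 1, 1528, 1);
""")

# A "star join": start at the fact and reach out to the dimensions needed.
rows = conn.execute("""
    SELECT c.name, i.name, d.day_name
    FROM fact_purchased_item AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_item     AS i ON i.item_key     = f.item_key
    JOIN dim_date     AS d ON d.date_key     = f.date_key
    ORDER BY i.name
""").fetchall()
print(rows)
```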
What’s an “Analytic Cube”?
• Purchase measure is the pivot point
• Joins 2 or more dimensions (context)
[Diagram: the Purchase measure at the point where the Product, Date, and Store dimensions meet]
What’s an “Analytic Cube”?
• Extrapolate to a cube
– Several measures sharing a set of dimensions
• Pivot cube on any point to get different analytic views of the data
• Really N-dimensional, but we mere mortals can’t visualize that
– So it’s a cube
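A minimal sketch of the cube idea, assuming a single measure (units purchased) stored per (product, store, date) cell. "Pivoting" is modeled here as summing away the axes you are not interested in. The cell values, axis names, and the `rollup` helper are hypothetical; real OLAP engines do this with precomputed aggregates and far more machinery.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, store, date) -> units purchased.
cube = {
    ("milk", "Store A", "Mon"): 5,
    ("milk", "Store B", "Mon"): 3,
    ("eggs", "Store A", "Mon"): 2,
    ("eggs", "Store A", "Tue"): 4,
}
AXES = ("product", "store", "date")

def rollup(cells, keep):
    """Sum the measure over every axis not named in `keep`."""
    idx = [AXES.index(axis) for axis in keep]
    totals = defaultdict(int)
    for coords, value in cells.items():
        totals[tuple(coords[i] for i in idx)] += value
    return dict(totals)

print(rollup(cube, ["product"]))        # view by product
print(rollup(cube, ["store", "date"]))  # view by store and date
```

Adding a fourth axis only means a longer tuple key, which is the sense in which the cube is really N-dimensional.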
[Diagram: cube with Product, Store, and Date axes; Purchase is the measure]
Selecting Appropriate Grain for Facts
• The “grain” of a fact table is the most granular level of
information that can be retrieved from the table.
• Shoot for “Atomic” grain facts
– Irreducibly complex; cannot be subdivided
– Dimensionally unconstrained
› Rolls up in any and all possible ways
› BI requires cutting through details in precise ways
– Required for drilling into reports
› One of the core strengths of BI
– Required for ad-hoc querying
– Can always create other fact tables or business views with
aggregations
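The point about atomic grain can be sketched as follows: atomic rows can always be rolled up into a coarser table, but the coarser table cannot be drilled back down. The rows and the choice of "one row per store per day" as the coarser grain are assumptions for the example.

```python
from collections import Counter

# Hypothetical atomic-grain facts: one row per item per purchase.
atomic = [
    {"date": "2010-02-02", "store": "A", "item": "milk", "quantity": 1},
    {"date": "2010-02-02", "store": "A", "item": "eggs", "quantity": 1},
    {"date": "2010-02-02", "store": "B", "item": "milk", "quantity": 2},
]

# Aggregated fact table at a coarser grain: one row per store per day.
daily_by_store = Counter()
for row in atomic:
    daily_by_store[(row["date"], row["store"])] += row["quantity"]

# The atomic rows can still answer an item-level question...
milk_units = sum(r["quantity"] for r in atomic if r["item"] == "milk")
print(dict(daily_by_store), milk_units)
# ...but daily_by_store cannot: the item dimension was aggregated away.
```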
Kimball’s Dimensional Design Process
• Step 1: Select business process to model
– Natural business activity performed
– Not a department or business function
• Step 2: Declare grain of the business process
– Level of detail associated with fact measurement
– Define exactly what a fact table row represents
– Atomic data is typically best
• Step 3: Choose dimensions applying to each fact table row
– Context in which we’re taking measurements
– Answer: “How do businesspeople describe the data that results
from the business process?”
– List dimensions, then all attributes per dimension
Kimball’s Dimensional Design Process
• Step 4: Identify the numeric measures that populate the fact table
– Numeric fact info which will populate the rows of the fact table
– Answer: “What are we measuring?”
– Measure only in the determined grain
– Different grain requires different fact table
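The four steps can be treated as a checklist to fill in per fact table. The worksheet class below is a hypothetical aid for illustration only, not something Kimball prescribes; the field names and the sample answers are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class FactTableDesign:
    """Hypothetical worksheet for one fact table; not a Kimball artifact."""
    business_process: str = ""                       # Step 1
    grain: str = ""                                  # Step 2
    dimensions: list = field(default_factory=list)   # Step 3
    measures: list = field(default_factory=list)     # Step 4

    def missing_steps(self):
        """Return the step numbers that have not been filled in yet."""
        answers = [self.business_process, self.grain,
                   self.dimensions, self.measures]
        return [n for n, answer in enumerate(answers, start=1) if not answer]

design = FactTableDesign(
    business_process="retail sale",
    grain="one row per item per purchase",
    dimensions=["customer", "item", "store", "clerk", "date", "time"],
)
print(design.missing_steps())  # the measures (step 4) are still undecided
```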
Dimensional Conformity
• The power of the enterprise data warehouse is making
a “single source of the truth” available to the business
– Only possible with conformed dimensions
– Kimball’s “enterprise bus” model favors this
• Dimensions are nouns
– “Product”, “Customer”, “Store”, “Person”, etc
– If more than one definition of a noun, sentences start to have
conflicting meanings
• Only one definition of a dimension means it’s
“conformed”
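A small sketch of why a conformed dimension matters: when two fact tables from different business processes share one definition of "product", their measures can be lined up side by side. The tables, keys, and numbers below are invented for the example.

```python
# One shared (conformed) product dimension; keys and names are made up.
dim_product = {1: "milk", 2: "eggs"}

# Two fact tables from different business processes use the same keys.
fact_sales = [{"product_key": 1, "units_sold": 8},
              {"product_key": 2, "units_sold": 6}]
fact_inventory = [{"product_key": 1, "units_on_hand": 20},
                  {"product_key": 2, "units_on_hand": 3}]

on_hand = {r["product_key"]: r["units_on_hand"] for r in fact_inventory}

# Because both tables agree on what a "product" is, their measures
# can be combined per product: (units sold, units on hand).
report = {
    dim_product[r["product_key"]]: (r["units_sold"], on_hand[r["product_key"]])
    for r in fact_sales
}
print(report)
```

If sales and inventory each kept their own product list with different keys or definitions, this lookup would need a reconciliation step first, which is the waste the following slides describe.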
Beautiful if you have it…
• Cross-functional view of data
• Whole organization working in concert
– Trend analysis
– Predictive analysis
– Drilling down into the true root cause of problems
– Accurate and complete financial pictures
Anarchy if you don’t…
• Missed opportunities from siloed data
• Nearly redundant departmental databases
– Nearly redundant data development
– Nearly redundant administration
– Nearly redundant storage
– Nearly redundant system development
– A lot of wasted time, energy and money
• Even more waste comes from trying to reconcile slightly
different versions of the truth
But you can Restore Order
• Three requirements
1. Political clout
2. Financial means
3. Willingness / ability to challenge the status quo
• Pick a silo where you can drive a stake in the ground
– I call it “bedrock data”
• Expand out from there
– Analyze and graft other silos onto the bedrock
– DO NOT start ANY initiative that creates a new center of data
Other (Advanced?) Topics
• Snowflakes
• Slowly Changing Dimensions
• Denormalized Dimensions
• Factless Fact Tables
• Degenerate Dimensions
• Master Data Management
• Much more
Interested in a follow-up?
Discussion Time
Coming Up…
• March 9, 2010; 8-10 AM at the ITA
– Topic: What Thomas Edison would do with
your data
– Speaker: Sarah Miller Caldicott
› Great grandniece of Thomas Edison
› Co-author: “Innovate Like Edison”
› Founder: The Power Patterns of Innovation
• April 13, 2010; 8-10:30AM at the ITA
– Topic: Grudge Match II – Another Smackdown
– Proposed featured BI product vendors:
› Microsoft
› Oracle
› Infobright
› MicroStrategy