Implementing an Enterprise DW-BI Solution
download.101com.com/pub/TDWI/Files/2007.12.11 TDWI...
© 2005-2007 Kimball Group. All rights reserved.
Implementing a Scaleable Enterprise DW/BI Solution
Bob Becker
December 11, 2007, TDWI Minneapolis
Background and Acknowledgments
Session materials adapted from:
• The Microsoft Data Warehouse Toolkit, J. Mundy, W. Thornthwaite (Wiley 2006)
• The Data Warehouse Lifecycle Toolkit, R. Kimball, L. Reeves, M. Ross, W. Thornthwaite (Wiley 1998)
• The Data Warehouse Toolkit, 2nd Ed., R. Kimball, M. Ross (Wiley 2002)
• Kimball University course materials: Microsoft Data Warehouse in Depth, Data Warehouse Lifecycle in Depth
• Design Tips, Intelligent Enterprise, and SQL Server Magazine articles at www.KimballUniversity.com
Agenda
• The four stages of enterprise DW/BI evolution
• The four challenges to enterprise DW/BI success
• Techniques for overcoming those challenges
The Four Stages of DW/BI Evolution
Stage 1: Independent Data Marts
Stage 2: Transition
Stage 3: Conformed Enterprise DW/BI System
Stage 4: Informed Operations
Conformed Enterprise Information Platform
Stage 1 – Independent Data Marts
• One or more data marts developed independently
  • Departmental focus, or
  • Special project focus, or
  • Data source focus (IT driven)
• Usually see short-term success, but suffer from long-term problems
  • Lack of business support
  • Data problems (bad, missing, inconsistent, wrong)
  • Team / organizational issues
Business Driven
Independent Data Marts
Diagram: Source Systems (Inventory, Orders, Billing / Returns) feed separate Departmental Data Marts (Marketing, Sales, Logistics), each holding dimensional detail and summary data from multiple sources.
Stage 2 – Transition
• Organizational awareness of the problem
• Developing sense for the value of an Enterprise Information Platform
• Key strategic stakes are pounded into the ground
  • Business purpose
  • Data architecture
  • Hardware architecture
  • Tools
• Two common approaches to transition
  • Conformed Data Warehouse
  • “Enterprise Data Warehouse” (per the Corporate Information Factory)
• Many organizations get stuck in Stage 2
Common Stage 2 Approach: the CIF Enterprise Data Warehouse
Diagram: Source Systems (Inventory, Orders, Billing / Returns) pass through Staging / Load Prep into the Enterprise Data Warehouse (normalized detail data), which feeds Departmental Data Marts (Marketing, Sales, Logistics) holding dimensional summary data.
Recommended Stage 2 Approach: Conformed DW/BI System
Diagram: Source Systems (Inventory, Orders, Billing / Returns) feed the ETL System (Dimension Processing, Fact Processing), which loads the Business Process Dimensional Models (Inventory, Orders, Billing, Returns, Aggregates) used by Marketing, Sales, and Logistics.
• Models contain atomic-level detail
• with aggregates for performance
• and transparent aggregate navigation
• Includes both relational dimensional model and OLAP dimensional model
Stage 3 – The Conformed Enterprise DW/BI System
• Primary DW/BI system in full operation
  • Several top priority data sets in place
  • User access and support fully available
• DW/BI system meets needs for (most) analytic and (some) operational reporting and analysis
• Analytic success
  • DW/BI system is an integral part of business decision making
  • Regular use of DW/BI system across the enterprise
  • Clear examples of delivering substantial business value
  • Business people want more
  • Resources are available; no one questions the DW/BI budget
Stage 4 – Informed Operations
• DW/BI system feeds high-value operational processes
  • Context information (e.g. Customer Care)
  • Integrated data
  • Embedded analytics (e.g. up-sell, cross-sell, recommendations, fraud detection, churn, …)
• General approach:
  • Understand the real-time business requirement
  • Build systems to meet real-time business requirements
    • Reporting directly on transaction system
    • Operational data store
    • Data warehouse real-time layer
• Be careful of impact on DW/BI system
  • Service level requirements
  • Load and query performance
  • Resource requirements
Facing the Transition Challenges
What are the challenges and how do you overcome them?
The Four Challenges to Successful Transition
Transition Challenges: Business Support
Transition Challenges and Solutions: Low Business Support
• Symptoms
  • Department-centric
  • Data inconsistencies
  • Users not interested
  • Management not supportive
• Underlying causes
  • Data warehouse is not focused on business value
  • Bad data model design (next section)
  • Few/no BI applications
  • Poor support for users (training, docs, portal, help)
Transition Challenges and Solutions: Lack of Business Support (2)
• Solutions
  • Define enterprise business requirements
  • Prioritize requirements with senior management
  • Design a robust, conformed dimensional model for the top priority requirement (next section)
  • Build the full solution for top priority requirements in the Enterprise context
    • Data model, ETL, data warehouse DB
    • BI Applications + BI Portal + User support
• Solution details…
Defining Business Requirements: The Basic Approach
• Interviews are preferable
• Must ask the right question
  • NOT “What do you want?”
  • Ask “What do you do? What could you do better with better information?”
• Three step process
  • Preparation
  • Interviews
  • Documentation
• Two passes (including data source interviews)
  • Enterprise (Senior Mgmt Prioritization)
  • Project
Defining Business Requirements: the Interview Process
• Assign roles and be ready
• Cover key areas and listen
• Take notes
• Debrief with team immediately after
  • Common themes / opportunities
  • Required data (business processes)
  • Do-ability
  • Areas requiring clarification
  • User analytical / technical sophistication
• You must do the formal documentation
Requirements Prioritization: the Requirements Prioritization Session
• Facilitation-based technique based on requirements definition
  • Senior Business and IT representatives
  • Departmental and enterprise-wide interests
• Goals are to:
  • Confirm
  • Prioritize
  • Gain consensus
Requirements Prioritization Session, cont’d
Figure: prioritization grid with Potential Business Impact (Low to High) on one axis and Feasibility (Low to High) on the other; opportunities A through E are plotted on the grid.
• For each opportunity/theme:
  • Evaluate business impact / benefit
  • Evaluate feasibility
• Outcomes:
  • “Right” opportunities
  • Consensus
  • Ownership
  • Education
  • Roadmap for growth
Develop the Full Solution: Key Components
• Provide robust set of BI applications
  • Standard reports – parameter driven
  • Analytic applications
  • Management tools
• Develop a clean, usable, content rich BI Portal
• Ensure full user support: query/reporting help, training, documentation, business metadata, report enhancements
• Leverage BI tool capabilities
Low Business Support Summary of Techniques
• Identify business requirements
  • Enterprise (Prioritize)
  • Project
• Work with management to prioritize requirements
• Provide a full solution
  • BI Applications
  • Range of users and uses
  • BI Portal
  • User support
• Bottom line: you MUST focus on adding business value
Transition Challenges: Data Problems
Transition Challenges and Solutions: Data Problems
• Symptoms
  • Users unable to create reports they need
  • Users view data warehouse data as “wrong”
  • Users get different results from different marts
• Underlying causes
  • Bad design
  • Wrong data
    • Lack of detail
    • No historical context
    • Missing sources
  • Poor data quality
    • Bad data from source (source systems not required to support analytics)
    • Incorrect or inconsistent names, business rules, definitions
  • Lack of data integration
Transition Challenges and Solutions: Data Problems (2)
• Solutions
  • Enterprise data architecture (the Data Warehouse Bus Matrix)
  • A solid, well designed dimensional model
    • Atomic level
    • Conformed dimensions
    • Correct tracking of attribute changes over time (Slowly Changing Dimensions)
  • Data stewardship and education
The Data Warehouse Bus Matrix
Matrix of business processes (units of work) and conformed dimensions
Date Item Store Promo Dist Ctr Shipper Vendor
Store Sales X X X X
Store Inventory X X X
Store Deliveries X X X X X
Dist Ctr Inventory X X X
Dist Ctr Delivery X X X X X
Purchase Orders X X X X
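A bus matrix can also be held as a small data structure, which makes it easy to ask which business processes share a dimension. The following is a minimal Python sketch, not part of the session materials. The slide text does not preserve which column each X falls in, so the dimension assignments in `BUS_MATRIX` are assumptions chosen only to match the number of X marks per row, and the helper names are hypothetical.

```python
# Illustrative bus matrix: business process -> set of dimensions it uses.
# ASSUMPTION: the exact dimension assignments are guesses for illustration;
# only the row names, column names, and X counts come from the slide.
BUS_MATRIX = {
    "Store Sales": {"Date", "Item", "Store", "Promo"},
    "Store Inventory": {"Date", "Item", "Store"},
    "Store Deliveries": {"Date", "Item", "Store", "Dist Ctr", "Shipper"},
    "Dist Ctr Inventory": {"Date", "Item", "Dist Ctr"},
    "Dist Ctr Delivery": {"Date", "Item", "Dist Ctr", "Shipper", "Vendor"},
    "Purchase Orders": {"Date", "Item", "Dist Ctr", "Vendor"},
}


def processes_by_dimension(matrix):
    """Invert the matrix: dimension -> sorted list of processes that use it."""
    result = {}
    for process, dimensions in matrix.items():
        for dimension in dimensions:
            result.setdefault(dimension, []).append(process)
    return {dim: sorted(procs) for dim, procs in result.items()}


def shared_dimensions(matrix, process_a, process_b):
    """Dimensions two processes have in common (candidates for conforming)."""
    return sorted(matrix[process_a] & matrix[process_b])


if __name__ == "__main__":
    for dim, procs in sorted(processes_by_dimension(BUS_MATRIX).items()):
        print(f"{dim}: used by {len(procs)} process(es)")
    print(shared_dimensions(BUS_MATRIX, "Store Sales", "Purchase Orders"))
```

Dimensions used by many processes are the ones where conforming pays off most, since they are what allow results from different fact tables to be compared.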
Dimensional Modeling Basics
• Dimensional: Why and How
• Facts
• Dimensions
• Surrogate Keys
• Slowly Changing Dimensions (SCDs): Tracking attribute changes over time (Type 1, 2, 3)
• Conformed dimensions
• Dimensional misconceptions
• Dimensional platforms
Dimensional: Why and How
• Primary design goal: Support analytic queries
  • Usable
  • Performance
• Basic approach:
  • Denormalize dimensions for usability
  • Normalize facts for performance
• Key terms
  • Facts = measures of business events
  • Dimensions = entities that participate in business events
Terminology: Facts
• Metrics resulting from business process or event
• Facts are usually numeric and additive
• Granularity/grain
  • Identifies the level of detail
  • One row per sale, one row per service call, one row per claim, …
  • Atomic grain is most flexible
Example fact table columns: DATE KEY, PRODUCT KEY, STORE KEY, PROMOTION KEY, $ Sales, Unit Sales
Terminology: Dimensions
• Characteristics of a subject/object
  • Who, what, when, where, why, how
  • Product, Date, Patient, Facility …
• Each row is an occurrence
  • One row per product, day, patient, …
• Dimension attributes (columns):
  • Report labels and query constraints
  • “By” words and “where” clauses
  • Verbose descriptive attributes, in addition to codes
  • Hierarchical relationships
Example Product Dimension columns: PRODUCT KEY, Product Desc., SKU #, Size, Brand Desc., Class Desc.
Terminology: Business Process Dimensional Model (or Star Schema)
• 1 fact table per business process / event, plus relevant dimensions
• Benefits:
  • Easier to understand
  • Better performance from fewer joins & optimizer
  • Extensible to handle change
Diagram: a central Fact Table (Product KEY, Date KEY, Store KEY, Promo KEY, Facts) joined to four Dimension Tables: Product (Product KEY, Product Attributes), Date (Date KEY, Date Attributes), Store (Store KEY, Store Attributes), and Promo (Promo KEY, Promo Attributes).
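To make the star schema concrete, here is a small runnable sketch using Python's built-in `sqlite3` module. It is an illustration only: the table names, column names, and sample rows are assumptions loosely based on the Date / Product / Store / Promo example on the slides, and `sales_by_brand` is a hypothetical helper.

```python
import sqlite3


def build_star_schema():
    """Create an in-memory star schema with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
        CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY,
                                  product_desc TEXT, brand_desc TEXT);
        CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT);
        CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promo_name TEXT);
        -- Grain: one row per product, per store, per day, per promotion.
        CREATE TABLE sales_fact (
            date_key INTEGER REFERENCES date_dim(date_key),
            product_key INTEGER REFERENCES product_dim(product_key),
            store_key INTEGER REFERENCES store_dim(store_key),
            promotion_key INTEGER REFERENCES promotion_dim(promotion_key),
            sales_dollars REAL,
            unit_sales INTEGER
        );
    """)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?)",
                     [(1, "2007-12-10"), (2, "2007-12-11")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(1, "Cola 12oz", "BrandA"), (2, "Cola 2L", "BrandA"),
                      (3, "Chips", "BrandB")])
    conn.executemany("INSERT INTO store_dim VALUES (?, ?)",
                     [(1, "Downtown"), (2, "Airport")])
    conn.executemany("INSERT INTO promotion_dim VALUES (?, ?)",
                     [(1, "No Promotion"), (2, "Holiday Coupon")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 10.0, 5), (1, 2, 1, 2, 20.0, 4),
                      (2, 3, 2, 1, 6.0, 3), (2, 1, 2, 2, 4.0, 2)])
    return conn


def sales_by_brand(conn, store_name):
    """Constrain on one dimension attribute, group by another, sum the facts."""
    query = """
        SELECT p.brand_desc, SUM(f.sales_dollars), SUM(f.unit_sales)
        FROM sales_fact AS f
        JOIN product_dim AS p ON p.product_key = f.product_key
        JOIN store_dim AS s ON s.store_key = f.store_key
        WHERE s.store_name = ?
        GROUP BY p.brand_desc
        ORDER BY p.brand_desc
    """
    return conn.execute(query, (store_name,)).fetchall()


if __name__ == "__main__":
    print(sales_by_brand(build_star_schema(), "Downtown"))
```

The query shape mirrors the slide's description of dimension attributes: the "where" clause constrains on a dimension attribute (store name) and the "by" word groups on another (brand), while the fact table supplies only keys and additive measures.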
Creating Conformed Dimensions
• All fact tables use shared, standard dimensions
  • Established via Bus Matrix, enforced in ETL
  • Dimensions are consistent across processes
• Agree on column names and definitions
• Assign surrogate key to every dimension row
• Combine all attributes into Master dimension table
• Use the Master dimensions to map the business keys in the fact rows to each dimension’s surrogate key
Diagram: Master Product dimension with Product KEY (surrogate key), Product Code (business key), Description, Brand, Category, Height, Width, Weight, and Standard Cost, with attributes contributed by Marketing, Logistics, and Cost Acctg.
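The steps above (combine attributes into a master dimension, assign surrogate keys, then map business keys in fact rows to surrogate keys) can be sketched in a few lines of Python. This is a simplified illustration, not the presenter's ETL design: which department supplies which attribute, the sample values, the `-1` "unknown" key, and all function names are assumptions.

```python
def build_master_dimension(*sources):
    """Merge per-department attribute sets into one master product dimension.

    Each source maps product_code (business key) -> dict of attributes.
    ASSUMPTION: surrogate keys are handed out by sorting the business keys;
    a real ETL system would persist keys so they stay stable across loads.
    """
    codes = sorted(set().union(*(source.keys() for source in sources)))
    master = []
    for surrogate_key, code in enumerate(codes, start=1):
        row = {"product_key": surrogate_key, "product_code": code}
        for source in sources:
            row.update(source.get(code, {}))
        master.append(row)
    return master


def build_key_map(master_rows):
    """Business key -> surrogate key lookup used during fact processing."""
    return {row["product_code"]: row["product_key"] for row in master_rows}


def assign_surrogate_keys(fact_rows, key_map, unknown_key=-1):
    """Replace the business key in each fact row with the surrogate key.

    ASSUMPTION: unmatched business keys get a placeholder "unknown" key.
    """
    keyed_rows = []
    for row in fact_rows:
        keyed = dict(row)
        code = keyed.pop("product_code")
        keyed["product_key"] = key_map.get(code, unknown_key)
        keyed_rows.append(keyed)
    return keyed_rows


if __name__ == "__main__":
    marketing = {"P100": {"description": "Cola 12oz", "brand": "BrandA"},
                 "P200": {"description": "Chips", "brand": "BrandB"}}
    logistics = {"P100": {"height": 12.0, "width": 6.0, "weight": 0.4}}
    cost_acctg = {"P100": {"standard_cost": 0.35}}
    master = build_master_dimension(marketing, logistics, cost_acctg)
    facts = [{"product_code": "P200", "units": 3}]
    print(assign_surrogate_keys(facts, build_key_map(master)))
```

Because every fact table resolves its business keys against the same master dimension, each process ends up pointing at the same dimension rows, which is what makes the dimension conformed.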
Terminology: Slowly Changing Dimension
• Techniques for handling changes to dimension attributes
• Type 1: overwrite attribute values
  • Common default, appropriate for corrections
• Type 2: create a new dimension row when attribute value changes
  • Flexible technique, critical for accurately tracking behavior over time
• Hybrid combinations of 1 and 2 are most common
• Most ETL tools have a built-in Slowly Changing Dimension management capability
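The difference between Type 1 and Type 2 is easiest to see side by side. The sketch below is a minimal in-memory Python illustration, not the behavior of any particular ETL tool. The housekeeping columns (`row_start_date`, `row_end_date`, `is_current`), the far-future end date, and the function names are assumptions.

```python
from datetime import date

# ASSUMPTION: a far-future date marks the row that is currently in effect.
HIGH_DATE = date(9999, 12, 31)


def apply_type1(rows, business_key, attribute, new_value):
    """Type 1: overwrite the attribute on every row for this business key.

    History is lost, which is why it suits corrections.
    """
    for row in rows:
        if row["business_key"] == business_key:
            row[attribute] = new_value


def apply_type2(rows, business_key, attribute, new_value, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in rows
                   if r["business_key"] == business_key and r["is_current"])
    if current[attribute] == new_value:
        return current  # nothing changed, so no new row is needed
    current["row_end_date"] = change_date
    current["is_current"] = False
    new_row = dict(current)
    new_row.update({
        "surrogate_key": max(r["surrogate_key"] for r in rows) + 1,
        attribute: new_value,
        "row_start_date": change_date,
        "row_end_date": HIGH_DATE,
        "is_current": True,
    })
    rows.append(new_row)
    return new_row


if __name__ == "__main__":
    customers = [{"surrogate_key": 1, "business_key": "C1", "name": "Acme",
                  "city": "Minneapolis", "row_start_date": date(2005, 1, 1),
                  "row_end_date": HIGH_DATE, "is_current": True}]
    apply_type1(customers, "C1", "name", "Acme Corp")              # correction
    apply_type2(customers, "C1", "city", "St. Paul", date(2007, 12, 11))
    for row in customers:
        print(row)
```

Facts loaded before the change keep pointing at the old surrogate key, and facts loaded afterwards pick up the new one, which is how Type 2 preserves historical context.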
Dimensional Modeling Myths and Misconceptions
• Dimensional means summary
• Dimensional models are built to support specific applications (or departments)
• The dimensional model is less flexible than a third normal form model in DW/BI systems
• The dimensional approach is not Enterprise oriented
Data Quality
• Quality gurus tell us we must fix the problem at source
• It’s not that easy
  • Transaction systems not responsible for analytic data collection
  • Business people don’t understand situation and impact
  • In many cases, the problem is much worse than we think
Data Quality Requires Data Stewardship and Education
• Responsible for driving agreement on terms, definitions, and business rules
• Responsible for data quality in DW/BI. Key steps:
  • Document current status (data profiling)
  • Must educate business on:
    • Current status of data quality
    • Impact of poor data quality
  • Work with business to determine resolution
  • Implement changes in source and/or ETL system
  • Must drive organizational change around new responsibility of transaction systems to support both transactions and analytics
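As a rough illustration of the "document current status (data profiling)" step, a first-pass profile of a single column might look like the Python sketch below. The function name, the choice of statistics, and the sample data are assumptions; dedicated profiling tools report far more.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, missing values, and value distribution.

    ASSUMPTION: None and the empty string both count as missing.
    """
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None and v != ""]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(3),
    }


if __name__ == "__main__":
    customers = [{"state": "MN"}, {"state": "mn"}, {"state": ""},
                 {"state": None}, {"state": "MN"}]
    print(profile_column(customers, "state"))
```

Even this small profile surfaces two issues the slides mention: missing values, and inconsistent codes ("MN" versus "mn") that need an agreed business rule before they can be fixed in the source or in ETL.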
Data Problems Summary of Techniques
• DW Bus Matrix = Enterprise data architecture
• Conformed enterprise dimensional model is foundation of enterprise analytic system
• Dimensional techniques are mandatory for enterprise view and historical accuracy
  • Conformed dimensions
  • Change tracking (esp. Type 2)
• Data quality is critical and data steward must own it
• Improving data quality requires organizational change
Transition Challenges: Organizational Issues
Transition Challenges and Solutions: Organizational / Team Issues
• Symptoms
  • No DW/BI program leadership
  • People not dedicated or pulled off to fight other fires
  • Resources not available
  • Timeframes unreasonably short
• Underlying causes
  • IT driven effort – lack of business involvement
  • Lack of understanding of true scope
  • Lack of methodology
Transition Challenges and Solutions: Organizational / Team Issues
• Solutions
  • Requirements definition and prioritization
  • Identify and dedicate a DW/BI leader
  • Create a DW/BI strategy
  • Follow a proven methodology
  • Actively involve key users
  • Provide ongoing communications with users, business sponsors and other stakeholders (AKA Marketing)
DW/BI Leadership
• The DW/BI Team must have strong leadership and a clear, shared vision: add business value
  • Not: build a data warehouse
  • Not: create a certain report set
  • Not: load a given set of data
• The DW/BI team, and especially the team leader, must have (or develop) a broad skill set:
  • Business
  • Technology
  • Problem solving
  • Communication
    • Listening
    • Written and oral
Create an Enterprise DW/BI System Strategy
• Business value focused
• Enterprise data architecture
• System architecture for full solution
  • ETL, Databases, Aggregate mgmt., BI applications, ad hoc access, System infrastructure (servers, disks, backup), User support, Growth
• Incremental delivery based on business priorities
• Clear understanding of resource requirements
• Senior management support
• DW/BI System (and strategy) owner
Follow a Proven Methodology: the Data Warehouse Lifecycle
Diagram: the Lifecycle runs from Project Planning to Business Requirements Definition, then along three parallel tracks: (1) Technical Architecture Design, then Product Selection & Installation; (2) Dimensional Modeling, then Physical Design, then ETL Design & Development; (3) BI Application Specification, then BI Application Development. The tracks converge at Deployment, followed by Maintenance and Growth, with Project Management spanning the whole Lifecycle.
User Involvement Opportunities
• Kickoff meeting
• Requirements interview
• Feedback on interview write ups
• Requirements doc review
• Prioritization
• Data model input and review
• Periodic status meetings
• Data stewardship and quality review
• BI applications design and spec
• Front end tool strategy and selection
• BI applications development
• Testing
• Training
• Support
• Ongoing requirements and prioritization
• Regular communications
Project/Program Communications Plan
• Need multiple channels for different audiences
  • Team: weekly meetings and status reports
  • Key users: bi-weekly or monthly status
  • IT management
  • Business sponsors: bi-weekly or monthly status
  • Senior staff: monthly summary
• Problem notification
  • Let people know right away
  • Let them know when you’ve fixed it
• User feedback
Ongoing Marketing Communications Techniques
• DW/BI purpose, value, process and direction
• Representation at key meetings
  • Senior Staff
  • Planning (Strategic, Product, System, …)
• DW/BI Portal
• DW/BI sponsored programs (e.g. User Forum)
• Classes (Ad hoc and standard reports)
• All DW/BI materials must carry the message
  • BI Portal, documentation, training, reports, e-mails…
Organizational and Team Summary of Techniques
• Identify and dedicate a DW/BI leader
• Create a DW/BI strategy
• Follow a proven methodology
• Actively involve key users
• Provide project and ongoing communications with users, business sponsors and other stakeholders (AKA Marketing)
Transition Challenges: Performance and Scalability
Transition Challenges and Solutions: Performance and Scalability
• Symptoms
  • Slow queries, frustrated users
  • Data loads take too long
  • Compromises (like dropping useful history to make room)
• Underlying causes
  • Increasing data volumes
  • Narrowing load window (availability requirements)
  • Complexity of ETL processes
  • Sub-optimal tuning
Transition Challenges and Solutions: Performance and Scalability (2)
• Solutions
  • Partitioning
  • Aggregates
    • Multi-dimensional (MOLAP)
    • Relational (ROLAP)
  • Platform improvements
    • 64 bit / memory
    • Bigger / easily expanded systems
    • More flexible architecture
    • Disk sub-systems
Performance Tools: Partitioning
• Basic concept: take one big table (or cube) and break it up into smaller, more manageable pieces
• For example, partition a 5 year table by month
Diagram: Sales Fact table split into monthly partitions numbered 1 through 60
• Can now load into a single partition – like loading into a much smaller table (1/60th, in this case)
• Can also load into an empty partition and swap it in
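The idea can be sketched with an in-memory analogy in Python. This is not a database engine's partitioning feature; it only illustrates routing rows to monthly partitions, querying a single partition, and swapping a freshly loaded partition in. The class, method, and column names are assumptions.

```python
from collections import defaultdict
from datetime import date


def partition_key(sale_date):
    """Monthly partitioning: 5 years of data yields 5 * 12 = 60 partitions."""
    return (sale_date.year, sale_date.month)


class PartitionedSalesFact:
    """Toy stand-in for a fact table partitioned by month."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def load(self, rows):
        """Route each row to its month; only the affected partitions are touched."""
        for row in rows:
            self.partitions[partition_key(row["sale_date"])].append(row)

    def swap_in(self, key, staged_rows):
        """Replace one partition wholesale with rows prepared offline."""
        self.partitions[key] = list(staged_rows)

    def total_sales(self, year, month):
        """A month-constrained query reads one partition, not the whole table."""
        return sum(r["sales_dollars"] for r in self.partitions.get((year, month), []))


if __name__ == "__main__":
    fact = PartitionedSalesFact()
    fact.load([
        {"sale_date": date(2007, 11, 30), "sales_dollars": 5.0},
        {"sale_date": date(2007, 12, 1), "sales_dollars": 10.0},
        {"sale_date": date(2007, 12, 11), "sales_dollars": 20.0},
    ])
    print(fact.total_sales(2007, 12))
```

In a real database the swap is typically a metadata operation, so the staged data becomes visible almost instantly, which is what helps with a narrowing load window.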
Performance Tools: Aggregates
• Aggregates are the primary performance tool in analytic data stores
• Must have aggregate management/navigation functionality for transparent use
• OLAP engines provide two key functions
  • Pre-aggregation for analytic query performance
  • Enhanced language for analytic query formulation
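"Transparent" aggregate navigation means the user asks the same question either way and the system decides whether a pre-built aggregate can answer it. The Python sketch below is a deliberately simplified illustration: it only uses an aggregate when the requested grouping matches exactly, whereas real aggregate navigators and OLAP engines can also roll up from finer aggregates. All names are hypothetical.

```python
from collections import defaultdict


def build_aggregate(fact_rows, group_by):
    """Sum sales_dollars by the given columns, scanning the atomic fact rows."""
    totals = defaultdict(float)
    for row in fact_rows:
        totals[tuple(row[column] for column in group_by)] += row["sales_dollars"]
    return dict(totals)


class AggregateNavigator:
    """Answers grouped queries from a stored aggregate when one exists."""

    def __init__(self, fact_rows):
        self.fact_rows = fact_rows
        self.aggregates = {}  # tuple of column names -> precomputed totals

    def add_aggregate(self, group_by):
        """Pre-aggregate once, typically at load time."""
        self.aggregates[tuple(group_by)] = build_aggregate(self.fact_rows, group_by)

    def query(self, group_by):
        """Return (source, totals); the caller never names the aggregate table."""
        key = tuple(group_by)
        if key in self.aggregates:
            return "aggregate", self.aggregates[key]
        return "atomic", build_aggregate(self.fact_rows, group_by)


if __name__ == "__main__":
    rows = [
        {"month": "2007-11", "brand": "BrandA", "sales_dollars": 5.0},
        {"month": "2007-12", "brand": "BrandA", "sales_dollars": 10.0},
        {"month": "2007-12", "brand": "BrandB", "sales_dollars": 20.0},
    ]
    navigator = AggregateNavigator(rows)
    navigator.add_aggregate(["month"])
    print(navigator.query(["month"]))   # served from the aggregate
    print(navigator.query(["brand"]))   # falls back to the atomic rows
```

Both calls return the same kind of result, so reports do not need to change when aggregates are added or dropped for tuning.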
Performance Tools: Platform Improvements
• Analytic system processes (load and query) love memory
• 64 bit platforms are the new standard for DW/BI systems
• Larger systems with dual core CPUs are an easy way to handle large data volumes
• Flexible system architectures allow best of scale up and scale out
• Disk sub-systems can help with parallel load/query processing and backups
Performance and Scale Summary of Techniques
• Partitioning
• Aggregates
  • Multi-dimensional (MOLAP)
  • Relational (ROLAP)
• Platform improvements
  • 64 bit / memory
  • Bigger / easily expanded systems
  • More flexible architecture
  • Disk sub-systems
Conclusions
Is Successful Transition Possible?
• YES!
• But can’t do it all at once
  • Use business requirements and priorities to determine order
• Critical steps
  • Enterprise business requirements definition *
  • Opportunity prioritization with senior management *
  • Create an Enterprise DW/BI System strategy *
  • Follow a methodology to implement the top business priority data set in an enterprise conformed dimensional approach (e.g. the Data Warehouse Lifecycle) *
  • Build on a flexible, scalable hardware/software platform
  • Start on the next data set - iterate
The Transition Results
• An enterprise information resource
• Real, provable business value
• User involvement and support
• Senior mgmt. involvement and support
• Solid, flexible data models
• Positive sense of accomplishment
• Plenty of opportunities
Successful Transition is Possible (with the right tools, techniques and skills)
Stage 1: Independent Data Marts
Stage 2: Conformed Core DW (Transition)
Stage 3: Conformed Enterprise DW/BI System
Stage 4: Informed Operations
Conformed Enterprise Information Platform
Diagram labels: Business Reqs, Prioritization, Dimensional Model, Lifecycle Approach, System Architecture, BI Application Set, Purpose-Built Toolset