IBM DWH SEminario Roma 3
8/3/2019 IBM DWH SEminario Roma 3
2009 IBM Corporation
IBM Software Group
Seminar: The Data Warehouse Ecosystem
Technology Trends, Best Practices, Project Experiences
Fabrizio Napolitano, IBM Data Warehouse Architect
Rome, 9-17 April 2010
-
Agenda
Introduction
Which ecosystem does the Data Warehouse sit in?
Current industry trends
The life cycle of a Data Warehouse project: best practices and main mistakes to avoid (a case study)
Data Warehouse modeling, current issues:
Environment consolidation: how to use industry logical models to simplify the process
Decentralized implementation of a consolidated DWH: a case study
Technology trends: the era of DWAs (Data Warehouse Appliances)
Data integration (17/04/2010):
Methodologies and best practices for developing ETL flows
The importance of metadata management
-
What Do Companies Need for Business Intelligence & Analytics?
Industry Models
Information Integration
Master Data Management
BI & Performance Management Tools
Data Warehouse
Servers and Storage
Strategy and Implementation Services
Metadata
Data Governance
-
What Do Companies Need for Business Intelligence & Analytics?
Industry Models
Information Integration
Master Data Management
BI & Performance Management Tools
Data Warehouse
Servers and Storage
Strategy and Implementation Services
Metadata
Data Governance
Challenges:
Integration costs & skills
Metadata synchronization
Performance optimization
Administration costs & skills
Maintenance costs & skills
Upgrade synchronization
Ongoing integration certification
-
What's Happening Out There? (Trends)
1. Many mature warehouses are being re-architected.
According to Gartner Group, almost one third of data warehouse projects will be do-overs. What's behind this trend?
Lack of ROI
A Gartner Group study shows that only 40 percent of enterprises measure ROI for their data warehousing initiatives.
How do you know whether you succeeded if you do not measure it?
The big push to consolidate data
Cross-LOB (line of business) analysis is currently one of the hottest subjects in BI.
Focus is shifting from performance to changing business needs
A warehouse that is architected only for performance may not react well to change.
Focus on agility and reuse, not just pure performance.
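One way to make the ROI question concrete is the standard formula ROI = (benefit - cost) / cost. The sketch below is only illustrative: the function name `roi` and the yearly figures are hypothetical and do not come from the seminar or from the Gartner study.

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Return ROI as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical yearly figures for a warehouse program, for illustration only.
example_roi = roi(total_benefit=1_500_000, total_cost=1_200_000)
print(f"ROI: {example_roi:.0%}")
```

Tracking even a rough figure like this per release gives the sponsor something to compare against the operational costs discussed on the following slides.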
-
More on Trends
2. Cost and delivery pressure (anyone not have that?).
The need for data to answer a specific business need in a compressed time period causes more and more data proliferation.
Costs! DW operational costs appear to outweigh benefits, and the pressure to reduce costs is severe for most DW organizations (remember the ROI problem?).
3. Warehouses have become more active and more critical at the same time.
Warehouses are not only becoming more active, they are also becoming more critical (did you plan for that?). This drives the need for a completely different architecture and for things like HA (high availability) and DR (disaster recovery).
Batch windows are shrinking, queries are becoming more complex, and more sophisticated analytics are needed (all at once!).
-
...and more
4. In comes the appliance.
Isn't "appliance" just a cool word for a prescribed solution that works and shortens time to market?
Doing it yourself is so out.
"...you could build your own appliance. It would probably take three years, you would need some highly skilled engineers who you have to pay at a commensurate rate but, yes, you could do that. You could also build your own ERP system that had all the features of SAP in it, but just because you could doesn't mean that it would make sense."
Philip Howard, Bloor Research
Appliance = reduced time to market + built for data warehousing + hard to ignore!
-
Data Design Trends
1. Back to the single source of truth, aka Enterprise BI or Enterprise Intelligence.
Data that is used is data that is exposed
Compliance laws
Need for more detailed data
2. Right-time replaces real-time
Match need to application
3. Don't just load your data: MASTER your data!
Reuse is key
[Image caption: "Ye shall master thy data"]
-
Seminar: The Data Warehouse Ecosystem
The life cycle of a Data Warehouse project: best practices and main mistakes to avoid
Fabrizio Napolitano, IBM Data Warehouse Architect
Rome, 9 April 2010
-
The Top 10 Best Practices for a Successful Data Warehouse
(No, that's not a typo: they are all number 1.)
Have a business-based strategy and get sponsorship
Market the warehouse internally (early and often)
Have the right organization to help you manage the warehouse
Data governance and stewardship
Build towards consolidation
Balance increasing costs with increasing value
Have a solid data architecture
Architect for change, not only performance
Have a disaster recovery plan
Never neglect information quality
Gathered from customer and analyst interviews
-
Data Warehouse Project: Most Common Mistakes
The Anti-Architect (Kimball)
Mistake 1: Rely on past consultants or other IT staff to tell you the data warehouse requirements
Mistake 2: Live with the assumption that the administrators of the major OLTP source systems of the enterprise are too busy
Mistake 3: After the data warehouse has been rolled out, set up a planning meeting to discuss ongoing communications with the end users, if the budget allows
Mistake 4: Make sure all the data warehouse support personnel have nice offices in the IT building
Mistake 5: Declare end-user success at the end of the first training class
Mistake 6: Assume that sales, operations, and finance end users will naturally gravitate to the good data and will develop their own killer apps
Mistake 7: Make sure that before the data warehouse is implemented you write a comprehensive plan that describes all possible data assets of your enterprise and all the intended uses of information
-
Data Warehouse Project: Most Common Mistakes (continued)
The Anti-Architect (Kimball)
Mistake 8: Don't bother the senior executives of your organization with the data warehouse until you have it up and running and can point to a significant success
Mistake 9: Encourage the end users to give you continuous feedback throughout the development cycle
Mistake 10: Agree to deliver a high-profile customer-centric data mart as your first deliverable
Mistake 11: Define your professional role as the authority on appropriate use of the data warehouse
Mistake 12: Collect all the data in a physically centralized data warehouse before interviewing any end users or releasing any data marts
-
Business Sponsorship Can Save Your Warehouse
"One of the most common, yet potentially fatal, disorders involves the sponsorship of the DW/BI environment. A business sponsor disorder is often the contributing factor to data warehouse stagnation."
Margy Ross, Ralph Kimball
[Image label: Business Sponsor]
-
Datawarehouse Project: A Telco Case Study
The Project Scope
[Roadmap diagram with milestones 2006, 2007 and 2010; labels translated from Dutch]
Headline: end of 2007 (instead of 2009), first formal customer view as input for customer interaction
Channels: fixed line, mobile, Internet, TV
Sources: existing sources; new sources (VaMo)
Stages: product-oriented, no customer view; "Quick VaMo"; Customer Centric DWH (C.C.DWH)
Activities: prototypes and prototyping; CRM data analysis; input for customer interaction
Related program: CRM Foundation / One Billing
-
Datawarehouse Project: A Telco Case Study
The Issue
BI/DWH project sponsored by the CRM director (IT)
Seen as a technical enabler, not business driven
IT organization changes heavily impact the project
Many IT DWH projects in different departments
Not all IT managers sponsor or support the new DWH project
Lack of overview of the status, deliverables and interdependencies of all CRM-data-related projects, and of insight into how project objectives support the objectives of CLM and ZM Klantbeeld (customer view).
Limited insight into whether, how and when the information requirements outlined by the business are covered by running and future CRM-data-related projects.
No matching CRM data model (compliant with SID/Siebel) for ZM Klantbeeld, and therefore insufficient guidance from the desired Klantbeeld towards feasible and coherent IT projects.
Limited business involvement in the running BI program and CRM-data-related projects. Limited alignment of data-related efforts between demand (business) and supply (IT NL).
Fragmented processes and unclear ownership, roles and responsibilities for CRM-data projects and maintenance.
Limited steering of CRM-data-related projects is possible.
-
Background
Within xx, several projects have recently been started by business and IT that should improve the quality and availability of CRM data for analytical and operational CRM activities and contribute to the 360-degree view of the customer. With regard to these projects, the following issues are perceived by KPN:
Lack of overview of the status, deliverables and interdependencies of all CRM-data-related projects, and of insight into how project objectives support the objectives of CLM and ZM Klantbeeld (customer view).
Limited insight into whether, how and when the information requirements outlined in ZM Klantbeeld are covered by running and future CRM-data-related projects.
No matching CRM data model (compliant with SID/Siebel) for ZM Klantbeeld, and therefore insufficient guidance from the desired Klantbeeld towards feasible and coherent IT projects.
Limited business involvement in the running BI program and CRM-data-related projects. Limited alignment of data-related efforts between demand (business) and supply (IT NL).
Fragmented processes and unclear ownership, roles and responsibilities for CRM-data projects and maintenance.
Limited steering of CRM-data-related projects is possible.
In order to start solving these issues, KPN wants to improve data governance for KPN ZM CRM-data-related projects.
As a first step, KPN ZM wants to start a project to agree on a roadmap for the delivery of the ZM Klantbeeld information requirements, to define a data architecture, and to define, implement and pilot a pragmatic governance framework around the running and future CRM-data-related projects.
-
Datawarehouse Project: A Case Study
Lessons Learned: what you should do
[Diagram labels: IT, Business]
-
Datawarehouse Project: A Case Study
Lessons Learned: how you could do it
Align with the business strategy
Communicate to the right level
This includes setting up a Business Glossary
Data Governance
BI Governance
Use a project lifecycle methodology tailored to DWH
-
It's All About the Value, NOT the Technology
"In the end, data warehouse implementation shouldn't be the focus; it's a means. The goal is to deliver a solution to support an immediate business need."
Baseline Consulting
Hitting the target means expressing business value
-
How do I best align with the business strategy?
First, keep asking yourself the question: why does it matter to the business?
The business strategy for the warehouse can be found everywhere
What is the company mission and how can the warehouse play a role in supporting it? (It's on your wall, on your website, in your annual report)
Create a business advisory committee for the warehouse
Who on the committee is the most vocal and passionate?
Look for more than one sponsor for true success in the enterprise (yes, have a sponsor redundancy program!)
[Diagram labels: Technology, Business Need]
-
Sample DW Program Structure
[Organization chart]
Executive Sponsorship
Data Stewardship Steering Committee
Program Management Office (PMO)
Data Warehouse Program Management and Oversight: DW Program Manager, DW Technical Architect, Data Quality Coordinator, Metadata Coordinator, Resource Coordinator, Requirements Coordinator, Change Control
DW Development & Maintenance:
Project Teams: Project Manager, Business Analysts, Source Analysts, Data Modeler, ETL Developer, DBA, BI Tool Developer, Testing Coordinator, Implementation Coordinator
DW Maintenance: Metadata Management, Source Extract Support, ETL Support, Reporting/Analytic Support
Tools Support: DBAs, Data Modelers, ETL Specialist, Query & Reporting Specialist, OLAP Specialist, Data Quality Specialist, Data Mining Specialist
-
Focus Communication on the Business Users
Have a mission statement for the warehouse
Communicate milestones that map to that mission
Make the warehouse a raving success in the business.
Do not get caught up in communicating the wrong milestones
DO communicate what business questions can be answered, problems resolved and opportunities identified
DON'T over-communicate hardware upgrades, OS changes, or new investments that do not bring new value
[Cartoon: a presenter shows "DW Stats"; caption: "I think I speak for everyone when I say: what in God's name are you talking about????"]
-
Communicate again
Communicate early, communicate often!
How often do you talk with the executives about what the warehouse is doing today?
Push out a scorecard monthly
How many business questions did the warehouse answer last month? A query is a BUSINESS question!
Use the warehouse to establish leadership externally
Know your warehouse stats like your children's birthdays!
Example (JPMC):
775 end users, 276 source systems, 8,729 attributes
15 TB database growing to 20 TB over the next 18 months
28,000 batch ETL jobs/month
2,000 to 5,000 queries/day
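The monthly scorecard idea can be sketched as a count of queries per month, treating each query as one business question answered. The log entries and the `monthly_scorecard` helper below are hypothetical and are not taken from the JPMC example.

```python
from collections import Counter
from datetime import date

# Hypothetical query log entries: (execution date, business unit).
query_log = [
    (date(2010, 3, 2), "finance"),
    (date(2010, 3, 15), "sales"),
    (date(2010, 3, 15), "sales"),
    (date(2010, 4, 1), "finance"),
]


def monthly_scorecard(log):
    """Count queries (business questions answered) per (year, month)."""
    return Counter((day.year, day.month) for day, _unit in log)


scorecard = monthly_scorecard(query_log)
for (year, month), count in sorted(scorecard.items()):
    print(f"{year}-{month:02d}: {count} business questions answered")
```

In a real warehouse the log would come from the database's own query history; the point is only that the scorecard speaks in business questions, not in technical statistics.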
-
A Model for BI Governance
Data Governance: management of enterprise data assets to increase the use and trust of the data.
Process Governance: business oversight of the decisions to align planning, measurement, and analysis efforts across the organization.
Organizational Governance: processes, people and structure that enable the ongoing management and control of BI initiatives.
Technology Governance: ensuring that the right portfolio of tools and technologies is in place to deliver the right BI capabilities to the business.
Align and Manage: processes and people that manage the alignment of BI resources to BI strategies. Management of interdependent efforts and initiatives.
[Diagram labels: Technology, Process, Organization, Data, Align and Manage]
-
Components of Integrated BI Governance
Alignment and Management: BI Steering Committee; BI Guiding Principles; Strategy & Roadmap Governance; BI Program Management (PMO)
Data: Enterprise Data Management; Data Stewardship; Data Quality Management; Data Integration Management (defining a single version of the truth); Metadata Management
Organization: Organization Structure Constructs (CoC, PMO); Work Group Design; Skills & Behavior Development; Training; Job Design; Roles and Responsibilities; Accountability & Decision Making
Technology: Tool and Technology Standards; Common reference and solution architecture
Process: Business Performance Management; Integrated Planning; Forecasting & Budgeting; KPI Rationalization; Decision-Making Processes
-
Data Warehouse Project Life Cycle
[Diagram: the Kimball Lifecycle]
-
The Business Intelligence Method
Our BI method embeds key themes throughout the lifecycle and is tightly linked with our BI Reference Architecture.
[Diagram]
Phases (incremental, iterative): BI Strategy and Planning, Solution Outline, Macro Design, Micro Design, Build, Deploy
Key themes: Program Management & Organizational Change, Quality Assurance, Data Quality, Metadata, Technical Infrastructure, Security & Privacy
BI Reference Architecture layers: Access Layer, Analytics Layer, Data Repository Layer, Data Integration Layer
-
The Business Intelligence Method
Based on an industry-leading set of phases, activities, and tasks
Note: for clarity, not all activities are shown.
BI Strategy and Planning: Review Client Business & IT Environment; Identify Solution Areas; Define Business Solution Strategy; Define Technical Solution Strategy; Outline Architecture Model; Assess Infrastructure Impact; Confirm BI Strategy and Planning
Solution Outline: Define Infrastructure Requirements; Define Organization; Review Client Environment; Outline Solution Requirements; Outline Solution Strategy; Determine Data Integration Requirements; Determine Data Repository Requirements; Determine Analytics Requirements; Assess Business Impact; Confirm Solution Outline
Macro Design: Create Logical Data Integration Design; Create Logical Data Repositories Design; Create Logical Analytics Design; Create Logical Access Design; Design Architecture Model; Design Solution Plans; Design Test Specifications; Build Development Environment
Micro Design: Create Physical Data Integration Design; Create Physical Data Repositories Design; Create Physical Analytics Design; Create Physical Access Design; Refine Architecture Model; Perform Static Testing; Define Training and User Support; Plan Development
Build Cycle: Build Data Integration Code; Perform Data Repositories Build; Build/Extend Analytics Components; Build/Test Access Components; Prepare for Testing; Perform Development Testing; Perform System Testing; Plan Deployment
Deployment: Perform Acceptance Testing; Set Up Production Environment; Deploy Client Support; Cut Over to Production; Implementation Checkpoint
Highlighted track (data repositories): Create Logical Data Repositories Design, Create Physical Data Repositories Design, Perform Data Repositories Build
-
Seminar: The Data Warehouse Ecosystem
Data Warehouse modeling: current issues
Fabrizio Napolitano, IBM Data Warehouse Architect
Rome, 9 April 2010
-
Achieving the Goal: One Source of Truth for All
"Despite their best intentions, CIOs are struggling to deliver consistent data that provides a single view across the enterprise. CIOs who seek this so-called 'single version of the truth' must feel like they are playing an endless game of Whack-a-Mole: every time they stamp out a renegade analytic silo, another pops up elsewhere."
TDWI Research Report
[Image label: "Whack O MARTS"]
-
Current Issues for a Data Warehouse Architect
Data Warehouse Consolidation
Mergers and acquisitions
Data Mart Consolidation
One version of the truth
Reduce complexity from data mart explosion
Data Warehouse Standardization
Multiple lines of business
Global corporation
IBM Industry Models Introduction
-
The True Cost of Inflexible Data Models
Most data warehouse logical data models tend to be optimized (i.e., biased) towards:
1. Source systems
This makes them difficult to use for integrating data from any other application
2. Current application query patterns (business requirements)
These evolve and become more sophisticated over time
They exceed initial design assumptions
Consequences:
Failure of the solution to keep pace with the business
Diminishing business value
Much of the effort involved in modifying a traditionally designed data warehouse is associated with rewriting the DDL, ETL processes and SQL for creating, loading and querying the data warehouse, respectively.
-
Case Study: North Europe Telco with Many Companies Around the World
Develop the DWH once, using a reference model, in a first country pilot
Reuse it many times to deploy in the other countries
Realignment on the Millicom unified model
Limited BI solution experience
[Timeline diagram: TZ implementation and tests; DRC implementation and tests; xx implementation and tests; 1DW implementation]
-
According to best practices, the TDWM (Telco Data Warehouse Model) should be fed from, and kept separate from, the operational systems via a staging area
[Diagram: Telco operational systems -> staging area -> Telco Data Warehouse Model]
Telco operational systems:
Switches & Gateways (all inbound and outbound): On-net DLD, On-net IDD, Off-net DLD, Off-net IDD, Wireless, Other; output: unrated CDRs
Rating & Mediation (rateable and chargeable); output: rated CDRs
Interconnect Settlement
Billing (billable, above entitled amount); output: invoice detail
Telco Data Warehouse Model entities:
Involved Party: originating service providers, terminating service providers, subscribers and inbound roamers
Arrangement: postpaid subscriptions, prepay cards, interconnect agreements, service level agreements, pricing agreements
Service Usage: call detail records (one for each call)
Usage Component: each Service Usage has multiple components, one for each rate basis
Billing Rate and Charging Rate: applicable internal and external rates (e.g., billing rates, interconnect rates, network costs, VAT)
Invoice Header and Invoice Detail: the invoice table will store billing history for each call
Network Component: circuits, switches, gateways
Network: interconnecting service provider (IP), interconnecting network
-
...while the Paktel implementation already deviated from these best practices in favour of a country-specific implementation
[Diagram: the same Telco operational systems and Telco Data Warehouse Model as on the previous slide, fed directly from the MSC CDR source, with the deviating areas marked "??"]
Deviations in the Paktel implementation:
No interconnect system in Pakistan
Incoming calls stored in a new local table not based on the TDWM
Table layout based on the source MSC_CDR layout, not on the TDWM
Rating logic replicated in the data warehouse
Analysis area and reports changed accordingly
Paktel did not implement a staging area
Data modified in the data warehouse
-
...with further deviations for TZ rather than realigning on the TDWM
[Diagram: the same Telco operational systems and Telco Data Warehouse Model as on the previous slides, fed from the NUM_CALL source, with the deviating areas marked "??"]
Deviations in the Tanzania (TZ) implementation:
Interconnect system in Tanzania
Incoming calls stored in a new table based on the Paktel approach, not on the TDWM
Table layout based on the source system layout, not on the TDWM
Analysis area and reports changed accordingly
Tanzania did not implement a staging area
Data modified in the data warehouse
-
...assuming that all reports and models are identical for all countries, with only the source data and ETL1/2 processing being potentially different
[Diagram: for each country (A, B, ... Z), local sources feed ETL1/2 into a replica of the MIC Corporate DW Solution: Source-Independent Staging Area -> System of Record -> Summary Area -> Data Marts -> BO Universe, producing local and corporate business reports]
Different per country:
Sources can be different by country
Country-specific development is limited to ETL1/2
Identical for all countries:
All reports and models are identical for all countries
All other components, including ETL3/4, are exactly identical to the xxx Corporate DW Solution
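The split between country-specific ETL1/2 and identical downstream components can be sketched as per-country adapters that map each source layout onto one shared staging format. This is only an illustration: every field name, adapter and figure below is hypothetical and not taken from the project.

```python
# Country-specific ETL1/2: each adapter maps a local source layout
# onto the shared, source-independent staging format.
def extract_country_a(rows):
    # Hypothetical source A reports durations in seconds.
    return [{"subscriber": r["msisdn"], "duration_s": r["secs"]} for r in rows]


def extract_country_b(rows):
    # Hypothetical source B reports durations in minutes.
    return [{"subscriber": r["caller"], "duration_s": r["minutes"] * 60} for r in rows]


COUNTRY_ADAPTERS = {"A": extract_country_a, "B": extract_country_b}


def load_summary(staged):
    """Shared downstream step (stands in for ETL3/4): total usage per subscriber."""
    totals = {}
    for rec in staged:
        totals[rec["subscriber"]] = totals.get(rec["subscriber"], 0) + rec["duration_s"]
    return totals


def run(country, source_rows):
    """Run the country adapter, then the identical shared pipeline."""
    staged = COUNTRY_ADAPTERS[country](source_rows)
    return load_summary(staged)


summary_a = run("A", [{"msisdn": "100", "secs": 30}, {"msisdn": "100", "secs": 90}])
summary_b = run("B", [{"caller": "200", "minutes": 2}])
print(summary_a, summary_b)
```

Adding a country then means writing one new adapter; `load_summary` and everything after it stays untouched, which is the reuse the slide argues for.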
-
The Importance of Flexible, Generic Models
Trade-off:
information model optimization for normalization and generics
Improve:
longer-term model manageability
extensibility
synchronization between source and target applications and business processes
[Diagram: model layers, labelled Layer 1 and Layer 2, listed top to bottom]
Star schemas (denormalized): TDW BSTs (denormalized)
Optimized summaries: TDW Summary Area
3NF detail data: TDW System of Record
Load performance increases with the normalization level.
Query performance increases with the aggregation level.
-
A Simplistic Example of Generic Modeling: Simple Models for Complex Data
Extending traditional data models for new concepts requires new tables to be created, with all the associated DDL, ETL and SQL code to create, load and access them. Generic models are much more flexible.
[Diagram: many specific entities mapped onto a few generic ones]
Specific entities: Company, Division, Department, Customer, User, Product Group, Product, Service, Service Instance, Usage, Event, Billing Account, Invoice, Payment, Rate Group, Rate
Generic entities: Party, Product, Arrangement, Condition
Inter-subject-area associative tables
Time-variant, perspective-based hierarchies
New requirements are added as DATA, not as structural changes
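A minimal sketch of "new requirements as data", using an in-memory SQLite database. The table and column names (`party_type`, `party`, `party_relationship`) are assumptions made for illustration; they are not the actual IBM industry model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party_type (type_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE party (
        party_id INTEGER PRIMARY KEY,
        type_id  INTEGER NOT NULL REFERENCES party_type(type_id),
        name     TEXT NOT NULL
    );
    -- Time-variant hierarchy between parties.
    CREATE TABLE party_relationship (
        parent_id  INTEGER NOT NULL REFERENCES party(party_id),
        child_id   INTEGER NOT NULL REFERENCES party(party_id),
        valid_from TEXT NOT NULL,
        valid_to   TEXT
    );
""")


def add_party(type_name, name):
    """Insert a party, registering its type as a row if it is new."""
    conn.execute("INSERT OR IGNORE INTO party_type(name) VALUES (?)", (type_name,))
    type_id = conn.execute(
        "SELECT type_id FROM party_type WHERE name = ?", (type_name,)
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO party(type_id, name) VALUES (?, ?)", (type_id, name)
    )
    return cur.lastrowid


company = add_party("Company", "Example Telco")
division = add_party("Division", "Consumer")
conn.execute(
    "INSERT INTO party_relationship VALUES (?, ?, ?, NULL)",
    (company, division, "2010-01-01"),
)

# New requirement: track resellers. Only rows are added; no new table, no DDL.
reseller = add_party("Reseller", "Example Reseller")

table_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
type_count = conn.execute("SELECT COUNT(*) FROM party_type").fetchone()[0]
print(table_count, type_count)
```

The price of this flexibility is that queries must filter on the type column, which is one reason the denormalized summary and star-schema layers still sit on top of the generic model.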
-
xDW Data Models: three interlinked models
xSDM: classification model for defining business meaning across all models, applications and databases
xDWM (Data Warehouse Model): logical E-R model for designing the central data warehouse
xBST (Business Solution Templates): logical measure/dimension model for defining user information requirements
[Diagram: the three models are linked to one another by mappings]
-
xDW Architecture
[Diagram: sources -> ETL/messaging -> Enterprise Data Warehouse -> data mart DB structures -> business applications, spanning logical design and physical design]
Sources: Billings, Front Office & Apps, Other Sources, Market Data, Accounting Systems, CIF
Enterprise Data Warehouse: Classified Sources, Staging Area, System of Record, Summary, Analysis, Feedback
Data Mart DB structures: ROLAP (Relational, Other), OLAP Server * (Essbase)
Business applications: Profitability, Rel. Mgt, Usage, Ops & Fin Mgt Reporting, Data Mining, Predictive Modeling, Data Analysis & Reporting
Across all layers: Warehouse Mgmt & Admin; Metadata Mgmt & Metadata Repository
Callouts:
Overall corporate data classification model with common language and terms
Data Warehouse model for a specific industry provides a full enterprise data warehouse blueprint
Enterprise DW design can be generated over a series of manageable phases
Data Mart templates enable fast, accurate requirements gathering
Mapping between BSTs and the DW Model enables rapid scoping
Data Mart DB design can be generated from templates
-
Best Practices: Using Information Templates for Data Mart Consolidation
[Diagram, before consolidation: separate warehouses, each fed by bespoke ETL from OSS/BSS systems, each with its own aggregations, ROLAP and MOLAP]
DW #1 (Marketing), bespoke ETL from: OE/OM, Billing, Campaign Mgmt., CRMS, Retail POS
DW #2 (Finance), bespoke ETL from: General Ledger, Billing, A/P, A/R, Collections, Retail POS
[Diagram, after consolidation]
DW #1 (Marketing) and DW #2 (Finance) are profiled against scoped TBSTs (TDW standard measures and dimensions)
Both are merged into a Consolidated EDW
Aggregations feed a Consolidated Data Mart
MOLAP and ROLAP front ends: DB2 OLAP Server, Business Objects, Cognos Impromptu, Microstrategy
-
Seminar: The Data Warehouse Ecosystem
Technology trends: the era of DWAs (Data Warehouse Appliances)
Fabrizio Napolitano, IBM Data Warehouse Architect
Rome, 9 April 2010
-
What is a DWA?
Native Data Warehouse Appliance: the hardware and software are tightly integrated into a single data warehouse solution. The software and hardware are not individually licensed and cannot be separated. Examples of vendors here include DATAllegro, Netezza, and Teradata.
Software Data Warehouse Appliance: commercial or open source relational DBMS software is designed and/or optimized for data warehouse processing. The software supports hardware solutions purchased from one or more third-party vendors. Examples of vendors here include Greenplum and Sybase (Sybase IQ).
Packaged Data Warehouse Appliance: commercial software and hardware is tuned for data warehousing, is packaged and supplied by a single vendor, and is installed and maintained as a single system. Examples of vendors here include HP (NeoView), IBM (Smart Analytics System), and Sun/Greenplum (Data Warehouse Appliance).
Data Management Appliance: offloads data-intensive operations from a host computer. The offloaded workload may involve operational, specialized analytics, or archival processing. Examples of vendors here include ParAccel and Dataupia.
One Purpose: sole purpose is supporting data warehouse processing
One Package: tested, ordered, and delivered as a single system
One Install: installed and maintained as a single system
One Support: single point of service provided by a single vendor
-
Which workload types can each DWA type handle?
-
What are the main infrastructure architectures?
-
What are the technological trends?
Source: "Next Generation Data Warehouse Platforms", Philip Russom (TDWI Best Practices Report)
-
New and Growing Demands on the Data Warehouse
Scalability: data explosion
Extreme performance
Mixed workloads: traditional complex queries, short OLTP queries, real-time loads and updates
Advanced workload management
Integrated analytics
DWA - An Example: IBM Smart Analytics System
The IBM Smart Analytics System is the complete analytics solution comprised of pre-tested, scalable and fully integrated system components of software, server and storage.
Powerful Data Warehousing Platform (ISW)
Advanced Workload Management (ISW)
System Automation (Tivoli System Automation)
Analytics Software Options
Business Intelligence Capabilities (Cognos)
Cubing Services (InfoSphere Warehouse - ISW)
Text Analytics & Data Mining (ISW)
Hardware & Services
Server Platform (IBM p6 or xSeries)
Storage Capacity (IBM DS storage systems)
Build, Deploy, Health Check & Premium Support Services
Deeply Optimized by IBM Experts
Flexible Growth to Meet Changing Business Needs
IBM Smart Analytics System
Out-of-the-box Solution
Build from Scratch: pre-implementation system sizing, acquire components, installation and configuration, testing and validation
Pre-built Solution: IBM Smart Analytics
One Package: all in one (software, hardware and services)
One Install: pre-configured package installed on the data center floor
One Support: one phone number to fix your problem
MPP systems: Predictable Scaling
Double the data, double the system resources
Each partition processes the same amount of data as before
Response times and throughput will remain constant
Double the system resources, same data
Each partition processes half the amount of data as before
Response times will be 2x faster, and throughput will double
Keep system resources constant, double the data
Each partition processes double the amount of data as before
Response times should double, and throughput will be cut in half
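The three scaling scenarios above can be checked with a small idealized model. This is only a sketch: it assumes perfectly even data distribution and a fixed scan rate per partition, and the function name, the data volumes and the rate are illustrative values, not figures from the seminar.

```python
def scan_time(data_gb: float, partitions: int, gb_per_sec_per_partition: float = 1.0) -> float:
    """Idealized response time when every partition scans an equal share of the data."""
    per_partition_gb = data_gb / partitions
    return per_partition_gb / gb_per_sec_per_partition


# Hypothetical baseline: 1000 GB spread over 8 partitions.
baseline = scan_time(data_gb=1000, partitions=8)

# Double the data and the resources: same work per partition.
double_both = scan_time(data_gb=2000, partitions=16)

# Double the resources only: half the work per partition.
double_resources = scan_time(data_gb=1000, partitions=16)

# Double the data only: twice the work per partition.
double_data = scan_time(data_gb=2000, partitions=8)

print(baseline, double_both, double_resources, double_data)
```

Real systems deviate from this model when data is skewed across partitions or when the interconnect becomes the bottleneck.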
Parallel Query Processing
Automatic Data Distribution: table_a and table_b are spread across partitions Part1, Part2, Part3 ... PartN with DISTRIBUTE BY HASH (trans_id)
[Figure: the application connects and submits "select sum(x) from table_a, table_b where a = b". The coordinator (Coord) gets statistics from the catalog and optimizes the query. On each partition an agent reads A and B, joins them and computes a local sum (sum=10, sum=12, sum=13, sum=11). The coordinator adds the partial results: sum() = 46.]
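The flow in the figure (hash distribution, local join and sum on each partition, final sum at the coordinator) can be imitated in a few lines. This is a toy simulation, not DB2 code: it assumes both tables are distributed on the join key so that matching rows land on the same partition, it uses a modulo as a stand-in for the hash function, and the table contents are invented.

```python
from collections import defaultdict

N_PARTITIONS = 4


def partition_of(key: int) -> int:
    # Stand-in for the hash function used to distribute rows.
    return key % N_PARTITIONS


def distribute(rows, key_index):
    """Assign every row to a partition based on its distribution key."""
    parts = defaultdict(list)
    for row in rows:
        parts[partition_of(row[key_index])].append(row)
    return parts


def local_join_sum(a_rows, b_rows):
    """What one agent does: join its local rows on a = b and sum x."""
    b_counts = defaultdict(int)
    for (b,) in b_rows:
        b_counts[b] += 1
    return sum(x * b_counts[a] for a, x in a_rows)


table_a = [(k, k + 1) for k in range(8)]  # columns (a, x)
table_b = [(k,) for k in range(0, 8, 2)]  # column (b)

a_parts = distribute(table_a, 0)
b_parts = distribute(table_b, 0)

# Each partition works only on its own slice of both tables.
partial_sums = [local_join_sum(a_parts[p], b_parts[p]) for p in range(N_PARTITIONS)]

# The coordinator only has to add the partial results.
total = sum(partial_sums)

# Reference result computed without any partitioning.
reference = sum(x for a, x in table_a for (b,) in table_b if a == b)
print(partial_sums, total, reference)
```

Because matching rows are collocated, no rows have to move between partitions during the join; only the small partial sums travel to the coordinator.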
Predictable Scaling
[Figure: IBM Smart Analytics System architecture. User modules (SMP servers) are attached to the users network. Each data building block pairs two SMP servers, running eight DB2 partitions in total, with a storage server reached through I/O channels. All servers are linked by a private GigE network.]
Traditional Large Scans Result in I/O Wait
DB2 Database Partitioning Feature = Divide I/O
[Figure: the table is spread across Database Partition 1, Database Partition 2 and Database Partition 3]
Add Range Partitioning to Further Reduce I/O
[Figure: Database Partition 1, Database Partition 2 and Database Partition 3, each split into the ranges January, February and March]
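A minimal sketch of why combining the two techniques reduces I/O: rows are first spread across database partitions, then grouped by month inside each partition, so a query on one month reads only that range everywhere. The layout structure, the modulo used in place of a hash, and the sample rows are assumptions made for illustration, not DB2 internals.

```python
from datetime import date

N_DB_PARTITIONS = 3
MONTHS = (1, 2, 3)  # January, February, March


def build_layout(rows):
    """Place each row by database partition (hash on trans_id) and by month range."""
    layout = {p: {m: [] for m in MONTHS} for p in range(N_DB_PARTITIONS)}
    for trans_id, sale_date, amount in rows:
        db_partition = trans_id % N_DB_PARTITIONS  # stand-in for the hash
        layout[db_partition][sale_date.month].append((trans_id, sale_date, amount))
    return layout


def sum_for_month(layout, month):
    """Scan only the requested month range on every database partition."""
    total, rows_read = 0, 0
    for ranges in layout.values():
        for _, _, amount in ranges[month]:  # the other ranges are never touched
            rows_read += 1
            total += amount
    return total, rows_read


# Invented sample: 90 sales of 10 each, spread evenly over three months.
rows = [(i, date(2010, 1 + (i // 3) % 3, 1 + i % 28), 10) for i in range(90)]
layout = build_layout(rows)

feb_total, feb_rows_read = sum_for_month(layout, 2)
print(feb_total, feb_rows_read, len(rows))
```

In this sample the February query reads one third of the rows, and each of the three database partitions contributes only a third of that work.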
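Add MDC to Further Reduce I/O
[Figure: Database Partition 1, Database Partition 2 and Database Partition 3, each split into the ranges January, February and March, with multidimensional clustering (MDC) applied within the ranges]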
Compression Further Reduces I/O by a Factor of 4
[Figure: Database Partition 1, Database Partition 2 and Database Partition 3, each split into the ranges January, February and March, with compressed data]
InfoSphere Warehouse Data Compression
Compression looks for repeating patterns across the entire table
When a pattern is found, the string is replaced by a 12-bit symbol
Symbols are stored in a dictionary for fast lookup

Name         Dept  Salary  City    Province  Postal_Code
Zikopoulos   510   56105   Whitby  ONT       L4N5R4
Katsopoulos  500   82475   Whitby  ONT       L4N5R4

Dictionary
01  opoulos
02  Whitby ONT L4N5R4

Compressed rows
Zik (01) 510 56105 (02)
Kats (01) 500 82475 (02)

Unique to InfoSphere
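The dictionary substitution shown on the slide can be reproduced with a toy example. This is a sketch only: the dictionary is written by hand with the two entries from the slide, whereas the product builds its dictionary automatically from the table data, and the "(01)" text markers stand in for the 12-bit symbols.

```python
# Hand-written dictionary taken from the slide example.
DICTIONARY = {"01": "opoulos", "02": "Whitby ONT L4N5R4"}


def compress(row: str) -> str:
    """Replace every dictionary pattern with its symbol, longest pattern first."""
    for symbol, pattern in sorted(DICTIONARY.items(), key=lambda kv: -len(kv[1])):
        row = row.replace(pattern, f"({symbol})")
    return row


def decompress(row: str) -> str:
    """Expand every symbol back into the original pattern."""
    for symbol, pattern in DICTIONARY.items():
        row = row.replace(f"({symbol})", pattern)
    return row


rows = [
    "Zikopoulos 510 56105 Whitby ONT L4N5R4",
    "Katsopoulos 500 82475 Whitby ONT L4N5R4",
]
compressed = [compress(r) for r in rows]
restored = [decompress(c) for c in compressed]

for original, short in zip(rows, compressed):
    print(f"{len(original):2d} chars -> {len(short):2d} chars: {short}")
```

The toy version would break on data that already contains text such as "(01)"; a real implementation works on binary symbols, so it does not have that limitation.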
Improving the Best Compression in the Industry
Multiple algorithms for automatic index compression (unique in the industry)
Intelligent compression of large objects and XML
Automatic compression for temporary tables
[Figure: Table, Order By, Temp Table]
Storage Savings from Compression
[Figure: size of the PRODUCT Table and the SALES Table after compression; bar labels: 81% smaller, 79% smaller, 81% smaller, 78% smaller]
"With DB2 9, we're seeing compression rates up to 83% on the Data Warehouse. The projected cost savings are more than $2 million initially with ongoing savings of $500,000 a year." - Michael Henson
Performance Speedup from Compression
40% Faster
Workload Manager
Enabling Enterprise Data Warehouse
Identification and control of applications
Direct control of the execution environment
Tight integration with OS WLM
Detection and control of rogue queries
Prevent bad queries from executing
Query concurrency
Optimize query throughput
Advanced monitoring
Real-time monitoring of query execution
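Workload Manager Example
[Figure: in InfoSphere Warehouse, user requests are identified as the workloads Marketing apps, Marketing mgrs or Default Workload and mapped to the service classes Marketing, Managers and Default User Class; system requests run in the Default System Class.]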
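The mapping in the example can be pictured as a small classifier: each incoming request is matched against workload definitions in order and routed to a service class, with defaults for everything else. This is a conceptual sketch in Python, not the product's SQL interface; the `Request` fields, the `mkt_` application prefix and the `marketing_managers` group name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_group: str
    application: str
    is_system: bool = False


# (workload name, matching rule, target service class); first match wins.
WORKLOADS = [
    ("Marketing apps", lambda r: r.application.startswith("mkt_"), "Marketing"),
    ("Marketing mgrs", lambda r: r.user_group == "marketing_managers", "Managers"),
]


def classify(request: Request):
    """Return the (workload, service class) pair that a request is routed to."""
    if request.is_system:
        return ("System requests", "Default System Class")
    for name, matches, service_class in WORKLOADS:
        if matches(request):
            return (name, service_class)
    return ("Default Workload", "Default User Class")


print(classify(Request("analysts", "mkt_campaign_report")))
print(classify(Request("marketing_managers", "dashboard")))
print(classify(Request("finance", "adhoc_sql")))
print(classify(Request("", "maintenance", is_system=True)))
```

Once requests are separated this way, limits on concurrency or resource use can be set per service class instead of per query.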
Tiered Approach to WLM
Case Study for DILLARD'S INC
The Challenge
The Solution
The Benefits
Focus on four areas of its business:
Revenue growth
Cost saving
Customer relationship
Operational efficiency
Dillard's, Inc. (Dillard's) is a major department store chain in the United States operating about 330 stores in 30 states, covering the Sunbelt and the central US.
Dillard's extensively uses components of IBM's Smart Analytics System (embedded mining products). Using mining analytics, Dillard's is able to obtain valuable insights into inventory management, vendor relationship management and customer spending patterns, which has resulted in increased efficiencies for the company.
Customer segmentation
Market basket analysis
Improve customer loyalty
Improve profitability
Provide the business insights to the right people at the right time
Client quotes
"Now I can take markdowns by market - it's a 1-hour process instead of two days."
"I see winners and losers more quickly - in 20 minutes I have the facts!"
"Saves me at least 8 hours a week!"
"It's a competitive imperative - without it, we'd be behind the eight ball!"
Examples of DILLARD'S Business Requirements
How to improve promotion effectiveness based on women's shoes?
For each of these customer segments, how to discover affinities among women's shoes and other items in other departments?
Which products should I use for a promotion?
Which products should I replenish in anticipation of a promotion?
How to characterize distinct shopping behavioral segments for customers who have previously purchased women's shoes?
What do my women's shoes customers look like?
Which of these customers should I target in a promotion?
How can I improve customer loyalty and customer advocates?
How to identify the items that a women's shoes customer is most likely to purchase next?
Data Mining: Intelligent Miner for Data, Intelligent Miner Scoring, Intelligent Miner Modeling, Intelligent Miner Visualization
Data Mining Solution Process
[Figure: data mining solution process, driven by Business Requirements, in three phases: Data Preparation Process, Data Mining Process and Deployment. Data states: Source Data, Selected Data, Transformed Data, Discovered Information, Assimilated Knowledge, Applied Knowledge. Steps: Select, Explore, Transform, Aggregate, Calculate, Mine, Understand, Validate, Model, Deploy, Analyze, Score, Measure. Feedback loops: Data Enhancement and Model Refinement.]
Data Mining Approach
Customer Purchase History Table: 3 million customers, 80+ million transactions, on average ~3 transactions per customer per month
1. Select all customers who purchased women's shoes in the past 12 months (Shoe Customers Table)
2. Create shoe customers' attributes (Shoe Customer Purchasing Behavior Table), used for Segmentation
3. Select shoe customers' transactions (Shoe Customer Transactions Table), used for MBA
Customer Segmentation and MBA
Customer segmentation
Who are our customers?
Who will respond to discounting?
Who were not classified as VIP, but shopped as if they were?
Which of these customers should I target in a promotion?
Market basket analysis
Which products should I use for a promotion?
How to place the items in close proximity?
What is a customer most likely to purchase next?
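Market basket questions such as "what is a customer most likely to purchase next?" are commonly answered with support, confidence and lift for item pairs. The sketch below computes these three standard measures on a handful of invented baskets; it is not the Intelligent Miner algorithm, the item names are made up, and the numbers are not Dillard's results.

```python
def association_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent.

    Assumes both items appear in at least one basket.
    """
    n = len(baskets)
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    with_consequent = sum(1 for b in baskets if consequent in b)
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)

    support = with_both / n                  # how often the pair occurs at all
    confidence = with_both / with_antecedent  # P(consequent | antecedent)
    lift = confidence / (with_consequent / n)  # > 1 means a positive affinity
    return support, confidence, lift


# Invented transactions, one set of items per basket.
baskets = [
    {"womens_shoes", "handbag"},
    {"womens_shoes", "handbag", "cosmetics"},
    {"womens_shoes"},
    {"cosmetics"},
]

support, confidence, lift = association_metrics(baskets, "womens_shoes", "handbag")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that the two items sell together more often than chance would predict, which is the kind of affinity that informs promotion and product placement decisions.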
Business Insights of Data Mining
Customer segmentation
Dillard's discovered a segment of shoppers who were not classified as VIP but shopped as if they were. Furthermore, this newly discovered segment made large purchases, responded to discounts more than other VIP segments, and became a targeted segment that increased sales and profit for the company.
MBA (market basket analysis)
Traditional perception
Women's shoes draw a large percentage of our customers
These come to Dillard's only for women's shoes
These are our most profitable customers
Mining result
Certain segments of customers buy shoes as a secondary purchase
These cross-shop the store and are our most profitable customers
Those who purchase shoes as a primary or only purchase are not our most profitable customers
Bibliography
Data Warehouse Life Cycle
by Ralph Kimball et al., John Wiley & Sons, 2008 (636 pages), ISBN: 9780470149775
Data Warehouse Toolkit
by Ralph Kimball and Margy Ross, John Wiley & Sons, 2002 (436 pages), ISBN: 9780471200246
The Anti-Architect
Ralph Kimball, article in Intelligent Enterprise, January 14, 2002
http://intelligent-enterprise.informationweek.com/020114/502warehouse1_2.jhtml
Top Ten Data Warehouse Best Practices
Nancy Kopp, IBM, Session 2162, IBM IOD 2006 Conference
10 Mistakes to Avoid in a Business Intelligence Delivery
Lalitha Chikkatur, Information Management Special Reports, September 16, 2008
http://www.information-management.com/specialreports/2008_97/10001935-1.html?pg=1
What Not to Do
Ralph Kimball, article in Intelligent Enterprise
http://intelligent-enterprise.informationweek.com/011024/416warehouse1_1.jhtml
Brave New Requirements for Data Warehousing
Ralph Kimball, article in Intelligent Enterprise
http://intelligent-enterprise.informationweek.com/db_area/archives/1998/9810/warehouse.jhtml
Next Generation Data Warehouse Platforms
Philip Russom (TDWI Best Practice Report)
Data Warehouse Appliances: Evolution or Revolution?
by Richard Hackathorn and Colin White (BeyeResearch)
http://www.beyeresearch.com/study/4639
Are Data Warehouse Appliances in Your Future? Plan On It! (G00174689)
Gartner Group
Appliance Power: Crunching Data Warehousing Workloads Faster And Cheaper Than Ever
James Kobielus, Forrester
Data Warehouse Architecture Best Practice and Guiding Principles (G00171980)
Gartner Group
Fundamentals of Data Warehousing for the CIO (G00167390)
Gartner Group
Changing the Dynamics of the Business with Analytics
Lou Agosta, PhD, Independent IT Industry Analyst
Operational BI: Expanding BI Through New, Innovative Analytics. Going Beyond the Traditional Data Warehouse
Claudia Imhoff, Ph.D.
Powering Next Generation BI Systems
Madan Sheina, OVUM
Mixed Articles from Kimball Group Archive
http://www.ralphkimball.com/html/articles.html
Additional Bibliography
Building and Maintaining a Data Warehouse
by Fon Silvers, Auerbach Publications, 2008 (330 pages), ISBN: 9781420064629
Mastering Data Warehouse Design: Relational and Dimensional Techniques
by Claudia Imhoff, Nicholas Galemmo and Jonathan G. Geiger, John Wiley & Sons, 2003 (438 pages), ISBN: 9780471324218
A Manager's Guide to Data Warehousing
by Laura L. Reeves, John Wiley & Sons, 2009 (480 pages), ISBN: 9780470176382
Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals
by Paulraj Ponniah, John Wiley & Sons, 2001 (544 pages), ISBN: 9780471412540
Data Warehouse Performance
by W.H. Inmon, Ken Rudin, Christopher K. Buss and Ryan Sousa, John Wiley & Sons, 1999 (444 pages), ISBN: 9780471298083
Building the Data Warehouse, Fourth Edition
by W. H. Inmon, John Wiley & Sons, 2005 (574 pages), ISBN: 9780764599446