7/31/2019 20091029Session DW
Introduction to Data Warehousing
2009 IBM Corporation
Robert [email protected] for i Center of Excellence
STG Technical Conferences 2009
The Agenda
Background
Turning DATA into INFORMATION
Architectures/Strategies to get you there
DB2 for i Enablers
Today's Reporting Requirements
Remove Dependency on IT
Ease IT backlog of reporting requests
Reduce Report Maintenance
Empower End Users
Client Independence
Web Based
Reduced Software Maintenance
Multiple Viewing Options
Dashboards/Scorecards
Spreadsheet Integration
Board Room Quality PDF
Automated Report Distribution
E-mail Distribution
Application Integration
Reporting as a function of Line of Business apps
Portal interfaces
What is Business Intelligence?
REPORTING: WHAT HAPPENED?
MONITOR: WHAT JUST HAPPENED?
ANALYSIS: WHY DID IT HAPPEN?
PREDICT: WHAT WILL HAPPEN?
Query/Reporting
Dashboards/Scorecards
OnLine Analytics
Trending/OLAP
Data Mining (Predictive Analytics)
Business Performance Management
Historical Data (Data Warehouses/Marts)
Real-Time Data (OS/EAI)
DBMS
OS/EAI: Operation Systems/Enterprise Application Integrations
Source: The Data Warehousing Institute, Smart Companies in the 21st Century, July 2003
Normalized OLTP Data Base
DB2
Customer info ----> C file
Order header file-> O file
Order details ------> D file
Item descriptions-> I file
Salesman info ----> S file
Very good design: change information in only one place
Update customer information
Take an order
Record a payment
DB2
Follow a transaction
OLTP usually works with small pieces of the DB
DB2
But Ask A Simple Question
Who are my best customers?
Must go through the entire customer file
DB2
Another Question
Who are my best Salesmen?
Who are they selling to?
What are they selling?
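Each of the questions above touches several of the normalized files at once. The following is a minimal sketch in Python using the standard-library sqlite3 module rather than DB2 for i. The table and column names (customer, salesman, order_header, order_detail) and the sample rows are made up for illustration and do not come from the slides.

```python
import sqlite3

def build_oltp_demo():
    """Create a tiny in-memory copy of the normalized files (hypothetical names and data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customer     (custno INTEGER PRIMARY KEY, custname TEXT);
        CREATE TABLE salesman     (slsno INTEGER PRIMARY KEY, slsname TEXT);
        CREATE TABLE order_header (orderno INTEGER PRIMARY KEY, custno INTEGER, slsno INTEGER);
        CREATE TABLE order_detail (orderno INTEGER, item TEXT, amount REAL);
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO salesman VALUES (10, 'Lee'), (20, 'Kim');
        INSERT INTO order_header VALUES (100, 1, 10), (101, 2, 10), (102, 2, 20);
        INSERT INTO order_detail VALUES
            (100, 'widget', 50.0), (101, 'widget', 75.0),
            (101, 'gadget', 25.0), (102, 'gadget', 30.0);
    """)
    return con

def best_salesmen(con):
    """Who are my best salesmen? Joins three files and reads every order row."""
    return con.execute("""
        SELECT s.slsname, SUM(d.amount) AS revenue
        FROM salesman s
        JOIN order_header o ON o.slsno = s.slsno
        JOIN order_detail d ON d.orderno = o.orderno
        GROUP BY s.slsname
        ORDER BY revenue DESC
    """).fetchall()

def who_buys_what(con, slsname):
    """Who are they selling to, and what? Adds a fourth file to the join."""
    return con.execute("""
        SELECT c.custname, d.item, SUM(d.amount) AS revenue
        FROM salesman s
        JOIN order_header o ON o.slsno = s.slsno
        JOIN customer c     ON c.custno = o.custno
        JOIN order_detail d ON d.orderno = o.orderno
        WHERE s.slsname = ?
        GROUP BY c.custname, d.item
        ORDER BY c.custname, d.item
    """, (slsname,)).fetchall()
```

Unlike the OLTP transactions shown earlier, neither query can be answered from a small piece of the database: both aggregate over all order rows.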
Are you in Spreadsheet or I/T Purgatory?
[Diagram: source systems (ERP, POS, spreadsheets, other sources) feed reports and many Excel and Access files through data that is rekeyed, downloaded, uploaded, or cut and pasted. The resulting figures disagree: 1 + 1 = 2, 1 + 1 = 3, 1 + 3 = 7, 2 + 1 = 1.5.]
The most widespread technical problem reported by practitioners was slow query performance.
Survey of over 2000 companies that have implemented Business Intelligence applications
The BI Survey 8, Nigel Pendse
Managing the Querying of Production Data
Shield report authors and end users from complexities of the database
Leverage a META DATA oriented Query Tool (ex: DB2 Web Query)
Define data relationships, standardize/simplify data meanings
Optimize the environment
Ensure a PROACTIVE or REACTIVE indexing strategy is in place
Proactive
Read Indexing and Statistics White Paper at: http://www-03.ibm.com/servers/enable/site/bi/strategy/index.html
Reactive
Get to (at a minimum) V5R4
Minimize Impact on Production Systems
Isolate query workloads through dedicated subsystems/pools for Query jobs
Be wary of autotuner impact on queries
Leverage Query Governor (QQRYTIMLMT) with time or disk space (V5R4) governing
Get Some Assistance
IBM Lab Services SQL/Query Performance Assessment service
ibm.com/systems/i/editions/services.htm
Isolating Production Systems with Logical Replication
[Diagram: a Production server and an H/A Backup server linked by an H/A solution; the backup side holds the mirrored DB2 image, the ODS, and the Data Warehouse. Labels: Queries against Production Databases; Queries against Data Warehouse/Marts.]
I/T Optimization through Combined H/A and BI Server
Leverage H/A software to create Operational Data Store (ODS) in near real time
Utilize ODS as the source for ETL processes into the Data Warehouse
Combine with target side remote journaling for ETL efficiencies
No impact to Production Databases
Utilize mostly idle capacity of H/A Server for Data Warehouse Workloads
Optionally mirror Data Warehouse
Common Data Challenges
Data errors
failed joins
invalid dates
missing values
Hidden meanings and conditional rules
2nd character of column X means ..
If column Y = S, value Z must be multiplied by -1
If record type is 1, there must be a matching record in table B.
If type is 2, there may be a record.
If type is 3, there should not be a record.
For data older than 2/11/2003, column X will be blank, but it must be a valid value from then on.
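Conditional rules like these usually end up as explicit checks in the load process. The sketch below shows one way to encode them in Python. The record layout (key, record_type, entry_date, x) is hypothetical, and reading 2/11/2003 as February 11, 2003 is an assumption.

```python
from datetime import date

# Assumption: 2/11/2003 is read as month/day/year (February 11, 2003).
CUTOVER = date(2003, 2, 11)

def signed_value(y, z):
    """Apply the sign rule: if column Y = 'S', value Z is multiplied by -1."""
    return -z if y == "S" else z

def check_record(rec, table_b_keys):
    """Return a list of rule violations for one source record.

    rec is a dict with hypothetical fields: key, record_type, entry_date, x.
    table_b_keys is the set of keys that exist in table B.
    """
    problems = []
    has_match = rec["key"] in table_b_keys
    if rec["record_type"] == 1 and not has_match:
        problems.append("type 1 requires a matching record in table B")
    if rec["record_type"] == 3 and has_match:
        problems.append("type 3 should not have a record in table B")
    # Type 2 may or may not have a matching record, so there is nothing to check.
    if rec["entry_date"] >= CUTOVER and not rec["x"]:
        problems.append("column X must hold a valid value from the cutover date on")
    return problems
```

Returning a list of violations, rather than raising on the first one, lets a single pass report everything that is wrong with a record.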
Common Data Challenges
Unlimited formats, structures & attributes
Personal Name          Address Information
Source 1
Bob Christiansan       416 Columbus Ave #2, Boston, Massachusetts 02116
Kate A. Roberts        4 New York Plaza Floor 23, Manhattan NY, 10036
James Trenton          125-A Washington, Los Angeles, CA 90066
Source 2
Robert Christiansen    Four sixteen Columbus Avenue APT2, Boston, Mass 02116
Katherine Roberts      Four NY Plaza, FL-23, New York New York, 10036
Trenton, James         125 Washington Unit A, LA, California, 90066
Source 3
R.J. Christensen       416 Columbus Suite #2, Suffolk County 02116
Mrs. K. Roberts        4 NY Plaza, LVL23, NYC 10036
Mr & Mrs J.Trenton     One-twenty-five Washington #A, Los Angeles Cnty 90066
The Enterprise Data Warehouse Architecture
[Diagram: Operational System(s) feed a Data Staging Area through Data Propagation and Extraction, Transformation and Loading. Cleansed, transformed data populates the Sales, Finance, and Mfg Data Marts, which serve OLAP applications and PC or browser web visualization products for tactical operational decision support.]
Reasons you may choose a data warehouse
Manage larger (Terabyte?) volumes of data
Add data from sources other than production systems
Ex: purchased demographic data
Non IBM i databases
Clean/Transform the data
An ODS does not solve a lot of data issues
Tuning Aspects
Separate server/partition allows for different tuning knobs to be turned
May be a different allocation of resources to manage this very different workload
Separation of Powers
Data Warehouse Team versus Operational Systems Team
Separate Decisions
OS or resource upgrades
Single Version of the Truth
E.T.L.
Extract data from somewhere
(may be MANY sources)
Transform it somehow
Load it somewhere else
(and load it FAST)
Transformation Example: Surrogate Keys
Customer File - US
CUSTNO  CUSTNAME
1001    John Smith
1002    Mary Jones
1003    Chris Anderson
1004    David Perry
Customer File - Canada
CUSTNO  CUSTNAME
1001    Harry Potter
1002    Jeremy Carr
1003    Penny Hayes
1004    Debbie Thornton
Surrogate key is a sequential number with no correlation to replaced value(s)
Customer File - Data Warehouse
CUSTNUMBER  CUSTNAME         REGION  OLDNUM
1           John Smith       US      1001
2           Mary Jones       US      1002
3           Chris Anderson   US      1003
4           David Perry      US      1004
5           Harry Potter     CANADA  1001
6           Jeremy Carr      CANADA  1002
7           Penny Hayes      CANADA  1003
8           Debbie Thornton  CANADA  1004
PK, Secondary Index
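The surrogate key assignment above can be sketched in a few lines of Python. This is an illustration using the slide's sample rows; the function name and the lookup structure are hypothetical, and a real load would persist the mapping so that later runs reuse the same keys.

```python
from itertools import count

US = [(1001, "John Smith"), (1002, "Mary Jones"),
      (1003, "Chris Anderson"), (1004, "David Perry")]
CANADA = [(1001, "Harry Potter"), (1002, "Jeremy Carr"),
          (1003, "Penny Hayes"), (1004, "Debbie Thornton")]

def assign_surrogate_keys(sources):
    """Merge customer files, giving every row a new sequential key.

    sources is a list of (region, rows) pairs; each row is (custno, custname).
    Returns the warehouse rows (CUSTNUMBER, CUSTNAME, REGION, OLDNUM) and a
    lookup from (region, old number) to the new key.
    """
    next_key = count(1)
    warehouse, lookup = [], {}
    for region, rows in sources:
        for oldnum, name in rows:
            key = next(next_key)
            warehouse.append((key, name, region, oldnum))
            lookup[(region, oldnum)] = key
    return warehouse, lookup

warehouse, lookup = assign_surrogate_keys([("US", US), ("CANADA", CANADA)])
```

The lookup plays the role of the secondary index: it translates a source system's (region, old number) pair into the warehouse key when fact rows are loaded.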
Transformation Example: Star Schema
Show me the date, weather, and quantity/revenue from sales of umbrellas, raingear, and hats in our Florida stores in November, and order by store, item, date, then weather
DIMENSIONS
Item_Dim (Itemkey), Store_Dim (Storekey), Date_Dim (Datekey)
FACT
Itemkey, Storekey, Datekey, Sales, Quantity
[Diagram: each dimension table produces a keylist that selects rows in the fact table]
Select store, item, date, weather, sum(sales), sum(quantity)
from item_dim, store_dim, date_dim, fact_table
where fact_table.itemkey in (...keylist...) and fact_table.storekey in (...keylist...)
and fact_table.datekey in (...keylist...)
and fact_table.itemkey = item_dim.itemkey
and fact_table.storekey = store_dim.storekey
and fact_table.datekey = date_dim.datekey
group by store, item, date, weather
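A runnable version of this query can be sketched with Python's standard-library sqlite3 module. SQLite stands in for DB2 for i here, and the column names beyond the keys (state, month, sale_date), the placement of weather in the date dimension, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

def build_star():
    """Create a tiny in-memory star schema (hypothetical columns and data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE item_dim  (itemkey INTEGER PRIMARY KEY, item TEXT);
        CREATE TABLE store_dim (storekey INTEGER PRIMARY KEY, store TEXT, state TEXT);
        CREATE TABLE date_dim  (datekey INTEGER PRIMARY KEY, sale_date TEXT,
                                month INTEGER, weather TEXT);
        CREATE TABLE fact_table (itemkey INTEGER, storekey INTEGER, datekey INTEGER,
                                 sales REAL, quantity INTEGER);
        INSERT INTO item_dim VALUES (1, 'umbrella'), (2, 'hat'), (3, 'sunscreen');
        INSERT INTO store_dim VALUES (1, 'Miami', 'FL'), (2, 'Boston', 'MA');
        INSERT INTO date_dim VALUES (1, '2008-11-03', 11, 'rain'),
                                    (2, '2008-12-01', 12, 'sun');
        INSERT INTO fact_table VALUES
            (1, 1, 1, 40.0, 4), (1, 1, 1, 20.0, 2), (2, 1, 1, 15.0, 1),
            (3, 1, 1, 9.0, 1),  (1, 2, 1, 30.0, 3), (1, 1, 2, 10.0, 1);
    """)
    return con

# The dimension filters stand in for the keylists: each one selects a set of
# keys, and only fact rows carrying those keys are aggregated.
STAR_QUERY = """
    SELECT s.store, i.item, d.sale_date, d.weather,
           SUM(f.sales) AS sales, SUM(f.quantity) AS quantity
    FROM fact_table f
    JOIN item_dim  i ON i.itemkey  = f.itemkey
    JOIN store_dim s ON s.storekey = f.storekey
    JOIN date_dim  d ON d.datekey  = f.datekey
    WHERE i.item IN ('umbrella', 'raingear', 'hat')
      AND s.state = 'FL'
      AND d.month = 11
    GROUP BY s.store, i.item, d.sale_date, d.weather
    ORDER BY s.store, i.item, d.sale_date, d.weather
"""

rows = build_star().execute(STAR_QUERY).fetchall()
```

The ORDER BY clause is added here to match the ordering the business question asks for.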
E.T.L.
But... there are two VITAL additional requirements
Validate: bad data in is bad data out
Manage: what do you do with bad data? How do you administer ETL jobs?
Validate
Transform
Manage
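One common way to handle both requirements in a do-it-yourself load is to validate each row and route failures to a reject list with a reason, so bad data is neither loaded nor silently lost. The sketch below assumes a hypothetical row layout (custno, order_date, amount) and reuses the error types named earlier (invalid dates, missing values, failed joins).

```python
from datetime import datetime

def parse_date(text):
    """Return a date for 'YYYY-MM-DD' text, or None if it is not a valid date."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return None

def run_etl(source_rows, known_customers):
    """Validate and transform rows; keep rejects with a reason instead of dropping them."""
    loaded, rejected = [], []
    for row in source_rows:
        order_date = parse_date(row.get("order_date"))
        if order_date is None:
            rejected.append((row, "invalid date"))
        elif row.get("amount") is None:
            rejected.append((row, "missing value"))
        elif row.get("custno") not in known_customers:
            rejected.append((row, "failed join to customer"))
        else:
            # Transform: typed date and numeric amount for the warehouse row.
            loaded.append({
                "custno": row["custno"],
                "order_date": order_date,
                "amount": round(float(row["amount"]), 2),
            })
    return loaded, rejected
```

The reject list is what the Manage step works from: someone has to decide whether each rejected row is corrected at the source, fixed in the staging area, or discarded.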
ETL Alternatives
Do it yourself
You almost always end up looking at tools later
If you do, consider use of SQL!
ETL lite: IBM i based
Information Builders Data Migrator
www.ibi.com
Coglin Mill's Rodin DB2 Web Query Edition
www.coglinmill.com
Talend Open Source
www.talend.com
High End (AIX Partition on Power Systems)
IBM InfoSphere Information Server
DB2 for i DW Near Real Time Architecture
[Diagram: a 4-CPU IBM i system with two DB2 for i LPARs (.25 CPUs and 3.75 CPUs) hosting the ERP database, the DW staging area, and the DB2 data warehouse. Labels: Remote Journaling, Shipped Logs, Data Mirror, Staged Data or ODS, ETL Tool.]
Remote Journaling during normal business processing hours
Trickle Feed Staging Area/ODS
Eliminate EXTRACTION impact on production systems
No Charge Feature of IBM i
Requires Program (e.g., DataMirror) to read data from journal receivers
Can add SQL logic to remove unwanted fields, change datatypes,
Virtualization Engine Technologies
Optimize resources for supporting production and daytime data warehouse queries
High speed data transfers over Virtual Ethernet
Common Backup and other Shared I/O
On Line Analytical Processing (OLAP)
OLAP is INTERACTIVE and ITERATIVE
Query is usually batch, list oriented result sets
Accessing business data with numerous dimensions
'anything' by 'anything' by 'anything' analysis
data can be easily analyzed from many different viewpoints
data is modeled to the business
data is viewed across, down and through the various dimensions
Helps answer business questions
How are my different departments performing?
Is this pattern the same every year?
Can we look at the information another way?
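The 'anything' by 'anything' idea can be illustrated with a small Python function that totals a measure by whatever combination of dimensions the user picks. The sample rows and dimension names (dept, region, year) are invented for the example, and a real OLAP engine would precompute or optimize these aggregations rather than scan a list.

```python
from collections import defaultdict

SALES = [
    {"dept": "Toys",   "region": "East", "year": 2008, "revenue": 100},
    {"dept": "Toys",   "region": "West", "year": 2008, "revenue": 150},
    {"dept": "Toys",   "region": "East", "year": 2009, "revenue": 120},
    {"dept": "Garden", "region": "East", "year": 2009, "revenue": 80},
]

def summarize(rows, dimensions, measure="revenue"):
    """Total a measure by any combination of dimensions.

    dimensions is a list of field names; an empty list gives the grand total.
    Returns a dict keyed by a tuple of dimension values.
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)
```

Calling `summarize(SALES, ["dept"])` answers "How are my different departments performing?", while `summarize(SALES, ["region", "year"])` looks at the same data another way without any new report being written.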
OLAP is uniquely suited to handle applications such as:
Budgeting
Planning
Forecasting
Business Modeling
Financial Consolidation
Sales & Performance Analysis
Customer & Product Profitability
What is the right OLAP Technology?
[Diagram: BI tools and applications issue SQL 1, SQL 2, SQL 3 against relational data; a separate path shows a data load.]
               MOLAP                                  ROLAP
# of users     Many                                   Few
engine         Cubing Engine                          Query Optimization
architecture   Depends                                DBMS Backend
via            complex loading                        complex SQL
metadata       in engine                              Meta Data Layer
Examples       ESSBASE, InfoManager                   DB2 Web Query (OLAP option)
speed          of thought                             Will vary
data strategy  Summary with drill through to detail   Summary or Detail
DB2 for i Enablers for Data Warehousing
POWER6 Processors
SQL Query Engine (SQE)
Self Learning, Self Adapting
Database Parallelism*
Real time statistics
Materialized Query Tables
Star Join Query Rewrite
Encoded Vector Indexing
Remote Journaling (Trickle Feed)
Single Level Storage
Autonomic Indexes
Index Advisor
[Chart: benchmark results on an axis from 0 to 140,000 for 2w i520, 2w 520, 4w i570, 4w 570, and 8w i570 systems on POWER5+, POWER6 (V5R4), and POWER6 (V6R1), annotated "57% Improvement" and "79% Improvement".]
*See detailed certified benchmark results at http://www.sap.com/solutions/benchmark/bid_results.htm
BI Acceleration with Encoded Vector Indexing
Indexing technology that can significantly improve performance, especially for star schema
10% to 30% faster index builds
1/3 to 1/16 the size
1/2 the time for index scans
1/3 the time for bit map generation
Symbol Table
Key Value  Code  First Row  Last Row  Count
Arizona    1     1          80005     5000
Arkansas   2     5          99760     7300
......
Virginia   37    1222       30111     340
Wyoming    38    7          83000     2760
Vector (one code per row: Row 1, Row 2, ....)
1 13 12 28 2 17 38 2 26 33
EVIs now part of Index Advice!!!
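The symbol table and vector can be pictured with a small Python sketch. This is a conceptual illustration only and not how DB2 for i stores or maintains an EVI: the function names are hypothetical, and assigning codes in sorted key order is a simplification.

```python
def build_evi(column):
    """Build a symbol table and a vector for one column.

    The symbol table maps each distinct key value to a code plus the first
    row, last row, and count; the vector holds one code per row. Rows are
    numbered from 1.
    """
    codes = {value: code for code, value in enumerate(sorted(set(column)), start=1)}
    symbols, vector = {}, []
    for rownum, value in enumerate(column, start=1):
        code = codes[value]
        vector.append(code)
        entry = symbols.setdefault(
            value, {"code": code, "first": rownum, "last": rownum, "count": 0}
        )
        entry["last"] = rownum
        entry["count"] += 1
    return symbols, vector

def rows_matching(symbols, vector, wanted):
    """Return the row numbers whose key value is in wanted, by scanning the codes."""
    wanted_codes = {symbols[v]["code"] for v in wanted if v in symbols}
    return [rownum for rownum, code in enumerate(vector, start=1) if code in wanted_codes]

symbols, vector = build_evi(["Arizona", "Wyoming", "Arkansas", "Arizona", "Virginia"])
```

Because the vector holds small integer codes rather than full key values, a scan for a list of keys (as in a star schema keylist) compares compact codes, and the counts in the symbol table are available without touching the rows.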
IBM DB2 Web Query for System i Powered By Information Builders
Base Program Product Includes:
IBM i Web Reporting Server
Several Web Based authoring tools
RA, GA, Power Painter
Query/400 (5722-QU1)
Web Enable Query/400 Reports
BASE PRODUCT OFFERED AS NO CHARGE UPGRADE FROM QU1
Does not include Software Maintenance
Additional Features
Run Time User Enablement
Active Reports (Disconnected Analysis)
On Line Analytical Processing
Requires Meta Data provided with Developer Workbench
Developer Workbench
IT Tool for meta data
DB2 Web Query Report Broker
Automated Report Execution and Distribution
DB2 Web Query SDK
Web Services to integrate reporting functions into applications/portals
http://www.ibm.com/systems/i/db2/webquery
Automated Delivery Of Information
On Scheduled Basis
Through Admin GUI
Daily, Weekly, Specific Days, exclude rules
On Event Basis
Some customization required
Intelligent bursting
Ex: Regional Sales Report
DB2 Web Query Report Broker 5733-QU3
Additional output formats for batch reporting
(HTML, PDF, Excel, Active HTML)
Delivery Destinations
Printer
Save the reports for later viewing
Notify Function
Send notification when report is complete or fails
Requires DB2 Web Query BASE Product to be installed
New in 2009: Microsoft Integration
Spreadsheet Client
Improve the experience for Excel Users
Excel Plug In
Embed queries in Excel templates
SQL Server Adapter
Extend the reach of DB2 Web Query
databases with a single adapter