Data Warehousing on System z What is available & How to ... · PDF fileData Warehousing on...

48
© 2008 IBM Corporation IBM Systems Data Warehousing on System z What is available & How to implement Expanding System z’s Role in Data Warehouse & Business Intelligence Implementations

Transcript of Data Warehousing on System z What is available & How to ... · PDF fileData Warehousing on...

© 2008 IBM Corporation IBM Systems

Data Warehousing on System zWhat is available & How to implement

Expanding System z’s Role in Data Warehouse & Business Intelligence Implementations

Agenda

• Market Trends:From Business Automation to Business Optimization

• Why key markets trends are playing off for System z:Positioning System z strengths for Business Intelligence and Data Serving

• What is available on System z & How to implement

• Summary

Agenda

New Initiatives Increasingly Focus on OptimizationOrganizations Striving for Competitive Advantage

3

BusinessOptimization

InformationAgenda

BusinessAutomation

ApplicationAgenda

Faster Processing, Reduced Costs

Competitive AdvantageBusiness Optimization Growth is 2 Times Faster thanBusiness Automation

$594B5.1% CGR

Call CenterOperations

ERP & Financials

Supply ChainManagement

CustomerProfitability

Financial Risk Insight

5.1% CGR$594B

IT Spending Estimate*

2008

Dynamic DemandPlanning

Leveraging Information to Create Business Value

OLAP & Data Mining

−Merchandising, Inventory,

Operations

Manage and optimizedaily business operations�Many operational users�Unpredictable workload�Mission critical

Manage tactical initiatives to achieve strategic goals�What if analysis�Flexibility

Information On Demand

−Optimize Each Transaction

−Call Centers, Field Ops

Query & Reporting

− Financials, Sales

After the fact analysis�Limitted number of users�Scheduled and predicted workload

Customer Scenarios for Operational BI

Retail – inventory planningInform mangers how products sell

on an hourly base

Retail – inventory planningInform mangers how products sell

on an hourly base

Insurance – Claim processingInform call center agents about

fraudulent claim probability

Insurance – Claim processingInform call center agents about

fraudulent claim probability

Customer ServiceInform call center agents about actual life time value of customer

Customer ServiceInform call center agents about actual life time value of customer

Logistics – Truck fleet planningInform admin about possible

business impact of trucks being late

Logistics – Truck fleet planningInform admin about possible

business impact of trucks being late

Credit Card Service - Card identificationInform OLTP system about

possibly stolen cards, dependent on OLTP transaction profile

Credit Card Service - Card identificationInform OLTP system about

possibly stolen cards, dependent on OLTP transaction profile

Banking – Customer ServiceInform call center agents on

possible cross selling opportunitiesdependent on actual transaction

Banking – Customer ServiceInform call center agents on

possible cross selling opportunitiesdependent on actual transaction

Why key markets trends are playing off for System z:Positioning System z strengths for Business

Intelligence and Data Serving

Key questions for your DW Platform Design

1. Which DBMS to use for Datawarehousing

2. Where to place your Datawarehouse

3. How to solve the biggest challenge for dynamic DW: the mixed workload performancechallenge

4. Plan for growth, the requirement for Scalability

5. TCO for your DW

1. Which DBMS to use for Datawarehousing

“The transactional DBMSs have an edge

that challenges the DW DBMSs (such as Teradata)”Gartner Data Warehouse Magic Quadrant, 2006

Transactional

DBMS

DW

DBMS

Advanced Data Partioning

MQTs

Cubing Services2008

1970s IBM Information Management, 2008

2. Where to place your Datawarehouse

The larger the amount of data, the more likely it is to deploy operational BI centra lly.

Ventana Research 2007

A centralized approach can incorporate Information management to consolidate

• data integration • data transformation• metadata management

rather than have these information processes handled differently by each department or line of business.

2. Where to place your Datawarehouse: DW on z Solution Architecture

Member A

OLTP

CEC One

Member B

OLTP

CEC Two

DB2 DataSharingGroup

CFCF

CEC One CEC TwoCFCF

DB2 DataSharingGroupMember C

DWHMember D

DWH

Within a data sharing environment, the data warehouse resides in the same group as the transactional data.

The mixed workload performance will become the single mostimportant performance issue in DW

3. How to solve the biggest challenge for dynamic DW: the mixed workload performance challenge

The emergence of issues based on a mix of 4 DW work loads– Continous (near-real-time) data loading – similar to an OLTP

workload – Large numbers of standard reports – An incresing number of true ad hoc query users – An increasing level of analytics and BI-oriented functionality in

OLTP

Information On Demand

to Optimize Real-Time Processes

Dynamic Warehousing +

Operational BI

Gartner Data Warehouse Magic Quadrant, 2006

Traditional workload management approach

–Screen queries before they start execution

–Time consuming for DBAs.

–Some large queries slip through the crack.

–Running these queries degrade system performance.

–Cancellation of the queries wastes CPU cycles.

Workload Management

The ideal workload manager policy for data warehousing:

Consistent favoring ofshorter running work........

with select favoring of criticalbusiness users

keep em shortthrough WLM period aging

through WLM explicit prioritization of critical users

Workload Management

discretionary5

43010000004

3401000003

260500002

28050001

ImportanceVelocityDurationPeriod

Period aging can be used to lower the priorities of a query execution after it exceeds a specific amount of CPU time. This way, short running queries run with higher priority while long running queries are not blocking important resources.

Workload Management assigned to Service classes

service class A velocity 50 importance 3

service class B velocity 100 importance 1

1 hour to complete

5 minutes to complete

Some queries (the “CEO” queries) still need to return in a minimum amount of time. Their priority can override the period aging through their service class assignments.

3. How to solve the biggest challenge for dynamic DW: the mixed workload performance challenge

The mixed workload performance will become the single most important performance issue in DW

Information On Demand

to Optimize Real-Time Processes

Dynamic Warehousing

Gartner Data Warehouse Magic Quadrant, 2006

Consistent favoring ofshorter running work........

keep em shortthrough System z WLM period aging

The ideal workload manager Policy for data warehousing

4.Plan for growth, the requirement for Scalability: Scalability with DB2: Single Query Parallelism

CPU Intensive Queries

10 CPU

20 CPU

20 CPU

0

500

1,000

1,500

2,000T

ime

in s

eco

nd

s

92% scalability

1874s

1020s

937s

PerfectScalability

Sysplex

measure

d

SMP

4. Plan for growth, the requirement for Scalability: Scalability with DB2: Multiuser throughput

Query Throughput

0

100

200

300

400

500

600

700

293

551

C

ompl

etio

ns @

3 H

ours

94% scalability

(Sysplex)10 CPU

20 CPU

(30 users)

(60 users)

S M P

5.TCO for your DW

When system z is the lowest cost platform

… just some pricing aspects for DWing

Mainframe Cost Per Unit of Work Goes Down as Workload Increases

Data Center Workload

Cos

t per

uni

t of w

ork

Mainframe

Distributed scale out

Most TCO benchmarks compare single applications

Most businesses operate here

Worldwide IT spending trend is playing off for system z

Source: IDC, Virtualization 2.0: The Next Phase in Customer Adoption, Doc #204904, Dec 2006

$0

$50

$100

$150

$200

$250

$300

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

Installed Base(M Units)

Spending(US$B)

0

5

10

15

20

25

30

35

40

45

50

1996

Power and cooling costs

Server mgmt and admin costs

New server spending

Worldwide IT Spending TrendWorldwide IT Spending Trend

$0

$50

$100

$150

$200

$250

$300

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

Installed Base(M Units)

Spending(US$B)

0

5

10

15

20

25

30

35

40

45

50

1996

$0

$50

$100

$150

$200

$250

$300

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

Installed Base(M Units)

Spending(US$B)

0

5

10

15

20

25

30

35

40

45

50

1996

$0

$50

$100

$150

$200

$250

$300

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

Installed Base(M Units)

Spending(US$B)

0

5

10

15

20

25

30

35

40

45

50

1996

Power and cooling costs

Server mgmt and admin costs

New server spending

Power and cooling costs

Server mgmt and admin costs

New server spending

Worldwide IT Spending TrendWorldwide IT Spending Trend

2000 – Raw processing Raw processing ““horsepowerhorsepower”” is the primary is the primary goal, while the infrastructure goal, while the infrastructure to support it is assumed to support it is assumed readyready

2006 –– Raw processing Raw processing ““horsepowerhorsepower”” is a given, but is a given, but the infrastructure to support the infrastructure to support deployment is a limiting factordeployment is a limiting factor

Power and cooling spend may eventually exceed new s erver spending

Management/Administration costs are already exceedi ng new server spending

Technology Evolution with Mainframe Specialty Engines

Internal Coupling Facility (ICF) 1997

Integrated Facility for Linux (IFL) 2001

IBM System z9 Integrated Information Processor (IBM zIIP) 2006

System z9 Application Assist Processor (zAAP) 2004

�Building on a strong track record of technology innovation with specialty engines, IBM is introducing the System z9 Integrated Information Processor

�Support for new workloads and open standards

� Designed to help improve resource optimization for eligible data workloads within the enterprise

� Centralized data sharing across

mainframes

� Incorporation of JAVA into existing mainframe solutions

• Portions of the following DB2 for z/OS V8 workloads may benefit from zIIP*:

2 - Data warehousing applications*• Requests that utilize parallel queries

3 - DB2 for z/OS V8 utilities LOAD, REORG & REBUILD*• DB2 utility functions used to maintain index maintenance structures

1 - ERP, CRM, Business Intelligence or other enterprise applications• Via DRDA over a TCP/IP connection (enclave SRBs, not stored procedures or UDFs)

* The zIIP is designed so that a program can work w ith z/OS to have all or a portion of its enclave Se rvice Request Block (SRB) work directed to the zIIP. The above t ypes of DB2 V8/9 work are those executing in enclav e SRBs, of which portions can be sent to the zIIP.

DB2 V8 exploitation of IBM zIIP value add

New Specialty Engine

Measured zIIP offload for a DW workload

0 10 20 30 40 50 60 70 80 90

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

% Offload to zIIP

… last not least

Hardware-assisted data compression

0

20

40

60

80

100

Com

pressed Ratio

Compressed Non-compressed

53%46%

61%

I/O Intensive CPU Intensive

281

157

283

200

354

412

Elapsed tim

e (sec)

Non-Compressed Compressed 60%I/O Wait I/O WaitCPU

CPU

Compress CPUOverhead

Effects of Compression on

Elapsed Time

Compression Ratios

Achieved

In V9Indexes

Can also be compressed

Summary of why DW on z

zLinux z/OS

DWH z SolutionDWH z Solution

Why DW on z

• Requirements for DWH / BI users are changing :– there is an increasing demand for accessing warehouse data in

a near-real-time way– Business intelligence is becoming integrated into operational

processes – Data security, availability requirements are increasing – Large groups of users – Workload management capabilities of z/OS allow optimization

of system performance and prioritization of large volumes of queries, and workloads (OLTP, DWing, batch – ETL)

– Linux on System z offers brings software options to system z– The ETL process to bring operational data into a DWH has to

be performed more often and becomes more and more time critical

What is available on System z & How to implement

Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture

DB2 DataSharingGroup

WebSphere Classic Federation

LegacyDataSource

LegacyDataSource

WebSphere Classic Data Ev entPublisher

DataStage

Quality Stage

MetaData

Inf ormationServ er

Insert/Update

Extract/Query

SOA Serv er

WebSphere MQ

z/OS zLinux/xLinux/AIX...

ODBC

Analy tics andReporting

BI Query Workload

Data warehouse, Transf ormationand Deploy ment

Fed

erat

ion

and

Cha

nge C

aptu

re

Best Practises: Solution Architecture on System z

Member A

OLTP

CEC One

Member B

OLTP

CEC Two

DB2 DataSharingGroup

CFCF

CEC One CEC TwoCFCF

DB2 DataSharingGroupMember C

DWHMember D

DWH

Within a data sharing environment, the data warehouse resides in the same group as the transactional data.

Best Practises: Initial load of the Data Warehouse

Member A

OLTP

Member B

OLTP

CEC One CEC TwoCFCF

Member C

DWH

Member D

DWH

ETL Accelerator

Extract

Load

Software AGAdabas

VSAM,IAM &

sequential

CA IDMS

CA Datacom

IMS

Classic Federation Server

z/OS

DB2 UDBfor LUW

Oracle SQL Server

Distributed Data Sources- Integration -The data is extracted from the OLTP information of the data

sharing group, transformed by DataStage and then loaded into the data warehouse tables again.

Legacy data sources are integrated through the Classic Federation Server. This way data is extracted from DataStagedirectly out of the data sources like IMS, VSAM…

Distributed data sources are directly integrated through the IBM Information server.

Best Practises: At runtime, the DWH is updated incrementally

Member B

OLTP

CEC One CEC TwoCFCF

Member C

DWH

Member D

DWH

DB2

VSAMSoftware AGAdabas

CaptureIMS

Data Event Publisher

CA-IDMS

z/OS

DB2 UDBfor LUW

Oracle

Data Event Publisher

Capture

Distributed

MQ

MQ

MQ

Update & Insert

Once, the data warehouse is loaded, only incremental updates are performed. The changes on the original data sources (z/OS and distributed) are captured by WebSphere Data Event Publisher and sent through MQ to the ETL Accelerator. Special DataStage stages take the change information from MQ and updates or inserts are performed on the data warehouse data. This might of course be delayed to batch windows by just receiving the changes and only updating the warehouse at a later point in time.

Member A

OLTP

Cha

nge

Cap

ture

ETL Accelerator

Best Practises: In Database ELT is triggered by DataStage

Member A

OLTP

Member B

OLTP

CEC One CEC TwoCFCF

Member C

DWH

Member D

DWH

Simple example:

-- Aggregate by salary by department into AGGRSALARY

INSERT INTO AGGRSALARY ( DEPTCODE, AVGBAND, AVGSALARY )SELECT DEPTCODE, AVG( BAND ) AS AVGBAND, AVG( SALARY ) AS AVGSALARY FROM STAFFGROUP BY DEPTCODE

Wherever possible, “In Database” transformations (ELT) are used to spare the transport of the data to the accelerator. But the used SQL is still sent from the ETL Accelerator to the database to have one place of documentation for all ETL steps. This can also be used to shift the data up the hierarchy within the Layered Data Architecutre.

SQLELT

ETL Accelerator

Summary

Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture

DB2 DataSharingGroup

WebSphere Classic Federation

LegacyDataSource

LegacyDataSource

WebSphere Classic Data Ev entPublisher

DataStage

Quality Stage

MetaData

Inf ormationServ er

Insert/Update

Extract/Query

SOA Serv er

WebSphere MQ

z/OS zLinux/xLinux/AIX...

ODBC

Analy tics andReporting

BI Query Workload

Data warehouse, Transf ormationand Deploy ment

Fed

erat

ion

and

Cha

nge C

aptu

re

DB2 for z/OS features that support DWHing

• 64-Bit Addressability

• 2000 byte index keys

• MQT’s

• Multi-Row Operations

• 225 way table joins

• In-Memory Workfiles

• Automatic Space Allocation

• Non-Uniform Distribution Statistics on Non-Indexed Columns

• Parallel Sorting

• Data Partitioned Secondary Indexes

• 2MB SQL Statements

• Etc.

• Partition by growth

• Index Compression

• Dynamic index ANDing

• New row internal structure for faster VARCHAR processing

• Fast delete of all the rows in a partition

• Deleting first n rows

• Skipping uncommitted inserted/updated qualifying rows

• Etc.

A whitepaper can be downloaded from: http://www.ibm.com/software/sw-library/en_US/detail/A016040Z53841K98.html

Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture

DB2 DataSharingGroup

WebSphere Classic Federation

LegacyDataSource

LegacyDataSource

WebSphere Classic Data Ev entPublisher

DataStage

Quality Stage

MetaData

Inf ormationServ er

Insert/Update

Extract/Query

SOA Serv er

WebSphere MQ

z/OS zLinux/xLinux/AIX...

ODBC

Analy tics andReporting

BI Query Workload

Data warehouse, Transf ormationand Deploy ment

Fed

erat

ion

and

Cha

nge C

aptu

re

Unix, Linux, Wintel, zLinux*

Reports

Websphere

Information Service Framework

Logging

Authorization

ConnectorAccess

Services

Scheduling

Metadata

Reporting AccessAnalysis

WAS

* Q4, 2007

DB2

Metadata

PX Engine

DataStage Server

DSEngine

Data Event Publisher

Classic FederationServer

SEQ. VSAM IDMS Adabas Datacom

WebSphereMQ

DB2

SQL

SFTP or Batch Pipes

z/OS

IMS

z/OSConnectors

Non-z/OSConnectors

Oracle, SQL Server, …

DS,QS ClientWeb Console

DS,QS

IA Client

IA WISD

WISD Client

ADMIN, BG,

Reports

Meta Brokers

Windows

MetadataWorkbench

� Rich connectivity with:� Application & distributed data� DB2 for z/OS� Classic Federation for more z-data� Data Event Publishers for CDC� Optional DB2 for z/OS repository

� Continued native z/OS deployment options

� Common reusable services framework leverages the power of a SOA environment

� Meta data repository promotes:� reuse� compliance to standards� visual lineage� impact analysis

PX Engine(USS)

IMS

VSAM

SEQ.

z/OS Operators

New Linux for z deployment option� Robust, parallel processing� Hipersocket connectivity to z data� Full Information Server suite:

QualityStage, Information Analyzer…� Minimal impact on z/OS costs:

Leverages IFLs and zIIPs

Run Time Library DSMVS DSMVS Jobs

DS MVS UPLOADCOBOL,

andJCL

Metadata

DB2

Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture

DB2 DataSharingGroup

WebSphere Classic Federation

LegacyDataSource

LegacyDataSource

WebSphere Classic Data Ev entPublisher

DataStage

Quality Stage

MetaData

Inf ormationServ er

Insert/Update

Extract/Query

SOA Serv er

WebSphere MQ

z/OS zLinux/xLinux/AIX...

ODBC

Analy tics andReporting

BI Query Workload

Data warehouse, Transf ormationand Deploy ment

Fed

erat

ion

and

Cha

nge C

aptu

re

Read-from and write-to mainframe data sources using SQL from Unix, Windows, Linux and JVM platforms

• Standard ODBC/JDBC SQL interfaces– ODBC 3.5.1, JDBC 2.1 (some 3.0) type 4

• Metadata-driven means:– No mainframe programming

required Fast installation &– Configuration Ease of maintenance

• Works with existing and new:– Mainframe infrastructure– Application infrastructure– Toolsets

• Deliver mainframe data to– Self-service portals (real-time account details)– e-commerce solutions (real-time inventory)– Reporting and analytical tools, e.g. Business Objects– Data transformation and cleansing (e.g. DataStage, QualityStage)

for data warehousing, ODS, MDM, CDI, …

WebSphere Classic Federation Server for z/OSThe basics

DB2 UDBfor z/OS

Software AGAdabas

VSAM,IAM &

sequential

CA IDMS

CA Datacom

IMS

Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture

DB2 DataSharingGroup

WebSphere Classic Federation

LegacyDataSource

LegacyDataSource

WebSphere Classic Data Ev entPublisher

DataStage

Quality Stage

MetaData

Inf ormationServ er

Insert/Update

Extract/Query

SOA Serv er

WebSphere MQ

z/OS zLinux/xLinux/AIX...

ODBC

Analy tics andReporting

BI Query Workload

Data warehouse, Transf ormationand Deploy ment

Fed

erat

ion

and

Cha

nge C

aptu

re

WebSphere Classic Data Event Publisher for z/OS

• Real-time &/or background capture & publishing of data changes for:– VSAM files

– IMS databases

– CA-IDMS databases

– Software AG Adabas databases

• IBM owns multiple Event Publishing infrastructures:– WebSphere Data Event Publisher (DB2 on LUW and z/OS)

– WebSphere Classic Data Event Publisher (VSAM, IMS, …)

– WebSphere Business Event Publisher (focus is on PROCESS, some data)

VSAMchanges

IMS

changes

DB2 UDB

for z/OS changes

changes changes

CA-IDMS

changes

WebSphere MQ

AdabasChanges

Data Event Publisher Classic Data Event Publisher

Enterprise Data Warehouse using DB2 for z/OSThe Complete Solution Architecture

DB2 DataSharingGroup

WebSphere Classic Federation

LegacyDataSource

LegacyDataSource

WebSphere Classic Data Ev entPublisher

DataStage

Quality Stage

MetaData

Inf ormationServ er

Insert/Update

Extract/Query

SOA Serv er

WebSphere MQ

z/OS zLinux/xLinux/AIX...

ODBC

Analy tics andReporting

BI Query Workload

Data warehouse, Transf ormationand Deploy ment

Fed

erat

ion

and

Cha

nge C

aptu

re

Available since March 2007: IBM DataQuant• Adds compelling new Warehouse/Business Intelligence component to WH on z

– Visual Dashboards, Enhanced Graphical Reporting & Analytics, Security and Personalization, SOA Layer

– Adds over 100 analytical functions to queries, reports and visual solutions• Extends page-based QMF ‘visual reports’ with a comprehensive array of charts

and graphical objects• Transforms text-based QMF reports into highly graphical, information-rich

documents

Analytics with DB2 Alphablox

• Provide reporting and dashboard capabilities on key indicators

• Leverage operations such as ranking, ordering, filtering, trending, and other sophisticated statistical functions and calculations

• Drive data analysis from multiple data sources, both relational and multidimensional

Summary

DataWarehousing on system z – IBM at your help

• Whitepaper : Why Data Warehousing on System zavailable in the WEB http://www-306.ibm.com/software/data/db2bi/systemz.html

• DW on system z – Demo available in the Technical Marketing Competence Center, Böblingen, Germany, [email protected]

• DW on system z – Customer Briefings possible in the Technical Marketing Competence Center, Böblingen,[email protected]

• DW on system z – ‘Redbook ’ available