The Essential Guide to Accessing, Consolidating and Trusting Your Data

® IBM Software Group © 2005 IBM Corporation The Essential Guide to Accessing, Consolidating and Trusting Your Data


®

IBM Software Group

© 2005 IBM Corporation

The Essential Guide to Accessing, Consolidating and Trusting Your Data


IBM Software Group | WebSphere software

Quiz Time

What do all these companies have in common?

US$10 billion Retailer migrating and consolidating financial data into Oracle Financials

Reduced projected 2,700-day manual effort to 217 days

Saved US$2 million

US$45 billion Manufacturer consolidating more than 3,300 legacy software applications down to 400 while reducing IT staff by 50%

US$4.5 billion global Chemicals Company consolidating 13 SAP instances into 1 global instance

Would save US$37 million in annual operating costs


Quiz Time: Answers

Raw, disparate data and disconnected systems

Enterprise Data Integration

Business Results that drive revenue and lower costs

This happened despite pouring hundreds of millions of dollars into new ERP, CRM, SCM, BI, BPM and DW systems


Other Companies


Challenges in Data Management

Touching data multiple times at its source – storing multiple times and updating multiple times

Inability to share common business rules across projects, processes and applications

Inconsistent islands of information underlying applications

Complex, manual & costly copy synchronization

Inconsistent and poor quality data

Inability to exploit enterprise meta data across tools

Lack of a single, repeatable methodology for consistency across all projects

[Diagram labels: CRM, Order Proc, Supply Chain, Procurement]


Remedy: 10 Proven Strategies

No single path is THE panacea to all corporate data problems - multiple approaches must be employed

Consider where your organization’s most SIGNIFICANT data pain exists – take that approach first


Strategy #1 – Understand Source Systems

Business Analysis

Data Analysis

1. Discover actual characteristics of data

2. Verify if characteristics of data conform to established / known business rules

3. Report on the assessment and variances / exceptions
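The three data analysis steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any profiling product works: the sample rows, the column names and the prefix rule (identifiers start with 01 or 02) are invented for the example.

```python
from collections import Counter

# Hypothetical sample rows; column names and values are invented for illustration.
rows = [
    {"customer_id": "0100017", "state": "NH"},
    {"customer_id": "0200342", "state": "MA"},
    {"customer_id": "0300001", "state": "nh"},
    {"customer_id": None, "state": "NH"},
]


def profile_column(rows, column):
    """Step 1: discover the actual characteristics of one column."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
    }


def check_rule(rows, column, rule):
    """Step 2: return the rows whose value breaks a known business rule."""
    return [row for row in rows if not rule(row[column])]


def id_prefix_rule(value):
    # Assumed rule: identifiers start with 01 or 02.
    return value is not None and value[:2] in {"01", "02"}


# Step 3: report on the assessment and the exceptions.
profile = profile_column(rows, "customer_id")
exceptions = check_rule(rows, "customer_id", id_prefix_rule)
print(profile)
print(f"{len(exceptions)} of {len(rows)} rows violate the prefix rule")
```

The exceptions list is what a business analyst would review: each entry is either bad data or a sign that the documented rule no longer matches reality.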


Strategy #1 – Understand Source Systems

Why is this strategy #1 on a list of 10?

K. Strange/T. Friedman – Gartner Group Research (2/28/2002)

“Complete understanding of data and awareness of data quality issues in operational systems is a critical success factor in any data integration or conversion effort.”

The Standish Group – Migrate Headaches (Feb 1999)

83% of data integration projects overrun or fail; a poor understanding of the data is a significant reason

50% of the time spent in data migration is spent trying to understand the source data


Recommended Best Practices: Automated Data Profiling

[Diagram labels]

Column Analysis, Table Analysis, Cross-Table Analysis

Analyze, Review, Accept/Reject

Create Data Model, Normalize & Generate Source/Target Mappings, Generate ETL Job

Sample Data, Full Data

No coding

Advice: You won’t have the time, money or energy to profile 100% of the data quickly, so go automated
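As a rough sketch of what table analysis and cross-table analysis look for, the Python below finds candidate keys within one table and orphaned key values between two tables. The tables, column names and values are hypothetical, and the functions are illustrative rather than any tool's actual algorithm.

```python
# Hypothetical tables; names and values are invented for illustration.
customers = [
    {"customer_id": "C1", "name": "IBM"},
    {"customer_id": "C2", "name": "I.B.M. Inc."},
    {"customer_id": "C3", "name": "IBM"},
]
orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C3"},
    {"order_id": 3, "customer_id": "C9"},
]


def candidate_keys(table):
    """Table analysis: columns whose values are all present and unique."""
    keys = []
    for column in table[0]:
        values = [row[column] for row in table]
        if None not in values and len(set(values)) == len(values):
            keys.append(column)
    return keys


def orphan_values(child, parent, column):
    """Cross-table analysis: child values with no matching parent row."""
    parent_values = {row[column] for row in parent}
    return sorted({row[column] for row in child} - parent_values)


customer_keys = candidate_keys(customers)
orphans = orphan_values(orders, customers, "customer_id")
print("candidate keys:", customer_keys)
print("orphaned customer_id values in orders:", orphans)
```

Running checks like these on a sample first, then on the full data, mirrors the Sample Data and Full Data labels in the diagram.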


Strategy #2 – Build In Data Quality

Same company / person?

Same address?

Same parts?

Same instructions?

NAME | ADDRESS
IBM | 187 N. Pk. Str. Salem NH 01456
I.B.M. Inc. | 187 N. Pk. St. Sarem NH 01456
International Bus. M. | 187 No. Park St Salem NH 04156
Int. Bus. Machines | 187 Park Ave Salem NH 01456
Inter-Nation Consult. | 15 Main St. Andover MA 02341
Int. Bus. Consultants | PO Box 9 Boston MA 02210
I.B. Manufacturing | Park Blvd. Boston MA 04106

PART DESCRIPTION

WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT ¼ INCH

WING ASSEMBLY, USE 5J868-A HEX BOLT .25” – DRILL FOUR HOLES

USE 4 5J868A BOLTS (HEX .25) – DRILL HOLES FOR EA ON WING ASSEM

RUDER, TAP 6 HOLES, SECURE W/KL 2301 RIVETS (10 CM)

[Callouts: Spelling Errors; Lack of Standards in Synonyms, Acronyms, Abbreviations; Error Codes?; Assembly; Part Size; Instruction]
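To make the part description problem concrete, here is a minimal Python sketch that standardizes synonyms and part numbers. The synonym table and the part-number pattern are assumptions inferred only from the four descriptions on the slide; a real cleansing rule set would be far larger and is not shown in the source.

```python
import re

# Descriptions copied from the slide, inconsistencies included.
descriptions = [
    "WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT ¼ INCH",
    "WING ASSEMBLY, USE 5J868-A HEX BOLT .25” – DRILL FOUR HOLES",
    "USE 4 5J868A BOLTS (HEX .25) – DRILL HOLES FOR EA ON WING ASSEM",
    "RUDER, TAP 6 HOLES, SECURE W/KL 2301 RIVETS (10 CM)",
]

# Assumed synonym table; invented for this example.
SYNONYMS = {
    "ASSY": "ASSEMBLY",
    "ASSEM": "ASSEMBLY",
    "HEXBOLT": "HEX BOLT",
    "BOLTS": "BOLT",
    "HOLES": "HOLE",
    "FOUR": "4",
}

# Assumed part-number shape, inferred only from the 5J868A example.
PART_NUMBER = re.compile(r"\b(\d[A-Z]\d{3})-?([A-Z])\b")


def part_numbers(description):
    """Extract part numbers, dropping the optional hyphen."""
    return ["".join(m) for m in PART_NUMBER.findall(description.upper())]


def standardize(description):
    """Upper-case, normalize part numbers, and replace known synonyms."""
    text = PART_NUMBER.sub(r"\1\2", description.upper())
    tokens = re.findall(r"[^\s,()]+", text)
    return " ".join(SYNONYMS.get(token, token) for token in tokens)


for description in descriptions:
    print(part_numbers(description), "|", standardize(description))
```

After standardization the first three descriptions share the same part number and vocabulary, which is what makes a later matching step feasible. The misspelled RUDER in the fourth line is left alone because nothing in the assumed tables covers it.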


Recommended Best Practices: Data Cleansing

Data Re-Engineering

Original

Blk 1, 1 St, 05-00
05-00 Frist St, Block 1
1 First Str, #05-00
Block 1, First Str, #05-00
1, St, #05-00

Standardize

Building | Street | Unit
Blk 1 | First St | 05-00
Blk 1 | First St | 05-00
1 | First St | #05-00
Blk 1 | First St | #05-00
1 | St | #05-00

Match

Survive

Final Result

#05-00, Blk 1, First St
#05-00, 1, St
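The match and survive steps can be sketched in Python, starting from the standardized Building | Street | Unit records. The match key and the "keep the longest value" survivorship rule are assumptions chosen for this illustration because they reproduce the two final records on the slide; the source does not say which rules the actual tool applies.

```python
from collections import defaultdict

# Standardized records from the slide: (building, street, unit).
records = [
    ("Blk 1", "First St", "05-00"),
    ("Blk 1", "First St", "05-00"),
    ("1", "First St", "#05-00"),
    ("Blk 1", "First St", "#05-00"),
    ("1", "St", "#05-00"),
]


def match_key(record):
    """Assumed match rule: compare block number, street and bare unit."""
    building, street, unit = record
    block_number = building.replace("Blk", "").strip()
    return (block_number, street.lower(), unit.lstrip("#"))


def survive(group):
    """Assumed survivorship rule: keep the longest value seen per field."""
    return tuple(max(values, key=len) for values in zip(*group))


def match_and_survive(records):
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [survive(group) for group in groups.values()]


def format_record(record):
    building, street, unit = record
    return f"{unit}, {building}, {street}"


results = [format_record(r) for r in match_and_survive(records)]
for line in results:
    print(line)
```

The fifth record survives on its own because its street is only "St", so it never matches the "First St" group.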


Strategy #3 – Share Common Meta Data

From Data Model: Customer (CustomerNumber, Name, Address, Comments)

From ETL Tool: CustomerTbl (CustomerID, Name, Address, Address1, Comments)

From BI Tool: CustomerDetails (CustomerNumber, Name, Address, Remarks)

From Database: CustomerID, Name, Address1, Address2, Descr

Customer identifier descriptions found across these sources:

The Identifier of customers that are tracked for ordering purposes. Corporate customer identifiers are assigned by the Sales Data Controller according to the corporate data description and naming policy for reference identifiers.

Unique identifier of customers that are tracked for ordering purposes. Values start with 02 for non-Corporate customers and 01 for Corporate customers.

<NULL>

Customer’s identifier numbers. Values start with 01 for Corporate customers, 02 for non-Corporate customers, 03 for overseas-based Customers.

Which meta data is right?

Which one is current?

Which one should be used?
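One way to see the size of the problem is to compare the column lists mechanically. The Python sketch below uses the column names as read from the slide; the source labels and helper functions are hypothetical and only illustrate the kind of cross-tool comparison a shared meta data repository makes possible.

```python
# Column lists as read from the slide, keyed by where they were found.
definitions = {
    "data model": ["CustomerNumber", "Name", "Address", "Comments"],
    "ETL tool": ["CustomerID", "Name", "Address", "Address1", "Comments"],
    "BI tool": ["CustomerNumber", "Name", "Address", "Remarks"],
    "database": ["CustomerID", "Name", "Address1", "Address2", "Descr"],
}


def column_usage(definitions):
    """Map each column name to the sources that define it."""
    usage = {}
    for source, columns in definitions.items():
        for column in columns:
            usage.setdefault(column, []).append(source)
    return usage


def inconsistent_columns(definitions):
    """Columns that are missing from at least one source."""
    usage = column_usage(definitions)
    return {
        column: sources
        for column, sources in usage.items()
        if len(sources) < len(definitions)
    }


usage = column_usage(definitions)
conflicts = inconsistent_columns(definitions)
for column, sources in conflicts.items():
    print(f"{column}: only in {', '.join(sources)}")
```

Only Name appears in all four places. Every other column is named differently or missing somewhere, and a name comparison like this cannot tell whether Comments, Remarks and Descr mean the same thing. That still needs the business descriptions.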


Recommended Best Practices: Integrated Meta Data

Integrated Meta Data Repository

Modeling tool BI tool

BI Repository

COBOL definition files

Other sources’ definition files

ETL Tool + Processes

Integrate by gathering in from diverse applications and sources


Recommended Best Practices: Integrated Meta Data

Integrated Meta Data Repository

Modeling tool BI tool

BI Repository

COBOL definition files

Other sources’ definition files

ETL Tool + Processes

Web Browser

Integrate by publishing out to diverse applications and targets


Strategy #4 – Connect to Any System, Anywhere

DB2, Informix, ODBC, Oracle, Red Brick, SAS, Sybase, Teradata, etc.

WebSphere MQ, SeeBeyond, JMS, XML, EJB, Web Services, EXML, XMLS, EDI, SWIFT, etc.

Oracle Applications, PeopleSoft, SAP R/3, SAP BW, Siebel

Adabas, Allbase/SQL, Datacom/DB, DB2/400, DB2/OS390, Essbase, FOCUS, IDMS/SQL, IMS, NonStopSQL, RDB, VSAM, etc.


Recommended Best Practices: Native Connectivity Software

Advice:

Go for pre-built connectors with little/no coding

Do you want to worry about which application or database you will need to connect to next?


Strategy #5 – Abandon Hand-coding

Hand-coded Visual Basic, Java, C++ and UNIX programs can be developed cheaply, and they work …

… but what happens when there is a new source or requirement?

Cheap? Works? Maybe not.


Recommended Best Practices: Graphical ETL Tools

Benefits:

1. Jobs are easy to develop, understand, debug and maintain

2. Robust, fully-tested, best practices approach to data migration or extraction


Recommended Best Practices: Graphical ETL Tools

Benefits:

1. Complex transformations can be made very simple with mere point-and-click


Strategy #6 – Implement a Highly Scalable Foundation

Source: “Surviving the Perfect Storm in Data Management” DM Review, January 2001

Prediction: Your data volume is not going to get smaller


Strategy #6 – Implement a Highly Scalable Foundation

2 considerations in handling growth:

You want these, not these:

[Chart: Processing Time (Hours) against Number of Processors (1, 8, 16, 24, 32, ...)]

or

[Chart: Processing Throughput (Hundreds of Gigabytes) against Number of Processors (1, 8, 16, 24, 32, ...), scaling 1X, 8X, 16X, 24X, 32X, ...]


Strategy #6 – Implement a Highly Scalable Foundation

Three Elements of a Scalable Infrastructure

Scalable Database Platform

Database vendors have offered a scalable parallel relational database for more than 5 years.

Scalable Hardware Platform

Hardware vendors have offered scalable parallel computers for more than 5 years.

Scalable Data Integration Platform

Data integration vendors are starting to offer “scalable” “parallel” platforms


Recommended Best Practices: Parallelism

Make sure you get this, not this

[Diagram: SMP systems, each with four CPUs, shared memory and shared disk, shown in several configurations]


Recommended Best Practices: Parallelism

Application Execution: Sequential or Parallel

Sequential | 4-Way Parallel | 64-Way Parallel
Uniprocessor | SMP System | MPP, GRID, and Clustered Systems

Source Data -> TRANSFORM -> ENRICH -> LOAD -> Data Warehouse

One application assembly

Auto parallel-enabled and parallel-aware run-time execution

[Chart: Time to Process for Scan, Join and Sort, Serial compared with Parallel]
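The idea of one application assembly that runs sequentially or in parallel without being rewritten can be sketched as follows. This is a toy Python illustration, not the engine described on the slide: the `transform`, `enrich` and `run_job` names are hypothetical, and threads are used only to keep the example self-contained, whereas a real parallel engine would partition work across processes or nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(record):
    return record.strip().upper()


def enrich(record):
    return {"name": record, "length": len(record)}


def run_partition(partition):
    """The single job definition: transform, then enrich, each record."""
    return [enrich(transform(record)) for record in partition]


def run_job(records, degree):
    """Run the same logic sequentially (degree=1) or degree-way parallel."""
    # Round-robin partitioning: record i goes to partition i % degree.
    partitions = [records[i::degree] for i in range(degree)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        results = list(pool.map(run_partition, partitions))
    return [row for partition in results for row in partition]


source = [" ibm ", "i.b.m. inc.", "int. bus. machines", "ibm corp"]
sequential = run_job(source, degree=1)
four_way = run_job(source, degree=4)
print(sequential == four_way)
```

The job logic in `run_partition` never mentions the degree of parallelism. Only the `degree` argument changes between runs, which is the property the slide asks for.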


Strategy #7 – Ensure Interoperability of Integration Infrastructures

The Goal

Connected, integrated, seamlessly

The Reality

Cobbled, piece-meal, manual-intensive


Recommended Best Practices: Integrated Tool Suites

E.g. WebSphere Data Integration Suite

ANY SOURCE, ANY TARGET

DISCOVER (Understand): Discover data content and structure, and quality monitoring. ProfileStage

PREPARE (Reconcile): Standardize, match, and correct data. QualityStage

TRANSFORM (Deliver): Transform, enrich, and deliver data. DataStage, DataStage TX

Parallel Execution

Meta Data Management

Service Oriented Architecture

On-Demand and Event Driven Services

[Diagram labels, two lists of connected systems]

CRM, ERP, SCM, Business Intelligence, RDBMS, EAI/Messaging, Web services, XML/EDI, Data Warehouse

CRM, ERP, SCM, RDBMS, Legacy, EAI/Messaging, Web services, XML/EDI, Data Warehouse


Strategy #8 – Architect for “Right-Time”

In an InformationWeek 2003 survey of 467 business professionals about how often their IT systems provide business managers with timely updates of primary products or services:

3% no such process
1% annually
17% monthly
13% weekly
36% daily
5% hourly
8% every minute

In that same report: “Whereas 57% of sites surveyed a year ago said that real-time business information was a key company focus, 70% see it that way today.”


Recommended Best Practices: Right-Time

Business Event Occurs | Recognition Latency | Response Latency

Latency is defined as the elapsed time between when an event occurs and when an appropriate response or action is made

Event | Response
campaign initiated | tuning
customer churns | win-back
website click | offer made
fraud committed | prevention

Acceptable Latency

Event Occurs, Awareness, Appropriate Response
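The latency definition above reduces to simple timestamp arithmetic. The short Python sketch below splits total latency into its recognition and response parts and compares it with an acceptable limit; the timestamps and the one-minute threshold are hypothetical values chosen for a "fraud committed" style event.

```python
from datetime import datetime, timedelta


def latencies(occurred, recognized, responded):
    """Return (recognition latency, response latency, total latency)."""
    recognition = recognized - occurred
    response = responded - recognized
    return recognition, response, recognition + response


# Hypothetical timestamps and acceptable latency, invented for illustration.
occurred = datetime(2005, 1, 10, 9, 0, 0)
recognized = datetime(2005, 1, 10, 9, 0, 40)
responded = datetime(2005, 1, 10, 9, 2, 0)
acceptable = timedelta(minutes=1)

recognition, response, total = latencies(occurred, recognized, responded)
within_limit = total <= acceptable
print(f"recognition={recognition}, response={response}, total={total}")
print("within acceptable latency:", within_limit)
```

Separating the two components shows where to invest: here recognition takes 40 seconds but the response takes 80, so faster detection alone would not bring the total under the limit.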


Recommended Best Practices: Right-Time

Latency between Business Event Occurs and Recognition

1. Improving the ability to recognize business events

Latency between Recognition and Response

2. Improving the ability to respond to those events


Strategy #9 – Extend Quality and Transformation Capabilities throughout the Enterprise

1. Hand-coded rules in each project/tool are not reusable by other projects/tools

2. High costs associated with building & maintaining data access, data quality and transformation rules in each project

[Diagram labels: Portals; EAI, BPM, EII; Web applications; Dashboards; Legacy Apps; Packaged Apps; Business Partner Data; Data Warehouses; Master Data Stores]


Recommended Best Practices: Data Integration Services

[Diagram labels: Java, Application Servers; Message Queues, EAI; Web Services; get customer; Service-Oriented Architecture; Business Partner Data; Legacy Apps; Packaged Apps; Data Warehouses; Master Data Stores]

SOA Approach

1. The Service-Oriented Architecture (SOA) approach packages the data integration logic of SOA-friendly applications as services

2. Services can be invoked as Web Services, EJB or JMS by any third-party application
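The packaging idea can be illustrated with a deliberately small Python sketch: one cleansing rule is published once as a "get customer" operation and reused by two callers. Everything here is hypothetical, including the `DataIntegrationService` class, the sample record and the cleansing rule. A real deployment would expose the operation over Web Services, EJB or JMS rather than an in-process dictionary.

```python
class DataIntegrationService:
    """Hypothetical in-process stand-in for a published service endpoint."""

    def __init__(self):
        self._operations = {}

    def publish(self, name, func):
        self._operations[name] = func

    def invoke(self, name, **params):
        return self._operations[name](**params)


# Hypothetical raw source data, invented for illustration.
CUSTOMERS = {"01-0017": {"name": " ibm ", "city": "salem"}}


def get_customer(customer_id):
    """Shared access and cleansing logic, written once."""
    raw = CUSTOMERS[customer_id]
    return {
        "id": customer_id,
        "name": raw["name"].strip().upper(),
        "city": raw["city"].title(),
    }


service = DataIntegrationService()
service.publish("get_customer", get_customer)

# Two different consumers, for example a portal and a dashboard, reuse the rule.
portal_view = service.invoke("get_customer", customer_id="01-0017")
dashboard_view = service.invoke("get_customer", customer_id="01-0017")
print(portal_view == dashboard_view)
```

Because both consumers go through the same published operation, a change to the cleansing rule is made in one place instead of being re-coded in each project.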


Strategy #10 – Choose a Proven Deployment Methodology designed for Quick Success

Many methodologies are available

How many and which are workable – who knows?

Be aware that there is as much risk in the deployment methodology as there is in tool usage


Recommended Best Practices: Iterative Deployment Plan

[Diagram labels]

Establish Business Drivers, Deploy Solution, Evaluate Results, Derive Business Value

Start to End: 12 - 24 Weeks

investigate, design, develop, deploy, operate

plan, prototype, unit test, system test, UAT, production audit, regression test, maintenance, etc.

iteration, monitor, manage


Summary

1. A number of large enterprises have successfully integrated their enterprise systems resulting in business results that drove revenue and lowered costs

2. These enterprises accomplished this through a set of technologies collectively known as Enterprise Data Integration

3. There are 10 proven strategies for success in an enterprise data integration initiative; although no single path is THE panacea to all corporate data problems - multiple approaches must be employed


Thank You

For more information, visit us