IBM Software Group
© 2005 IBM Corporation
The Essential Guide to Accessing, Consolidating and Trusting Your Data
IBM Software Group | WebSphere software
Quiz Time
What do all these companies have in common?
US$10 billion Retailer migrating and consolidating financial data into Oracle Financials
Reduced projected 2,700-day manual effort to 217 days
Saved US$2 million
US$45 billion Manufacturer consolidating more than 3,300 legacy software applications down to 400 while reducing IT staff by 50%
US$4.5 billion global Chemicals Company consolidating 13 SAP instances into 1 global instance
Would save US$37 million in annual operating costs
Quiz Time: Answers
Raw, disparate data and disconnected systems
Enterprise Data Integration
Business Results that drive revenue and lower costs
Happened despite pouring hundreds of millions of dollars into new ERP, CRM, SCM, BI, BPM and DW systems
Other Companies
Challenges in Data Management
Touching data multiple times at its source: storing multiple times and updating multiple times
Inability to share common business rules across projects, processes and applications
Inconsistent islands of information underlying applications
Complex, manual & costly copy synchronization
Inconsistent and poor quality data
Inability to exploit enterprise meta data across tools
Lack of a single, repeatable methodology for consistency across all projects
[Figure: application silos labeled CRM, Order Proc, Supply Chain, Procurement]
Remedy: 10 Proven Strategies
No single path is THE panacea to all corporate data problems - multiple approaches must be employed
Consider where your organization’s most SIGNIFICANT data pain exists – take that approach first
Strategy #1 – Understand Source Systems
Business Analysis
Data Analysis
1. Discover actual characteristics of data
2. Verify if characteristics of data conform to established / known business rules
3. Report on the assessment and variances / exceptions
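The three data-analysis steps on this slide can be illustrated with a minimal Python sketch. The sample rows, the column names and the rules (a non-empty name, a two-letter upper-case state, a five-digit ZIP code) are assumptions made up for illustration. They are not taken from the deck and do not represent how any particular profiling product works.

```python
import re
from collections import Counter

# Hypothetical sample rows; the column names are assumptions for illustration.
ROWS = [
    {"name": "IBM", "state": "NH", "zip": "01456"},
    {"name": "I.B.M. Inc.", "state": "NH", "zip": "01456"},
    {"name": "", "state": "nh", "zip": "1456"},
]

# Assumed business rules: one predicate per column.
RULES = {
    "name": lambda v: bool(v.strip()),
    "state": lambda v: re.fullmatch(r"[A-Z]{2}", v) is not None,
    "zip": lambda v: re.fullmatch(r"\d{5}", v) is not None,
}


def profile(rows):
    """Step 1: discover the actual characteristics of each column."""
    stats = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        stats[col] = {
            "distinct": len(set(values)),
            "empty": sum(1 for v in values if not v.strip()),
            "lengths": Counter(len(v) for v in values),
        }
    return stats


def verify(rows, rules):
    """Step 2: check each value against the known business rules."""
    return [
        (i, col, row[col])
        for i, row in enumerate(rows)
        for col, ok in rules.items()
        if not ok(row[col])
    ]


def report(rows, rules):
    """Step 3: report the assessment together with the exceptions."""
    return {"profile": profile(rows), "exceptions": verify(rows, rules)}


if __name__ == "__main__":
    for row_index, column, value in report(ROWS, RULES)["exceptions"]:
        print(f"row {row_index}: {column}={value!r} violates its rule")
```

Keeping the rules in a plain dictionary means the same checks can be rerun unchanged against a sample first and the full data later.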
Strategy #1 – Understand Source Systems
Why is this strategy #1 on a list of 10?
K. Strange/T. Friedman – Gartner Group Research (2/28/2002)
“Complete understanding of data and awareness of data quality issues in operational systems is a critical success factor in any data integration or conversion effort.”
The Standish Group – Migrate Headaches (Feb 1999)
83% of data integration projects overrun or fail - a poor understanding of the data is a significant reason
50% of the time spent in data migration is spent trying to understand the source data
Recommended Best Practices: Automated Data Profiling
[Figure: automated profiling workflow with no coding. Column Analysis, Table Analysis and Cross-Table Analysis (Analyze, Review, Accept/Reject) on Sample Data or Full Data, followed by Create Data Model, Normalize & Generate Source/Target Mappings, and Generate ETL Job]
Advice: You won’t have the time, $ or energy to profile 100% quickly, so go automated
Strategy #2 – Build In Data Quality
Same company / person?
Same address?
Same parts?
Same instructions?
NAME | ADDRESS
IBM | 187 N. Pk. Str. Salem NH 01456
I.B.M. Inc. | 187 N. Pk. St. Sarem NH 01456
International Bus. M. | 187 No. Park St Salem NH 04156
Int. Bus. Machines | 187 Park Ave Salem NH 01456
Inter-Nation Consult. | 15 Main St. Andover MA 02341
Int. Bus. Consultants | PO Box 9 Boston MA 02210
I.B. Manufacturing | Park Blvd. Boston MA 04106
PART DESCRIPTION
WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT ¼ INCH
WING ASSEMBLY, USE 5J868-A HEX BOLT .25” – DRILL FOUR HOLES
USE 4 5J868A BOLTS (HEX .25) – DRILL HOLES FOR EA ON WING ASSEM
RUDER, TAP 6 HOLES, SECURE W/KL 2301 RIVETS (10 CM)
Spelling Errors
Lack of Standards in Synonyms, Acronyms, Abbreviations
Error Codes?
Assembly
Part Size
Instruction
Recommended Best Practices: Data Cleansing
Data Re-Engineering
Original:
Blk 1, 1 St, 05-00
05-00 Frist St, Block 1
1 First Str, #05-00
Block 1, First Str, #05-00
1, St, #05-00
Standardize:
Building | Street | Unit
Blk 1 | First St | 05-00
Blk 1 | First St | 05-00
1 | First St | #05-00
Blk 1 | First St | #05-00
1 | St | #05-00
Match, Survive
Final Result:
#05-00, Blk 1, First St
#05-00, 1, St
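The standardize, match and survive stages on this slide can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the records are already split into building, street and unit fields, the `SYNONYMS` lookup table is invented for the example, and matching is exact equality on the standardized form. Commercial cleansing tools typically rely on much richer parsing and probabilistic matching.

```python
from collections import defaultdict

# Assumed lookup of spelling variants to one standard form (illustrative only).
SYNONYMS = {
    "block": "Blk", "blk": "Blk",
    "str": "St", "st": "St",
    "frist": "First", "first": "First",
}


def standardize(record):
    """Normalize each field: drop '#' and '.', map known variants to one spelling."""
    def clean(field):
        words = field.replace("#", "").replace(".", "").split()
        return " ".join(SYNONYMS.get(w.lower(), w) for w in words)
    return tuple(clean(f) for f in record)


def match(records):
    """Group the original records whose standardized forms are identical."""
    groups = defaultdict(list)
    for rec in records:
        groups[standardize(rec)].append(rec)
    return groups


def survive(groups):
    """Keep one surviving (standardized) record per matched group."""
    return sorted(groups)


if __name__ == "__main__":
    # Hypothetical pre-parsed inputs: (building, street, unit).
    raw = [
        ("Blk 1", "First St", "05-00"),
        ("Block 1", "Frist St", "05-00"),
        ("Block 1", "First Str", "#05-00"),
        ("1", "St", "#05-00"),
    ]
    for survivor in survive(match(raw)):
        print(survivor)
```

Keeping the original records inside each group preserves an audit trail of which inputs were merged into each survivor.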
Strategy #3 – Share Common Meta Data
From Data Model: Customer (CustomerNumber, Name, Address, Comments)
From ETL Tool: CustomerTbl (CustomerID, Name, Address, Address1, Comments)
From BI Tool: CustomerDetails (CustomerNumber, Name, Address, Remarks)
From Database: CustomerID, Name, Address1, Address2, Descr
Descriptions of the customer identifier recorded across these sources:
1. The Identifier of customers that are tracked for ordering purposes. Corporate customer identifiers are assigned by the Sales Data Controller according to the corporate data description and naming policy for reference identifiers.
2. Unique identifier of customers that are tracked for ordering purposes. Values start with 02 for non-Corporate customers and 01 for Corporate customers.
3. <NULL>
4. Customer’s identifier numbers. Values start with 01 for Corporate customers, 02 for non-Corporate customers, 03 for overseas-based Customers.
Which meta data is right?
Which one is current?
Which one should be used?
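One way to make these questions concrete is to compare the column lists mechanically. The sketch below uses the four column lists as they read on this slide. The dictionary keys and function names are hypothetical, and a real meta data repository would compare far more than column names.

```python
# Column lists as they read on the slide, keyed by an assumed source label.
SOURCES = {
    "data_model": ["CustomerNumber", "Name", "Address", "Comments"],
    "etl_tool": ["CustomerID", "Name", "Address", "Address1", "Comments"],
    "bi_tool": ["CustomerNumber", "Name", "Address", "Remarks"],
    "database": ["CustomerID", "Name", "Address1", "Address2", "Descr"],
}


def shared_columns(sources):
    """Columns that every source agrees on."""
    return set.intersection(*(set(cols) for cols in sources.values()))


def disputed_columns(sources):
    """Map each column that is not shared by all sources to the sources defining it."""
    common = shared_columns(sources)
    disputed = {}
    for source, cols in sources.items():
        for col in cols:
            if col not in common:
                disputed.setdefault(col, []).append(source)
    return disputed


if __name__ == "__main__":
    print("agreed:", sorted(shared_columns(SOURCES)))
    for column, owners in sorted(disputed_columns(SOURCES).items()):
        print(f"{column}: only in {', '.join(owners)}")
```

Even this small comparison shows that the four tools agree on only one column name, which is the inconsistency the slide is pointing at.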
Recommended Best Practices: Integrated Meta Data
[Figure: Integrated Meta Data Repository fed by a modeling tool, BI tool, BI repository, COBOL definition files, other sources’ definition files, and ETL tool + processes]
Integrate by gathering in from diverse applications and sources
Recommended Best Practices: Integrated Meta Data
[Figure: Integrated Meta Data Repository feeding a modeling tool, BI tool, BI repository, COBOL definition files, other sources’ definition files, ETL tool + processes, and a Web browser]
Integrate by publishing out to diverse applications and targets
Strategy #4 – Connect to Any System, Anywhere
DB2, Informix, ODBC, Oracle, Red Brick, SAS, Sybase, Teradata, etc.
WebSphere MQ, SeeBeyond, JMS, XML, EJB, Web Services, EXML, XMLS, EDI, SWIFT, etc.
Oracle Applications, PeopleSoft, SAP R/3, SAP BW, Siebel
Adabas, Allbase/SQL, Datacom/DB, DB2/400, DB2/OS390, Essbase, FOCUS, IDMS/SQL, IMS, NonStopSQL, RDB, VSAM, etc.
Recommended Best Practices: Native Connectivity Software
Advice:
Go for pre-built connectors with little/no coding
Do you want to worry about which application or database you will have to connect to next?
Strategy #5 – Abandon Hand-coding
Hand-written Visual Basic, Java, C++ and UNIX code can be developed cheaply and it works …
… but what happens when there is a new source or requirement?
Cheap? Works? Maybe not.
Recommended Best Practices: Graphical ETL Tools
Benefits:
1. Jobs are easy to develop, understand, debug and maintain
2. Robust, fully-tested, best practices approach to data migration or extraction
Recommended Best Practices: Graphical ETL Tools
Benefits:
1. Complex transformations can be made very simple with mere point-and-click
Strategy #6 – Implement a Highly Scalable Foundation
Source: “Surviving the Perfect Storm in Data Management” DM Review, January 2001
Prediction: Your data volume is not going to get smaller
Strategy #6 – Implement a Highly Scalable Foundation
Two considerations in handling growth:
You want these, not these:
[Chart: Processing Time (Hours) against Number of Processors (1, 8, 16, 24, 32, ...)]
or
[Chart: Processing Throughput (Hundreds of Gigabytes) against Number of Processors (1, 8, 16, 24, 32, ...), scaling 1X, 8X, 16X, 24X, 32X, ...]
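The scaling charts on this slide describe linear scalability: with n processors, throughput grows to n times the single-processor figure, which is the same as processing time falling to 1/n. A minimal Python sketch of the standard speedup and efficiency calculations follows. The function names and the timings in the usage section are hypothetical, not measurements from the deck.

```python
def speedup(t1_hours, tn_hours):
    """Speedup on n processors: single-processor time divided by n-processor time."""
    return t1_hours / tn_hours


def efficiency(t1_hours, tn_hours, n):
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    return speedup(t1_hours, tn_hours) / n


def ideal_time(t1_hours, n):
    """Processing time on n processors if scaling were perfectly linear."""
    return t1_hours / n


if __name__ == "__main__":
    # Hypothetical job: 32 hours on 1 processor, 5 hours measured on 8 processors.
    t1, t8, n = 32.0, 5.0, 8
    print("ideal time on 8 processors:", ideal_time(t1, n), "hours")
    print("measured speedup:", speedup(t1, t8))
    print("scaling efficiency:", efficiency(t1, t8, n))
```

An efficiency that drops steadily as processors are added is a sign that some part of the pipeline is not scaling.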
Strategy #6 – Implement a Highly Scalable Foundation
Three Elements of a Scalable Infrastructure
Scalable Database Platform
Database vendors have offered a scalable parallel relational database for more than 5 years.
Scalable Hardware Platform
Hardware vendors have offered scalable parallel computers for more than 5 years.
Scalable Data Integration Platform
Data integration vendors are starting to offer “scalable” “parallel” platforms
Recommended Best Practices: Parallelism
Make sure you get this, not this:
[Figure: diagrams of SMP systems, each showing CPUs with shared memory and shared disk]
Recommended Best Practices: Parallelism
Application Execution: Sequential or Parallel
Sequential: Uniprocessor
4-Way Parallel: SMP System
64-Way Parallel: MPP, GRID, and Clustered Systems
[Figure: Source Data flows through TRANSFORM, ENRICH and LOAD into the Data Warehouse; time to process for scan, join and sort, serial versus parallel]
One application assembly
Auto parallel-enabled and parallel-aware run-time execution
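The idea of one application assembly that runs either sequentially or in parallel can be sketched as follows. This is an illustration under stated assumptions, not the mechanism of any product: the `transform` and `enrich` steps are invented, the data is partitioned round-robin, and a thread pool stands in for the parallel run-time. In CPython, threads do not give true CPU parallelism, so a real engine would use processes or multiple machines.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(record):
    """Hypothetical transform step: normalize the name field."""
    return {**record, "name": record["name"].strip().upper()}


def enrich(record):
    """Hypothetical enrich step: add a derived field."""
    return {**record, "name_length": len(record["name"])}


def pipeline(records):
    """One 'application assembly': the same logic at any degree of parallelism."""
    return [enrich(transform(r)) for r in records]


def partition(records, ways):
    """Split the records round-robin into `ways` partitions."""
    return [records[i::ways] for i in range(ways)]


def run(records, ways=1):
    """Run the pipeline sequentially (ways=1) or across `ways` partitions."""
    if ways == 1:
        return pipeline(records)
    with ThreadPoolExecutor(max_workers=ways) as pool:
        parts = list(pool.map(pipeline, partition(records, ways)))
    return [rec for part in parts for rec in part]


if __name__ == "__main__":
    data = [{"id": i, "name": f"  cust{i} "} for i in range(10)]
    sequential = run(data, ways=1)
    parallel = run(data, ways=4)
    by_id = lambda r: r["id"]
    print(sorted(parallel, key=by_id) == sorted(sequential, key=by_id))
```

The degree of parallelism is a run-time parameter only; `pipeline` itself never changes, which is the property the slide highlights.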
Strategy #7 – Ensure Interoperability of Integration Infrastructures
The Goal
Connected, integrated, seamless
The Reality
Cobbled, piecemeal, manual-intensive
Recommended Best Practices: Integrated Tool Suites
E.g. WebSphere Data Integration Suite
ANY SOURCE, ANY TARGET: CRM, ERP, SCM, Business Intelligence, RDBMS, Legacy, EAI/Messaging, Web services, XML/EDI, Data Warehouse
DISCOVER (Understand): Discover data content and structure, and quality monitoring. ProfileStage
PREPARE (Reconcile): Standardize, match, and correct data. QualityStage
TRANSFORM (Deliver): Transform, enrich, and deliver data. DataStage, DataStage TX
Parallel Execution
Meta Data Management
Service Oriented Architecture
On-Demand and Event Driven Services
Strategy #8 – Architect for “Right-Time”
In an InformationWeek 2003 survey of 467 business professionals about how often their IT systems provide business managers with timely updates of primary products or services:
3% no such process
1% annually
17% monthly
13% weekly
36% daily
5% hourly
8% every minute
In that same report: “Whereas 57% of sites surveyed a year ago said that real-time business information was a key company focus, 70% see it that way today.”
Recommended Best Practices: Right-Time
[Figure: timeline from Business Event Occurs, through Recognition Latency to Recognition, then through Response Latency to Response]
Latency is defined as the elapsed time between when an event occurs and when an appropriate response or action is made
Event occurs | Appropriate response
campaign initiated | tuning
customer churns | win-back
website click | offer made
fraud committed | prevention
[Figure: Event Occurs, Awareness and Appropriate Response, all within the Acceptable Latency]
Recommended Best Practices: Right-Time
[Figure: Business Event Occurs, Recognition Latency, Recognition, Response Latency, Response]
1. Improving the ability to recognize business events (Recognition Latency)
2. Improving the ability to respond to those events (Response Latency)
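The two latencies named on these slides can be computed from three timestamps. The sketch below assumes one reading of the diagram: recognition latency runs from the event to its recognition, and response latency runs from recognition to the response. The class name, the field names and the "acceptable latency" threshold are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BusinessEvent:
    """Timestamps for one business event (assumed fields, for illustration)."""
    occurred_at: datetime
    recognized_at: datetime
    responded_at: datetime

    @property
    def recognition_latency(self) -> timedelta:
        """Elapsed time from the event occurring to its recognition."""
        return self.recognized_at - self.occurred_at

    @property
    def response_latency(self) -> timedelta:
        """Elapsed time from recognition to the appropriate response."""
        return self.responded_at - self.recognized_at

    @property
    def total_latency(self) -> timedelta:
        """Elapsed time from the event occurring to the response."""
        return self.responded_at - self.occurred_at

    def is_right_time(self, acceptable: timedelta) -> bool:
        """True when the response arrived within the acceptable latency."""
        return self.total_latency <= acceptable


if __name__ == "__main__":
    # Hypothetical example: a website click answered with an offer.
    click = BusinessEvent(
        occurred_at=datetime(2005, 1, 1, 9, 0),
        recognized_at=datetime(2005, 1, 1, 9, 10),
        responded_at=datetime(2005, 1, 1, 9, 25),
    )
    print(click.recognition_latency, click.response_latency)
    print(click.is_right_time(timedelta(minutes=30)))
```

Tracking the two latencies separately shows which of the two improvements, recognition or response, would shorten the total more.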
Strategy #9 – Extend Quality and Transformation Capabilities throughout the Enterprise
1. Hand-coded rules in each project/tool are not reusable in other projects/tools
2. High costs associated with building & maintaining data access, data quality and transformation rules in each project
Portals
EAI, BPM, EII
Web applications
Dashboards
Legacy Apps
Packaged Apps
Business Partner Data
Data Warehouses
Master Data Stores
Recommended Best Practices: Data Integration Services
[Figure: Java and application servers, message queues and EAI, and Web Services call services such as "get customer" through a Service-Oriented Architecture layer over Business Partner Data, Legacy Apps, Packaged Apps, Data Warehouses and Master Data Stores]
SOA Approach
1. A Service-Oriented Architecture (SOA) approach packages data integration logic as services for SOA-friendly applications
2. Services can be invoked as Web Services, EJB or JMS by any third-party application
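The benefit of packaging integration logic as a service is that every consumer reuses the same rules instead of re-coding them. The sketch below shows this in plain Python with a hypothetical `CustomerService` whose `get_customer` method echoes the "get customer" label in the slide's diagram. The in-memory store, the standardization rule and the two consumers are assumptions. Exposing the method over Web Services, EJB or JMS is out of scope here.

```python
class CustomerService:
    """Hypothetical shared service holding one copy of the data quality rules."""

    def __init__(self, store):
        # Assumed in-memory stand-in for a master data store.
        self._store = store

    @staticmethod
    def _standardize(name):
        """Single shared rule: collapse whitespace and title-case the name."""
        return " ".join(name.split()).title()

    def get_customer(self, customer_id):
        """Return a cleansed customer record for any calling application."""
        raw = self._store[customer_id]
        return {"id": customer_id, "name": self._standardize(raw["name"])}


def portal_view(service, customer_id):
    """Hypothetical consumer 1: a portal greeting."""
    return f"Welcome, {service.get_customer(customer_id)['name']}"


def dashboard_row(service, customer_id):
    """Hypothetical consumer 2: a dashboard table row."""
    customer = service.get_customer(customer_id)
    return (customer["id"], customer["name"])


if __name__ == "__main__":
    service = CustomerService({"01-001": {"name": "  acme   corp "}})
    print(portal_view(service, "01-001"))
    print(dashboard_row(service, "01-001"))
```

Because both consumers call the same method, a change to the standardization rule is made once and takes effect everywhere.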
Strategy #10 – Choose a Proven Deployment Methodology designed for Quick Success
Many are available
How many and which are workable – who knows?
Be aware that there is as much risk in the deployment methodology as there is in tool usage
Recommended Best Practices: Iterative Deployment Plan
[Figure: iterative cycle of Establish Business Drivers, Deploy Solution, Evaluate Results and Derive Business Value; 12 - 24 weeks from Start to End]
Phases: investigate, design, develop, deploy, operate
Activities: plan, prototype, unit test, system test, UAT, production, audit, regression test, maintenance, etc.
Ongoing: iteration, monitor, manage
Summary
1. A number of large enterprises have successfully integrated their enterprise systems, achieving business results that drove revenue and lowered costs
2. These enterprises accomplished this through a set of technologies collectively known as Enterprise Data Integration
3. There are 10 proven strategies for success in an enterprise data integration initiative, although no single path is THE panacea to all corporate data problems - multiple approaches must be employed
Thank You
For more information, visit us