Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers
Unlocking value from data with data integration tools
![Page 1: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/1.jpg)
Unlocking value from data with Data Integration Tools
Phil Watt, Principal Integration Architect, HP Business Intelligence Solutions, EMEA
29/04/2010
![Page 2: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/2.jpg)
Outline
• Introduction
• Business drivers – why use a DI tool?
  • The challenge
  • Private sector
  • Public sector
• Background and history
  • DI tools timeline
• Emerging features – and value
• Governance and Best Practice
• Selecting a tool for your situation
• Demonstration
• Summary – followed by hands-on session
![Page 3: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/3.jpg)
About me
• 19 years big data
• 10 years Data Integration tools
  • High volume
  • Complex business rules
  • Governance and metadata management
• Clients include: BSkyB, BT, Barclays/Barclaycard, Centrica, Experian, John Lewis Partnership, Microsoft, a major UK political party
• Strong focus on pragmatic delivery
  • Best practices
  • Design patterns
  • Tool evaluation, selection and implementation
![Page 4: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/4.jpg)
Scope
In scope
• Data plumbing
  • moving data around, and making it more useful to certain stakeholders
• Tools that help to
  • get data out of databases
  • get data into databases
  • transform data following some business rules
Out of scope
• Database technologies
  • OLTP vs OLAP
  • Column versus row based storage
  • NoSQL movement (Hadoop, Cassandra, etc.)
• Information security
![Page 5: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/5.jpg)
Glossary
• Data Integration
• Data Governance
• Master Data Management (MDM)
• Data Dictionary
• Data Lineage
• Data Discovery/Data Profiling
![Page 6: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/6.jpg)
The challenge
Data growth
• 60% annual global data growth through to 2012 (IDC research)
• New sources of machine-generated data will see this increase rapidly, e.g. telemetry: new energy smart meters mean a x4000 growth in readings
Business drivers
• Increased complexity of business requirements; diverse sources and complex data
• Consistent application of business terms across the enterprise
• Time To Market (TTM) is a critical success factor
• Reduce costs/improve productivity
• Reduce power consumption
Collaboration
• Onshore versus offshore delivery teams
Variable data quality
• Data is often captured for one specific reason, then used or repurposed for different reasons
Cannot learn anything from data alone*
• The model must inform the analysis
• If the data does not support the model, then adjust the model
![Page 7: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/7.jpg)
Data warehouse example sizes
[Bar chart: data warehouse sizes in petabytes (axis 0 to 12) for Yahoo*, eBay, Facebook, Wal-mart, LHC and National ID Cards*]
![Page 8: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/8.jpg)
Public and academic examples
• Birmingham City Council
  • http://www.experian.co.uk/www/pages/about_us/our_clients/
  • http://www.qas.co.uk/company/press/new-experian-software-helps-public-sector-to-enhance-single-citizen-view-projects-503.htm
• University of Toulouse – academic medical research
  • http://www.talend.com/open-source-provider/casestudy/CaseStudy_Academic_Medical_Research_EN.php
![Page 9: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/9.jpg)
Benefits of DI tools
Productivity improves dramatically
• Vendors often claim an order of magnitude improvement
  • that is, for coding activities alone
• A 50% improvement is realistic when considering other non-coding activities
Improve understanding of the overall business using built-in metadata management tools
• build data dictionaries more easily
• support and drive data governance
Built-in scalability
• Parallel processing: component, pipeline and data
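The data and pipeline forms of parallelism named above can be illustrated with a minimal sketch. This is not how any particular DI tool implements them; the `transform` rule, the row layout and the function names are all hypothetical, and Python threads are used only to show the structure (CPU-bound work in Python would normally use processes instead).

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def transform(row):
    """Hypothetical business rule: tidy the name and convert pence to pounds."""
    return {"name": row["name"].strip().upper(), "amount": row["amount_pence"] / 100}


def data_parallel(rows, partitions=4):
    """Data parallelism: split the rows into partitions and apply the same
    transform to every partition concurrently."""
    size = max(1, -(-len(rows) // partitions))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        # pool.map returns results in chunk order, so row order is preserved.
        results = pool.map(lambda chunk: [transform(r) for r in chunk], chunks)
        return [row for chunk in results for row in chunk]


def pipeline_parallel(rows):
    """Pipeline parallelism: the transform stage starts consuming rows while
    the extract stage is still producing them."""
    handoff = queue.Queue(maxsize=2)  # small buffer between the two stages
    done = object()                   # sentinel marking the end of the stream
    out = []

    def extract_stage():
        for row in rows:
            handoff.put(row)
        handoff.put(done)

    def transform_stage():
        while True:
            item = handoff.get()
            if item is done:
                break
            out.append(transform(item))

    stages = [threading.Thread(target=extract_stage),
              threading.Thread(target=transform_stage)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return out
```

Component parallelism, the third form on the slide, would run independent components (for example two unrelated extracts) at the same time; it is omitted here to keep the sketch short.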
![Page 10: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/10.jpg)
Extract, Transform and Load
[Diagram: Extract → Transform → Load. Source, e.g. CRM or ERP system; hub and spoke; shared DW and ETL server]
![Page 11: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/11.jpg)
Extract, Load and Transform
[Diagram: Extract → Load → Transform. Source, e.g. CRM or ERP system; shared DW and ETL server]
![Page 12: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/12.jpg)
ETL versus ELT
ETL
• Transformations often faster
• No reliance on database performance limitations
• Typically scales better
ELT
• Avoids unloading large datasets for transformations and aggregations
• Best used with high performance analytical database systems such as Netezza, Neoview, Oracle Exadata, Teradata, Greenplum, etc.
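The difference between the two approaches can be sketched in a few lines. This is an illustration under stated assumptions, not a vendor's method: an in-memory SQLite database stands in for the target warehouse, and `SOURCE_ORDERS`, the table names and both functions are hypothetical.

```python
import sqlite3

# Hypothetical source rows, standing in for an extract from a CRM or ERP system.
SOURCE_ORDERS = [("c1", "10.00"), ("c1", "5.50"), ("c2", "7.25")]


def etl(conn):
    """ETL: aggregate in the integration layer, then load only the result."""
    totals = {}
    for customer, amount in SOURCE_ORDERS:
        totals[customer] = totals.get(customer, 0.0) + float(amount)
    conn.execute("CREATE TABLE etl_totals (customer TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO etl_totals VALUES (?, ?)", sorted(totals.items()))
    return conn.execute(
        "SELECT customer, total FROM etl_totals ORDER BY customer"
    ).fetchall()


def elt(conn):
    """ELT: load the raw rows first, then let the database do the transformation."""
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", SOURCE_ORDERS)
    conn.execute(
        "CREATE TABLE elt_totals AS "
        "SELECT customer, SUM(CAST(amount AS REAL)) AS total "
        "FROM raw_orders GROUP BY customer"
    )
    return conn.execute(
        "SELECT customer, total FROM elt_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(etl(conn))
    print(elt(conn))
```

Both routes produce the same totals. What changes is where the work happens: in ETL the aggregation runs outside the database, while in ELT the raw data never leaves it, which is why ELT suits the high performance analytical platforms listed above.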
![Page 13: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/13.jpg)
Multiple sources and targets
![Page 14: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/14.jpg)
DI Tools Features Timeline 1995 – 2005
[Timeline chart, axis 1994 to 2006. Features shown: Parallelism, SCD, EAI/Message Queues, Connectors, Data Lineage, Config Mgmt, Business Metadata, CWM, Data Governance, MDM]
![Page 15: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/15.jpg)
DI Tools Features Timeline from 2006
[Timeline chart, axis 2006 to 2011. Features shown: SOAP/WSDL, CDC, Screen Scrapers, Test management, CEP, Push Down Processing, Semantic Metadata, Rich Dashboards, Analyst Tools, Self Service DI]
![Page 16: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/16.jpg)
Market features
Industry consolidation
• Niche players acquired by established vendors
• Watch out for product bloat
Price pressures / pricing complexity
• Open Source versus pure commercial
• Credit crunch
• Established vendors often have complex pricing models
Focus on optimising workflow
• Increase productivity / reduce time to market
• Moving to self service for ‘purple people’
UK market very different to US
• Cool tech not enough for UK: must have strong business case
![Page 17: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/17.jpg)
Gartner Magic Quadrant
Taken from research document, ‘Magic Quadrant for Data Integration Tools’
Authors: Ted Friedman, Mark A. Beyer, Eric Thoo
Full report available by registering at www.talend.com
Image removed for web publication as agreed with Gartner
![Page 18: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/18.jpg)
Magic Quadrant Disclaimer
The Magic Quadrant is copyrighted November 25, 2009 by Gartner, Inc. and is reused with permission. The Magic Quadrant is a graphical representation of a marketplace at and for a specific time period. It depicts Gartner's analysis of how certain vendors measure against criteria for that marketplace, as defined by Gartner. Gartner does not endorse any vendor, product or service depicted in the Magic Quadrant, and does not advise technology users to select only those vendors placed in the "Leaders" quadrant. The Magic Quadrant is intended solely as a research tool, and is not meant to be a specific guide to action. Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
![Page 19: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/19.jpg)
Best practices
![Page 20: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/20.jpg)
Worst Practices
![Page 21: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/21.jpg)
Gartner advice
• Allocate minimum 20% to data source analysis
• Allocate 20 - 30% to mapping and transformation rules
• Avoid custom-coding or desktop tools
• Increase business user involvement to improve success
Source: ‘Best Practices Mitigate Data Migration Risks and Challenges’, May 2009
![Page 22: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/22.jpg)
Governance and the data integration lifecycle
![Page 23: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/23.jpg)
Best practices
Do:
• Spend 50% of project time doing discovery, analysis, design
• Get business users involved early and often
• Use tools to accelerate and compress timescales
• Pay attention to governance and metadata
So you can:
• De-risk the project
• Reduce overall cost and timescales
• Achieve best possible quality
![Page 24: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/24.jpg)
Selecting a tool for your situation
2 stage process
• Paper based shortlist
• On site Proof Of Concept (POC)
Understand the vendor roadmap
• Match to your requirements
• Try to anticipate your needs over the next 3-5 years
Do it yourself or outsource?
• Is there an SI ecosystem for the vendor's product?
Get help to choose and upskill
• Find a partner that fits your culture and has the right skills
![Page 25: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/25.jpg)
Qualification matrix (PW)
![Page 26: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/26.jpg)
Demonstration
![Page 27: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/27.jpg)
![Page 28: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/28.jpg)
![Page 29: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/29.jpg)
![Page 30: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/30.jpg)
![Page 31: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/31.jpg)
![Page 32: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/32.jpg)
![Page 33: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/33.jpg)
Demo metrics
Performance
• Hardware: dual core 2.0 GHz Intel Centrino, 2.5 GB RAM
• Environment: WinXP, Oracle Express (DB) + DI tool (Expressor 2.0)
• 3 data sources
  • Customers: 155 MB, 1000K records
  • Today’s orders: 112 MB, 100K records
  • Yesterday's orders: 0.3 MB, 3K records
• Total data volume: 267 MB, 1.1M records
• Execution time: 72 seconds
• Throughput: 3.7 MB/sec, 41k/sec
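As a quick check, the throughput figures can be recomputed from the per-source sizes and the execution time quoted on this slide. The sketch below uses only those numbers; the variable names are illustrative. Under these inputs the MB/sec figure matches the slide, while the records-per-second figure comes out lower than the quoted 41k/sec, which may have been measured on a different basis that the transcript does not state.

```python
# Per-source figures as listed on the slide: (size in MB, record count).
sources = {
    "Customers": (155.0, 1_000_000),
    "Today's orders": (112.0, 100_000),
    "Yesterday's orders": (0.3, 3_000),
}
execution_seconds = 72

total_mb = sum(mb for mb, _ in sources.values())
total_records = sum(count for _, count in sources.values())

mb_per_sec = total_mb / execution_seconds
records_per_sec = total_records / execution_seconds

print(f"{total_mb:.1f} MB, {total_records:,} records")
print(f"{mb_per_sec:.1f} MB/sec, {records_per_sec:,.0f} records/sec")
# prints:
# 267.3 MB, 1,103,000 records
# 3.7 MB/sec, 15,319 records/sec
```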
![Page 34: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/34.jpg)
Demo features
Developer Productivity
• Graphical development
• Semantic Rationalisation and Re-usable Business Rules
Demo represents a generic business scenario
• XML, message queues (MSMQ), database inputs/outputs, joins, aggregations and referential integrity management
• Similar features to the ATG/Integrated Basket challenges?
![Page 35: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/35.jpg)
Summary
• Business drivers – why use a DI tool?
  • The challenge
  • Private sector
  • Public sector
• Background and history
  • DI tools timeline
• Emerging features – and value
• Governance and Best Practice
• Selecting a tool for your situation
• Demonstration
![Page 36: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/36.jpg)
Questions
![Page 37: Unlocking value from data with data integration tools](https://reader035.fdocuments.in/reader035/viewer/2022062419/55757b02d8b42adb7e8b4bc0/html5/thumbnails/37.jpg)
References
• Curt Monash: http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/
• Wired: http://www.wired.com/wired/archive/12.04/grid.html
• ZDNet: http://blogs.zdnet.com/storage/?p=213
• Professor Chris Bishop: http://conferences.theiet.org/lectures/turing/
• Gartner: http://www.gartner.com
• LHC data (2007): http://www-conf.slac.stanford.edu/xldb07/xldb_lhc.pdf