7/27/2019 Kalido_The Next Gen of DI.pdf
White Paper
The Next Generation of Data Integration for
Data Warehousing
Kalido White Paper: The Next Generation of Data Integration for Data Warehousing
Introduction
Data integration has long been the most fundamental challenge in delivering a data warehouse or, for that
matter, any data integration initiative. The industry has seen three generations of ETL technology, from code
generators to proprietary engines to Extract-Load-Transform architectures, each offering progressively more
reuse, performance and manageability. In spite of this progress, however, data integration has remained the
most time-consuming and labor-intensive deliverable in any data warehouse project. This has led to significant
delays in putting data warehouses into production, often as long as 12 to 18 months, and has hamstrung IT
departments from being responsive to the changes in requirements and in the business environment that
necessitate changing the data warehouse.
The ETL Vendor Response
In response to this reality, there is a new wave of messaging from leading ETL vendors proposing data
virtualization as the answer to delivering agility in data warehousing. The proposed approach is to use data
virtualization to validate what customers are about to build with their ETL code. With this approach, customers
may see a relative acceleration in the time it takes to first get visibility of the data, but this does not really
constitute sustained business value. Without appropriate processes in place, the data samples produced from
virtualization are no more accurate than system exports brought into a maze of uncontrolled spreadsheets.
If one applies this same virtualization solution to more complex, cross-system integration initiatives, such
as a data warehouse, then one might argue that the validation and integrity rules may be implemented in the
virtualized solution. This may be possible in some cases, but as soon as we go down this path, we're back
to the traditional set of time-consuming ETL job development activities.
FIGURE 1: BY FAR THE MOST SIGNIFICANT EFFORT IN BUILDING A TRADITIONALLY-DEVELOPED
DATA WAREHOUSE IS SPENT ON DATA INTEGRATION
Automation: The Missing Piece
Companies that have implemented data warehouses are well aware that there
is a series of repeatable patterns in the data integration tasks one
must undertake. Why can't these patterns simply be automated and these
time-consuming, labor-intensive tasks vastly reduced? The answer is simple.
Traditional ETL tools have left the database model up to the modeling experts
(and tools) to design and develop. This creates a dependency between two
specialist disciplines in the IT space: the data modelers and the ETL developers.
This dependency forces serialization and hand-offs between the two teams,
which results in more documentation, not more automation.
Another by-product of the traditional approach is a lack of semantic
knowledge of the data and how it relates to other data, which forces ETL
developers to repeat the same manual tasks over and over again.
For example, if your model specifies that your sales invoices reference product,
time, customer account and sales representative entities, the system could
automatically infer that lookups against these tables are required. But
without this semantic knowledge, the ETL tool has no way to automate these
tasks. At Kalido, we believe there is a better way, and our 300+ customer
deployments have proven it.
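The sales invoice example can be made concrete with a minimal Python sketch. It is not Kalido's API or implementation: the `Entity` class, the `infer_lookups` function and all key column names are hypothetical, chosen only to show how a model that records which entities reference which others lets a tool derive the required lookups instead of having a developer specify each one.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entity:
    """A hypothetical business-model entity with its key and its references."""
    name: str
    key: str
    references: Tuple[str, ...] = ()


def infer_lookups(model: Dict[str, Entity], entity_name: str) -> List[Tuple[str, str]]:
    """Return (referenced entity, key column) pairs that a load must resolve."""
    entity = model[entity_name]
    return [(ref, model[ref].key) for ref in entity.references]


# Illustrative model mirroring the example in the text; key names are assumed.
MODEL = {
    e.name: e
    for e in [
        Entity("product", "product_id"),
        Entity("time", "date_id"),
        Entity("customer_account", "account_id"),
        Entity("sales_representative", "rep_id"),
        Entity(
            "sales_invoice",
            "invoice_id",
            ("product", "time", "customer_account", "sales_representative"),
        ),
    ]
}

lookups = infer_lookups(MODEL, "sales_invoice")
for target, key in lookups:
    print(f"lookup required: {target}.{key}")
```

The point of the sketch is that the list of lookups is computed from the model rather than written by hand, so adding a new reference to the model changes the load behavior without any new job development.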
Kalido: Reducing Data Integration for Data Warehousing
The Kalido Information Engine is driven by a business information model that clearly captures the
information the business requires to meet its analytical needs. This same business model is used to
drive all the most common tasks required to build and sustain an operational data integration job. The data
warehouse developers map the inbound data to the business model. The Kalido Information Engine automates
the detailed activities required to stage, validate and integrate this data.
FIGURE 2: A SIMPLE EXAMPLE OF A KALIDO BUSINESS INFORMATION MODEL
Let's explore an example data integration job from a popular ETL vendor and compare it to a load
definition in Kalido that achieves the same objective.
If we examine the data warehouse ETL job in Figure 3, we see a series of common tasks. On the
extreme left- and right-hand sides of this job, the input source and the output target (in this case, a
physical table) of the job are defined. Each of the links and stages in between these two icons
represents an area where ETL developers are required to specify detailed transformations,
lookups, table mappings and error conditions, as well as the staging of rejected records and the
aggregation of error messages. While there can be no doubt that this approach enables very complex and
powerful data integration jobs to be defined, the process is time-consuming and error-prone, which results
not only in significant development effort, but also in an equally large investment in testing and in
debugging the issues that inevitably arise in this process.
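To illustrate the kind of logic that each such job spells out by hand, here is a minimal Python sketch of a lookup check that stages rejected records and aggregates error messages. It is a generic illustration, not DataStage or Kalido code; the function name `validate_rows`, the column names and the sample data are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, str]


def validate_rows(
    rows: Iterable[Row], lookups: Dict[str, Set[str]]
) -> Tuple[List[Row], List[Row], Counter]:
    """Split rows into accepted and rejected, counting each kind of error."""
    accepted: List[Row] = []
    rejected: List[Row] = []
    errors: Counter = Counter()
    for row in rows:
        # A column fails when its value is absent from the reference set.
        missing = [col for col, valid in lookups.items() if row.get(col) not in valid]
        if missing:
            rejected.append(row)  # stage the rejected record
            for col in missing:
                errors[f"unknown {col}"] += 1  # aggregate the error message
        else:
            accepted.append(row)
    return accepted, rejected, errors


# Hypothetical reference data and inbound rows.
lookups = {"product_id": {"P1", "P2"}, "account_id": {"A1"}}
rows = [
    {"invoice_id": "I1", "product_id": "P1", "account_id": "A1"},
    {"invoice_id": "I2", "product_id": "P9", "account_id": "A1"},
    {"invoice_id": "I3", "product_id": "P2", "account_id": "A7"},
]
accepted, rejected, errors = validate_rows(rows, lookups)
```

In a hand-built job, an equivalent of this routine is configured separately for each target table, which is where much of the development and testing effort described above goes.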
FIGURE 3: AN IBM INFOSPHERE DATASTAGE ETL JOB, TYPICAL OF AN ETL ROUTINE USED IN DATA WAREHOUSING
(SOURCE: IBM INFOSPHERE DATASTAGE PERFORMANCE AND SCALABILITY BENCHMARK WHITEPAPER, FEBRUARY 2010)
By contrast, in Kalido we too require that the source file or table is selected. Similarly, we map this incoming data
definition to the target. But in Kalido, the target is not a physical table but one or more of the target entities
defined in the Business Model in Figure 2. Kalido both creates the physical tables to house this data (inside the
chosen database platform) and performs the physical integration of the data into these tables. All the tasks
specified in the ETL mapping job in Figure 3 are automated by the Kalido Information Engine, since it knows all
the relationships and rules that have been expressed in the Business Model.
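The idea of mapping to logical entities and letting the engine create and load the physical tables can be sketched in a few lines of Python using the standard `sqlite3` module. This is an assumption-laden illustration, not how the Kalido Information Engine works internally: the `MODEL` and `MAPPING` dictionaries, the source field names and the helper functions are hypothetical.

```python
import sqlite3
from typing import Dict, Iterable

# Hypothetical logical model: entity name -> {column name: SQL type}.
MODEL = {
    "product": {"product_id": "TEXT PRIMARY KEY", "name": "TEXT"},
    "sales_invoice": {
        "invoice_id": "TEXT PRIMARY KEY",
        "product_id": "TEXT REFERENCES product(product_id)",
        "amount": "REAL",
    },
}


def create_table_sql(entity: str, columns: Dict[str, str]) -> str:
    """Derive the physical table definition from the logical entity."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE {entity} ({cols})"


def load(conn: sqlite3.Connection, entity: str,
         mapping: Dict[str, str], source_rows: Iterable[dict]) -> None:
    """Insert source rows into the entity's table using a field mapping."""
    target_cols = list(mapping.values())
    placeholders = ", ".join("?" for _ in target_cols)
    sql = f"INSERT INTO {entity} ({', '.join(target_cols)}) VALUES ({placeholders})"
    conn.executemany(sql, [[row[src] for src in mapping] for row in source_rows])


conn = sqlite3.connect(":memory:")
for entity, columns in MODEL.items():
    conn.execute(create_table_sql(entity, columns))  # tables come from the model

conn.execute("INSERT INTO product VALUES ('P1', 'Widget')")

# The developer supplies only the source-field-to-entity mapping (assumed names).
MAPPING = {"InvNo": "invoice_id", "Prod": "product_id", "Amt": "amount"}
load(conn, "sales_invoice", MAPPING, [{"InvNo": "I1", "Prod": "P1", "Amt": 9.5}])
loaded = conn.execute(
    "SELECT invoice_id, product_id, amount FROM sales_invoice"
).fetchall()
```

Here the only artifact written per load is the mapping; the table definitions and the insert statements are generated, which is the division of labor the paragraph above describes.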
Another important difference with Kalido is that because the target of the data integration activity is a logical
entity and not a physical table, any changes to the underlying physical table structures are managed through
Kalido. When any of the tables involved in this job are changed, Kalido can automatically reflect the impact of
those changes. In fact, in many cases, Kalido will automatically stage and recast the data into the new physical
tables when this occurs.
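A much simplified version of this behavior can be sketched with Python's standard `sqlite3` module: compare the columns the model expects with those the physical table has, and add whatever is missing. This covers only additive changes and is not a description of Kalido's mechanism; the `sync_columns` function and the `category` attribute are hypothetical.

```python
import sqlite3
from typing import Dict, List


def sync_columns(conn: sqlite3.Connection, table: str,
                 model_columns: Dict[str, str]) -> List[str]:
    """Add columns present in the model but missing from the physical table."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in model_columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO product VALUES ('P1', 'Widget')")

# The model gains a new attribute; the physical table is brought in line.
added = sync_columns(
    conn, "product", {"product_id": "TEXT", "name": "TEXT", "category": "TEXT"}
)
row = conn.execute("SELECT product_id, name, category FROM product").fetchone()
```

Existing rows keep their data and receive NULL for the new column. Changes that restructure tables would need the data to be staged and recast, which this sketch does not attempt.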
Does Kalido Displace ETL in the Enterprise?
While you may find ETL tools useful for other integration projects, they are not necessarily required for a
data warehouse project. ETL technology has earned its place at an infrastructure level in many enterprises
today. Operational data integration and data sharing infrastructure has applicability in ERP integration and in
establishing an operational data bus. The Kalido Information Engine is dedicated only to the integration of data
to deliver a data warehouse or data mart.
Further, Kalido's focus is not on automating every possible transformation or function required to deliver a data
warehouse, but rather on automating all the most common functions required in every data integration job.
Kalido provides rich facilities to support automated transformations directly in the database, thus leveraging the
power of the DBMS to enhance scalability, performance and resilience.
FIGURE 4: KALIDO AUTOMATES ALL THE TASKS IN THE ETL MAPPING JOB BECAUSE IT KNOWS ALL THE RULES AND
RELATIONSHIPS THAT ARE EXPRESSED IN THE KALIDO BUSINESS MODEL
Conclusion
ETL tools have their place in many integration projects; however, their disadvantages
outweigh their ability to help customers deliver agile data warehouses that can
enable business value in a timely way. Too many data warehouse-related integration
tasks are not repeatable with ETL tools because of the way they were designed to
operate. Kalido overcomes these issues by automating the data integration tasks
which otherwise would be manually constructed and maintained in ETL tools.
As a result, Kalido customers realize faster time to value, easier maintenance, and
greater agility not only in responding to initial warehouse demands, but also in
keeping those warehouses up to date as the business and its analytical requirements
evolve.
About the Kalido Information Engine
The Kalido Information Engine helps customers deploy a foundation for
analytics much faster than traditional hand-coding or ETL-based methods. Its
ability to adapt quickly to change delivers more accurate, consistent and reliable
information faster to your business, which can lead to better analytical performance
and better decision making sooner, affording significant top-line growth and bottom-line
savings opportunities for your organization.
About Kalido
Kalido is the leading provider of business-driven information management software.
Kalido enables companies to manage data as a shared enterprise asset by supporting
the business process of data management. Kalido software has been deployed at
more than 300 locations in over 100 countries, including 20 percent of the world's most
profitable companies as determined by Fortune Magazine. More information about
Kalido can be found at: http://www.kalido.com.
Contact Information
US Tel: +1 781 202 3200
Eur Tel: +44 (0)845 224 1236
Email: info@kalido.com
or visit our website at www.kalido.com
Copyright 2011 Kalido. All rights reserved. Kalido, the Kalido logo and Kalido's product names are trademarks of Kalido.
References to other companies and their products use trademarks owned by the respective companies and are for reference purposes only. WP-NGDI10114