Kalido_The Next Gen of DI.pdf


  • 7/27/2019 Kalido_The Next Gen of DI.pdf


    White Paper

    The Next Generation of Data Integration for Data Warehousing


    Kalido White Paper: The Next Generation of Data Integration for Data Warehousing


    Introduction

    Data integration has long been the most fundamental challenge in delivering a data warehouse or, for that matter, any data integration initiative. The industry has seen three generations of ETL technology, from code generators to proprietary engines to Extract-Load-Transform architectures, each offering progressively more reuse, performance and manageability. In spite of this progress, data integration has remained the most time-consuming and labor-intensive deliverable in any data warehouse project. This has led to significant delays in putting data warehouses into production, often as long as 12 to 18 months, and has hamstrung IT departments from responding to changes in requirements and the business environment that necessitate changing the data warehouse.

    The ETL Vendor Response

    In response to this reality, leading ETL vendors have begun proposing data virtualization as the answer to delivering agility in data warehousing. The proposed approach is to use data virtualization to validate what customers are about to build with their ETL code. With this approach, customers may see a relative acceleration in the time it takes to first get visibility of the data, but this does not really constitute sustained business value. Without appropriate processes in place, the data samples produced from virtualization are no more accurate than system exports brought into a maze of uncontrolled spreadsheets.

    If one applies this same virtualization solution to more complex, cross-system integration initiatives, such as a data warehouse, then one might argue that the validation and integrity rules may be implemented in the virtualized solution. This may be possible in some cases, but as soon as we go down this path, we're back into the traditional set of time-consuming ETL job development activities.

    FIGURE 1: BY FAR THE MOST SIGNIFICANT EFFORT IN BUILDING A TRADITIONALLY-DEVELOPED DATA WAREHOUSE IS SPENT ON DATA INTEGRATION


    Automation: The Missing Piece

    Companies that have implemented data warehouses are well aware that the data integration tasks one must undertake follow a series of repeatable patterns. Why can't these patterns simply be automated and these time-consuming, labor-intensive tasks vastly reduced? The answer is simple. Traditional ETL tools have left the database model up to the modeling experts (and tools) to design and develop. This creates a dependency between two specialist disciplines in the IT space: the data modelers and the ETL developers. This dependency forces serialization and hand-offs between the two teams, which results in more documentation, not more automation.

    Another by-product of the traditional approach is a lack of semantic knowledge of the data and how it relates to other data, which forces ETL developers to repeat the same manual tasks over and over again. For example, if your model specifies that your sales invoices reference product, time, customer account and sales representative entities, the system could automatically infer that lookups to these tables are required. But without this semantic knowledge, the ETL tool has no way to automate these tasks. At Kalido, we believe there is a better way, and our 300+ customer deployments have proven it.
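The inference described above can be illustrated with a minimal Python sketch. It is not Kalido's implementation, which this paper does not describe. The `Entity` class, the `infer_lookups` function and all column names are hypothetical, chosen only to mirror the sales invoice example. The point is that once the references are declared in a model, the list of required lookups can be derived rather than hand-coded.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A business entity in a hypothetical information model."""
    name: str
    key: str  # business key column of this entity
    # Maps a source column on this entity to the name of the entity it references.
    references: dict = field(default_factory=dict)


def infer_lookups(model, entity_name):
    """Return (source column, target entity, target key) for every declared reference."""
    entity = model[entity_name]
    return [
        (column, target, model[target].key)
        for column, target in entity.references.items()
    ]


# Illustrative model; every name below is an assumption, not taken from Kalido.
MODEL = {
    "product": Entity("product", "product_code"),
    "time": Entity("time", "calendar_date"),
    "customer_account": Entity("customer_account", "account_number"),
    "sales_representative": Entity("sales_representative", "employee_id"),
    "sales_invoice": Entity(
        "sales_invoice",
        "invoice_number",
        references={
            "product_code": "product",
            "invoice_date": "time",
            "account_number": "customer_account",
            "rep_id": "sales_representative",
        },
    ),
}

for column, target, key in infer_lookups(MODEL, "sales_invoice"):
    print(f"lookup {column} -> {target}.{key}")
```

Adding a fifth reference to the invoice entity would add a fifth lookup with no change to the loading logic, which is the kind of repeatable pattern the text argues should be automated.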


    Kalido: Reducing Data Integration for Data Warehousing

    The Kalido Information Engine is driven from a business information model which clearly captures the model of the information the business requires to meet its analytical needs. This same business model is used to drive all the most common tasks required to build and sustain an operational data integration job. The data warehouse developers map the inbound data to the business model. The Kalido Information Engine automates the detailed activities required to stage, validate and integrate this data.
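As a rough illustration of the stage, validate and integrate steps, the sketch below drives a load from a source-to-model mapping. It is a simplified assumption of how such a pipeline might look, not the Kalido Information Engine itself. The `load` function, the column names and the reference key sets are all hypothetical.

```python
def load(rows, mapping, reference_keys):
    """Map source rows to model attributes and split them into accepted and rejected.

    rows           -- list of dicts as read from a source file or table
    mapping        -- source column name -> model attribute name
    reference_keys -- model attribute name -> set of valid keys for the referenced entity
    """
    accepted, rejected = [], []
    for row in rows:
        # Stage: rename source columns to the model's attribute names.
        staged = {attr: row.get(col) for col, attr in mapping.items()}
        # Validate: every reference must resolve to a known key.
        errors = [
            f"unknown {attr}: {staged[attr]!r}"
            for attr, valid in reference_keys.items()
            if staged.get(attr) not in valid
        ]
        if errors:
            rejected.append((staged, errors))  # kept with its error messages
        else:
            accepted.append(staged)  # Integrate: ready to be written to the warehouse
    return accepted, rejected


# Hypothetical mapping and reference data for illustration only.
mapping = {"PROD": "product", "CUST": "customer_account", "AMT": "amount"}
reference_keys = {"product": {"P1", "P2"}, "customer_account": {"C9"}}
rows = [
    {"PROD": "P1", "CUST": "C9", "AMT": 100.0},
    {"PROD": "P7", "CUST": "C9", "AMT": 55.0},
]
accepted, rejected = load(rows, mapping, reference_keys)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

In this sketch the developer supplies only the mapping. The validation rules and the handling of rejected rows follow from the reference data rather than being specified per job.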

    FIGURE 2: A SIMPLE EXAMPLE OF A KALIDO BUSINESS INFORMATION MODEL


    Let's explore an example data integration job from a popular ETL vendor and compare it to a load definition in Kalido that achieves the same objective.

    If we examine the data warehouse ETL job in Figure 3, we see a series of common tasks. On the extreme left and right-hand sides of this job, the input source and the output target (in this case, a physical table) of the job are defined. Each of the links and stages in between these two icons represents an area where ETL developers are required to specify detailed transformations, lookups, table mappings and error conditions, stage the rejected records and aggregate the error messages. While there can be no doubt that this approach enables very complex and powerful data integration jobs to be defined, the process is time-consuming and error-prone, which results not only in significant development effort but also in an equally large investment in testing and debugging the issues which inevitably arise in this process.

    FIGURE 3: AN IBM INFOSPHERE DATASTAGE ETL JOB, TYPICAL OF AN ETL ROUTINE USED IN DATA WAREHOUSING

    (SOURCE: IBM INFOSPHERE DATASTAGE PERFORMANCE AND SCALABILITY BENCHMARK WHITEPAPER, FEBRUARY 2010)



    By contrast, in Kalido we too require that the source file or table is selected. Similarly, we map this incoming data definition to the target. But in Kalido, the target is not a physical table but one or more target entities we defined in the Business Model in Figure 2. Kalido both creates the physical tables to house this data (inside the chosen database platform) and does the physical integration of the data into these tables. All the tasks specified in the ETL mapping job above are automated by the Kalido Information Engine, since we know all the relationships and rules that have been expressed in the Business Model.
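To make the idea of deriving physical tables from logical entities concrete, here is a small sketch that generates `CREATE TABLE` statements from a toy model and runs them against an in-memory SQLite database. The model layout, the `create_table_sql` helper and the column names are assumptions for illustration only. The paper does not describe how Kalido generates its physical schema.

```python
import sqlite3

# Hypothetical logical model:
# entity -> (key column, {attribute: SQL type}, [referenced entities]).
MODEL = {
    "product": ("product_code", {"description": "TEXT"}, []),
    "customer_account": ("account_number", {"account_name": "TEXT"}, []),
    "sales_invoice": (
        "invoice_number",
        {"amount": "REAL"},
        ["product", "customer_account"],
    ),
}


def create_table_sql(name):
    """Build a CREATE TABLE statement for one logical entity."""
    key, attributes, references = MODEL[name]
    columns = [f"{key} TEXT PRIMARY KEY"]
    columns += [f"{attr} {sql_type}" for attr, sql_type in attributes.items()]
    for target in references:
        # Each reference becomes a foreign key column named after the target's key.
        target_key = MODEL[target][0]
        columns.append(f"{target_key} TEXT REFERENCES {target}({target_key})")
    return f"CREATE TABLE {name} ({', '.join(columns)})"


connection = sqlite3.connect(":memory:")
for entity in MODEL:  # referenced entities are listed before the entities that use them
    connection.execute(create_table_sql(entity))

tables = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

Because the statements are generated from the model, a change to an entity's attributes or references changes the physical schema without anyone editing DDL by hand.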

    Another important difference with Kalido is that because the target of the data integration activity is a logical entity and not a physical table, any changes to the underlying physical table structures are managed through Kalido. When any of the tables involved in this job are changed, Kalido can automatically reflect the impact of those changes. In fact, in many cases, Kalido will automatically stage and recast the data into the new physical tables when this occurs.

    Does Kalido Displace ETL in the Enterprise?

    While you may find ETL tools useful for other integration projects, they are not necessarily required for a data warehouse project. ETL technology has earned its place at an infrastructure level in many enterprises today. Operational data integration and data sharing infrastructure has applicability in ERP integration and in establishing an operational data bus. The Kalido Information Engine is dedicated only to the integration of data to deliver a data warehouse or data mart.

    Further, Kalido's focus is not on automating every possible transformation or function required to deliver a data warehouse, but rather on automating the most common functions required in every data integration job. Kalido provides rich facilities to support the automated transformations directly in the database, thus leveraging the power of the DBMS to enhance scalability, performance and resilience.

    FIGURE 4: KALIDO AUTOMATES ALL THE TASKS IN THE ETL MAPPING JOB BECAUSE IT KNOWS ALL THE RULES AND RELATIONSHIPS THAT ARE EXPRESSED IN THE KALIDO BUSINESS MODEL


    Conclusion

    ETL tools have their place in many integration projects; however, their disadvantages outweigh their ability to help customers deliver agile data warehouses that can enable business value in a timely way. Too many data warehouse-related integration tasks are not repeatable with ETL tools because of the way they were designed to operate. Kalido overcomes these issues by automating the data integration tasks which would otherwise be manually constructed and maintained in ETL tools. As a result, Kalido customers realize faster time to value, easier maintenance, and greater agility not only in responding to initial warehouse demands, but also in keeping those warehouses up to date as the business and its analytical requirements evolve.

    About the Kalido Information Engine

    The Kalido Information Engine helps customers deploy a foundation for analytics much faster than traditional hand-coding or ETL-based methods. Its ability to instantly adapt and change delivers more accurate, consistent and reliable information faster to your business, which can lead to better analytical performance and better decision making sooner, affording significant top-line growth and bottom-line savings opportunities for your organization.

    About Kalido

    Kalido is the leading provider of business-driven information management software. Kalido enables companies to manage data as a shared enterprise asset by supporting the business process of data management. Kalido software has been deployed at more than 300 locations in over 100 countries, including 20 percent of the world's most profitable companies as determined by Fortune Magazine. More information about Kalido can be found at http://www.kalido.com.

    Contact Information

    US Tel: +1 781 202 3200
    Eur Tel: +44 (0)845 224 1236
    Email: info@kalido.com

    or visit our website at www.kalido.com


    Copyright 2011 Kalido. All rights reserved. Kalido, the Kalido logo and Kalido's product names are trademarks of Kalido. References to other companies and their products use trademarks owned by the respective companies and are for reference purposes only. WP-NGDI10114
