
Lower your risk with application data migration: next steps with Informatica

A White Paper by Bloor Research. Author: Philip Howard. Publish date: April 2013


"If we add in Data Validation and Proactive Monitoring then Informatica has a breadth of portfolio for data migration that is, as far as we know, unmatched in the industry." (Philip Howard)


Introduction

In 2007 Bloor Research conducted its first survey into the data migration market. We subsequently surveyed the market again in 2011. As we shall discuss, it is clear that lessons were learned over that time frame. However, it remains the case that roughly one third of migration projects run over time and/or budget. Moreover, these risks are not insubstantial and they are not just directly related to the costs of the project itself. Figure 1 illustrates the business impact of those costs where projects were delayed, based on the responses to our 2011 survey.

Figure 1: Costs to the business of overrunning projects

As far as figures go, the average cost of a project was $875,000 and, if it formed part of a larger development project, which it often did (69%), then the average size of the overall budget was $2.8m. However, it should be noted that, for large organisations, it is not uncommon for projects to run into tens or hundreds of millions of dollars. Where projects overran, the average overrun cost was $268,000 (that's more than 30%) for the migration portion of the project alone, excluding indirect costs such as reduced employee productivity, reduced revenue recognition and increased customer dissatisfaction or attrition. Note that these averages were taken over a variety of data migration projects: we would expect large SAP or Oracle migrations to be significantly bigger than this, frequently exceeding $10m in terms of overall budget, suggesting a migration budget in excess of $3m.

We are not alone in estimating these sorts of figures. In September 2011 the Harvard Business Review published an article called "Why Your IT Project May Be Riskier Than You Think". The authors analysed 1,471 projects, comparing their budgets and estimated performance benefits with the actual costs and results. The authors state that, "when we broke down the projects' cost overruns, what we found surprised us. The average overrun was 27% - but that figure masks a far more alarming one. Graphing the projects' budget overruns reveals a "fat tail" - a large number of gigantic overages. Fully one in six of the projects we studied had a cost overrun of 200%, on average, and a schedule overrun of almost 70%. This highlights the true pitfall of IT change initiatives: it's not that they're particularly prone to high cost overruns on average; it's that an unusually large proportion of them incur massive overages."

While these figures may seem appalling they are considerably better than they were in 2007 when only 16% of data migration projects were brought in on time and on budget. In other words, projects of this sort are actually much less risky than they used to be. We like to think that


this improvement is due, at least in part, to the adoption of some of our recommendations in that original report. Nevertheless, we believe that further improvements in migration projects can be made, and we made a number of further suggestions in our 2011 report. However, technology moves on and we now believe that there are additional considerations to be borne in mind.

In this paper we will review what Bloor Research has learned over the past six years, since we first published our original Data Migration Survey in 2007. While it was clear in our 2011 survey that matters had improved since 2007, it is now to be hoped that the best practices advocated in our most recent report have been similarly adopted, and so we will consider what steps might be undertaken next to improve the results of data migration exercises. In particular, new tools have emerged in the last couple of years that improve migration prospects, which we will discuss.


What we have learned

It appears clear that the adoption of data profiling and data quality tools was fundamental to the increased success rates for data migration. There were two big differences between our 2007 and 2011 results. The first was that projects brought in on time and on budget rose from 16% of projects to 62%, and the second was the rate of adoption of tools (and tried and tested methodologies) as opposed to hand coded efforts at migration. These latter results are compared in Table 1 (note that we did not ask about the use of data integration in 2007).

Table 1: Adoption of tools and methodologies

                                            Used in 2007   Used in 2011
Data profiling tool                              10%            72%
Data cleansing tool                              11%            75%
Formal methodology                               72%            94%
In-house methodology                             76%            41%
Projects brought in on time and on budget        16%            62%

As we stated in our 2011 report, "we recommended the use of data profiling tools in our 2007 survey report and we suggested that these be deployed prior to setting a budget and timeline. In 2007 only 10% of respondents' projects involved the use of data profiling tools. In 2011 that figure is 72%. While we cannot prove a causal link between this increase (and others) and the dramatically increased success rate of data migration projects, these figures are highly suggestive."

We went on to say that “roughly half of the companies that did use profiling did so prior to setting budgets. 72% of these brought in their projects on time and on budget whereas just 52% did so when only using data profiling later. Lack of visibility into data quality issues (which data profiling provides) was cited by respondents as a major cause of failed projects.”
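To illustrate what such a profiling pass surfaces before a budget is set, here is a minimal sketch in Python using pandas; the table, column names and sample data are hypothetical and are not drawn from the survey or from any particular vendor's tool.

```python
# Minimal, illustrative data profiling pass: the kind of per-column summary a
# profiling exercise produces before a migration budget is set.
# Column names and data are hypothetical.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column statistics that flag likely cleansing work."""
    stats = []
    for col in df.columns:
        series = df[col]
        stats.append({
            "column": col,
            "null_pct": round(series.isna().mean() * 100, 1),
            "distinct": series.nunique(dropna=True),
            "sample_values": series.dropna().unique()[:3].tolist(),
        })
    return pd.DataFrame(stats)

if __name__ == "__main__":
    customers = pd.DataFrame({
        "customer_id": [1, 2, 2, 4],                     # duplicate key suggests de-duplication work
        "zip_code": ["90210", None, "ABCDE", "10001"],   # nulls and an invalid format
    })
    print(profile(customers))
```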

We continued in that report by recommending the following best practices, based on what had been successful among respondents to our survey:

1. Use a data profiling tool both before and during the project

2. Use a data cleansing tool—do not attempt to do this manually

3. Use a data integration tool—do not hand code

4. Adopt a formal methodology that has been tried and tested—most often this will be one provided by a systems integrator or vendor

5. If part of a wider project the migration portion should be treated as an independent sub-project in its own right, ranging from budgeting through to testing

6. Companies should develop at least some internal competency with respect to data migration and not wholly rely on outside resources

7. The business MUST be engaged throughout all stages of a project, from initial scoping to (agile) testing.

» Data Profiling is used to identify (and monitor) poor quality data in relevant data sources and to discover relationships between data elements both within and across data sources.

» Data Quality is used to cleanse, de-duplicate and enrich existing data. It may also be used to prevent the entry of erroneous data.

» Data Integration is used to move data between sources and targets, transforming the format of the data during that process, where appropriate.

It is worth emphasising the last of these recommendations. We already know that there are serious business implications for those embarking on data migration projects, but this doesn't just apply in terms of potential costs. Companies were asked in our survey about the top three factors affecting the success of their data migration projects. By far the most important factor was "business engagement", with 72% of organisations quoting this as a top three factor and over 50% stating it as the most important factor (see Figure 2). Conversely, over one third of companies quoted "lack of support from business users" as a reason for project overruns.

Figure 2: Factors affecting success of data migration projects

Data migration projects are undertaken because they will support business objectives. There are costs to the business if it goes wrong or if the project is delayed, and the most important factor in ensuring the success of such projects is close collaboration between the business and IT. Whether this means that the project should be owned by the business—treated as a business project with support from IT—or whether it should be delegated to IT with the business overseeing the project is debatable, but what is clear is that it must involve close collaboration.

Apart from this business focus, the survey highlights the importance of data profiling, data quality, data integration and formalised methodologies to the overall success of data migration projects. However, there is a whole range of other technologies that can usefully be deployed within a migration scenario. We did, in fact, enquire about two of these—data masking and archiving—in our 2011 report. However, we did not explore either of these areas in detail. We did discover that 65% of projects involved archiving and that 60% of those projects used tools to automate that process. In the case of data subject to privacy concerns, and therefore suitable for data masking, we found this to be the case in 48% of projects, but less than 20% used a tool for data masking and around 10% ignored the issue (thereby breaking the law) completely! Given that we favour the use of automation where possible (and, in the case of data masking, also for other reasons that we explore below), these utilisation figures compare poorly with the figures for data profiling, data quality and data integration (85%).

However, these are not the only additional tools that may be used to increase the probability of a successful migration; we discuss the others in turn below.


Business glossary

Business glossaries provide definitions and descriptions of an organisation's terminology and the derivation of that terminology. In particular, a glossary records how business entities are derived from their IT underpinnings. This helps business people and IT to collaborate because a developer, for example, will be able to see what constitutes a "customer" in IT terms, such as the tables in a database that underpin it: in other words, business glossaries help the business and IT to understand and work with each other's terminology. As we have seen, this sort of collaboration is especially important in enabling successful migration projects.
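As a purely illustrative sketch of the kind of derivation a glossary records, the snippet below models one hypothetical entry linking the business term "Customer" to invented table and column names; it is not modelled on any specific glossary product.

```python
# A hypothetical business glossary entry: a business term, its definition,
# and the IT artefacts (tables/columns) from which it is derived.
from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    term: str
    definition: str
    derived_from: list = field(default_factory=list)  # physical tables/columns (invented names)

customer_entry = GlossaryEntry(
    term="Customer",
    definition="A party with at least one billed order in the last 24 months.",
    derived_from=["CRM.PARTY", "CRM.PARTY_ROLE", "BILLING.ORDER_HEADER.PARTY_ID"],
)

print(customer_entry.term, "->", ", ".join(customer_entry.derived_from))
```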

Data archiving

Data archiving is the process of taking data that is no longer required on a regular basis and storing it, typically in highly compressed form, on lower cost media. Archiving is important within migration environments for several reasons:

1. When archiving data it is important to ensure that you work at the level of business entities (customers, suppliers and so forth, rather than at the table level) in order to ensure that data remains referentially intact during the archiving process (a minimal sketch of this entity-level extraction follows this list). Migration also requires that the data remain referentially intact, so the same underlying technologies (data profiling and the business glossary) will be used for both purposes. Collaboration between the business and IT is enabled by working at the level of business entities.

2. Because it is often a convenient time to address the issue of archiving: rather than migrating all of the data held by your current application, you might as well move only what is current and up to date into the new system and archive the rest.

3. Archiving is actually a somewhat simpler process than migration. None of the data that you archive will need to work with the new application that you are migrating to, so the data that you archive does not need to be transformed to meet the requirements of the new system and you do not need consistency with the new environment. Thus data archiving actually reduces the total workload associated with the migration.
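The sketch below illustrates the entity-level extraction described in point 1: a customer and all of its dependent rows are copied together so that the archived set stays referentially intact. The table names, keys and use of SQLite are assumptions made for the example.

```python
# Illustrative entity-level archive extraction: copy a customer and all of its
# dependent rows together, so the archived set remains referentially intact.
# Assumes the archive database contains the same (hypothetical) table definitions.
import sqlite3

def archive_customer(source: sqlite3.Connection,
                     archive: sqlite3.Connection,
                     customer_id: int) -> None:
    # The business entity "customer" spans several physical tables.
    entity_tables = {
        "customers": "id = ?",
        "orders": "customer_id = ?",
        "invoices": "customer_id = ?",
    }
    for table, predicate in entity_tables.items():
        rows = source.execute(
            f"SELECT * FROM {table} WHERE {predicate}", (customer_id,)
        ).fetchall()
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            archive.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    archive.commit()
```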

Data governance

Another area of some significance that does not involve any specific technology is data governance. We asked about this in our 2011 survey. Some companies had their migration directly tied to their data governance initiative, some had governance projects that weren't associated with the migration, and others had no data governance in place. The last of these were 10% less likely to bring their projects in on time and on budget than those who had some experience of data governance, even if that governance was not associated with the migration project. In fact, we recommend that if you have not previously opted for data governance then a data migration project is a good place to start, because the work that you need to do—profile the data, cleanse it, possibly archive and mask it—is exactly the same sort of work that you would do within a governance initiative, plus other functions such as building up a business glossary.

Data masking

Securing your data for data privacy reasons is mandated in most jurisdictions so, for roughly half of all migrations (according to our 2011 results), data masking is not optional. The only question is whether you try to do this manually or by means of a tool.

If it were simply a question of de-identifying data in some way then doing so via hand coding would be a reasonable option. However, that is rarely the case. It is usually the case that, firstly, the data needs to remain valid (for example, that the format of a zip code remains a potentially real one); secondly, that relationships are maintained (the zip code matches the state); and thirdly, that masking is consistent (the same original zip code is always mapped to the same scrambled zip code) or, if consistency is not possible or practical, that you can create (automatically) cross-reference tables that support the required consistency. In addition to how you mask the data, a further requirement will be that you should be able to prove that you have actually masked what you are supposed to have masked, so you will require a full audit trail that can be inspected upon request.

While you could do all of this manually, it will add to your workload and increase the chance of errors. We do not recommend such an approach.
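To make those requirements and the audit trail concrete, here is a minimal sketch of consistent masking; the zip code field, the hashing scheme and the in-memory cross-reference table are illustrative assumptions rather than a description of any particular masking tool.

```python
# Illustrative consistent masking: the same original value always maps to the
# same masked value (via a cross-reference table), masked values keep a valid
# format, and every masking action is recorded in an audit trail.
# Preserving relationships (e.g. the masked zip matching its state) would need
# reference data and is omitted from this sketch.
import hashlib
import json
from datetime import datetime, timezone

xref: dict = {}        # cross-reference table: original value -> masked value
audit_trail: list = [] # inspectable record of what was masked, and when

def mask_zip(original: str) -> str:
    if original not in xref:
        digest = hashlib.sha256(original.encode()).hexdigest()
        xref[original] = f"{int(digest, 16) % 100000:05d}"  # still a 5-digit, zip-shaped value
    audit_trail.append({
        "field": "zip_code",
        "masked_at": datetime.now(timezone.utc).isoformat(),
    })
    return xref[original]

print(mask_zip("90210"), mask_zip("90210"))  # identical output both times: consistent masking
print(json.dumps(audit_trail, indent=2))     # the audit trail that can be inspected on request
```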


Data validation

A more directly applicable testing capability is data validation, which allows you to compare target tables with what you expected to be in those tables. Note that data validation at the table level is distinct from, or at least more granular than, similar tools that only work at a database level. Features that one would expect include the ability to compare tables across heterogeneous sources, auditing of all tests and results so that processes are not just repeatable but can be tracked, and the ability to automatically highlight potential issues and alert relevant parties. We would recommend that data validation be used as part of an agile migration process so that any errors or problems are detected as early as possible, on the basis that the earlier issues are discovered the easier and cheaper it is to fix them.
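As a rough illustration of what table-level validation involves, the sketch below compares expected and actual target rows and collects discrepancies for follow-up; the keys and values are invented, and a real implementation would read from the source and target systems and feed alerts to the relevant parties.

```python
# Illustrative table-level validation: compare what landed in the target table
# against what was expected, row by row, and flag differences for follow-up.
# Table contents here are hypothetical.
expected = {  # key -> row we expected to load
    "C001": ("Alice", "90210"),
    "C002": ("Bob", "10001"),
}
actual = {    # key -> row actually found in the target
    "C001": ("Alice", "90210"),
    "C002": ("Bob", "10002"),   # value drifted during transformation
}

issues = []
for key, row in expected.items():
    if key not in actual:
        issues.append(f"{key}: missing from target")
    elif actual[key] != row:
        issues.append(f"{key}: expected {row}, found {actual[key]}")
for key in actual.keys() - expected.keys():
    issues.append(f"{key}: unexpected row in target")

print("\n".join(issues) or "All rows match")
```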

Repeatability

Repeatability is not, of course, a tool in its own right. However, it is a function of the sorts of tools that we have been discussing that is often overlooked when considering whether to use a tool-based approach or one predicated on manual coding. There are two aspects to this. The first is with respect to organisations performing multiple migrations. This is quite common (with an average exceeding 5 projects per annum) as the results from our 2011 survey indicate, as shown in Figure 3.

Figure 3: Numbers of organisations performing multiple data migrations

Needless to say, there will be significant numbers of processes that will be repeated across multiple projects of this type. However, the second aspect of repeatability is within individual projects. This will especially be true where an agile approach is being taken to development, which will certainly include iterative and zero-downtime approaches to migration and will also apply to big-bang methods. We would argue that repeatability is almost as important as the automation that tools provide.


Replication

In the context of application data migrations, replication is important in supporting so-called 'zero downtime' migrations. The idea is that at no point during the migration should the relevant application be unavailable: there is no 'big bang' cutover process but an iterative migration process with built-in failback processes. Replication provides the synchronisation logic that enables failback mechanisms.

There are some (small) indications that adopting a zero downtime approach may have beneficial effects: according to our 2011 survey, users employing this methodology had a success rate that was 10% better than companies deploying other methods.

While this is a more complex and costly approach than traditional migration methods, it is a useful option to have. The key to deciding whether a zero downtime approach is right for any particular migration is the extent of the business imperative in ensuring that the application is continuously available: is the application mission critical, how widely is it used and by whom, at what times of the day and night, and on what days of the week? Risk factors should also be taken into account. For example, if you adopt a big bang approach and something goes wrong, what will be the impact on the business and its reputation? If these are potentially at risk then a zero downtime methodology may be preferable even if it is not mandated from a technological perspective.
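The sketch below illustrates, in schematic form only, the synchronisation loop that replication provides during a zero-downtime migration: changes captured on the legacy system are applied to the target until the two are in step, at which point traffic can be switched (or switched back, by reversing the direction). The change feed and apply functions are placeholders, not a real replication API.

```python
# Conceptual sketch of the synchronisation step behind a zero-downtime migration.
# The change feed and apply functions are placeholders for a real CDC/replication mechanism.
from typing import Iterable

def capture_changes(since: int) -> Iterable[dict]:
    """Placeholder for a change-data-capture feed from the legacy system."""
    return [{"seq": since + 1, "op": "update", "table": "orders", "key": 42}]

def apply_change(change: dict) -> None:
    """Placeholder for applying one captured change to the new system."""
    print(f"applied {change['op']} to {change['table']} key={change['key']}")

def synchronise(last_applied: int, batches: int = 3) -> int:
    """Drain the change feed in batches until source and target are in step."""
    for _ in range(batches):
        for change in capture_changes(last_applied):
            apply_change(change)
            last_applied = change["seq"]
    return last_applied

synchronise(last_applied=0)
```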

Test data management

Test data management (TDM) is about the provisioning of data for non-production environments, especially for test purposes but also for development, training, quality assurance, the creation of demonstration or other dummy data, and so on. There are essentially three ways to derive such data: you can take a copy of your production database and, where appropriate, mask any sensitive data; or you can subset your data and mask it; or you can generate synthetic data, based on an understanding of the underlying data model, which means that no masking is required.

The most common use of TDM is for database sub-setting. In the context of application data migration you can take a subset, build your process, mappings and data quality rules, and test against the subset. Then you can take a larger subset and repeat, providing continuous improvement, all without having to use the entire data set each time. This enables an iterative approach to the data migration process that is simply not feasible in a hand coded environment.
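The following sketch illustrates that iterative use of subsets: a small, referentially intact slice of customers and their orders is tested first, then a larger one. The tables, sizes and checks are hypothetical.

```python
# Illustrative test data subsetting: take a small, referentially intact slice
# (some customers plus their orders), check it, then repeat with a larger slice.
# Data and the orphan check are hypothetical stand-ins for real migration rules.
customers = [{"id": i, "zip": f"{10000 + i:05d}"} for i in range(1, 101)]
orders = [{"order_id": n, "customer_id": (n % 100) + 1} for n in range(1, 501)]

def subset(fraction: float):
    """Return a fraction of the customers together with all of their orders."""
    keep = {c["id"] for c in customers[: int(len(customers) * fraction)]}
    return (
        [c for c in customers if c["id"] in keep],
        [o for o in orders if o["customer_id"] in keep],  # keep orders consistent with customers
    )

for fraction in (0.1, 0.5):  # iterate: small subset first, then a larger one
    cust_subset, order_subset = subset(fraction)
    cust_ids = {c["id"] for c in cust_subset}
    orphans = [o for o in order_subset if o["customer_id"] not in cust_ids]
    print(f"{fraction:.0%}: {len(cust_subset)} customers, "
          f"{len(order_subset)} orders, {len(orphans)} orphans")
```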


Vendors

So far in this paper we have discussed a variety of technical requirements that support application data migrations and which you may need, depending on your requirements. All of these discussions have been generic and, generally speaking, relevant tools are available from a variety of suppliers, although there are very few vendors that can offer most of the capabilities described and only one, as far as we know, that can provide them all. That vendor is Informatica, and the capability that we have not seen elsewhere is its Data Validation Option. So, it is worth discussing whether Informatica has any other facilities that might help in application data migration environments. In fact it does: a capability called Proactive Monitoring for PowerCenter. This is relatively new to the market and, briefly, the product monitors PowerCenter in real time for exceptions such as broken rules (for example, PowerCenter has been running for a period of time but no data has been loaded), service level breaches, and failures to adhere to best practices (for example, that comments are appended). The key point about Proactive Monitoring is that, like data validation, it identifies problems early and therefore supports the agile and iterative approaches that are typical of migration environments.
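As a generic illustration of the kind of rule such a monitor evaluates (and emphatically not Informatica's own API), the sketch below flags a load job that has been running for some time without loading any rows.

```python
# Generic sketch of a proactive monitoring rule: a job has been running for a
# while but has loaded no rows, so someone is alerted early.
# The job status structure and threshold are hypothetical.
from datetime import datetime, timedelta, timezone
from typing import Optional

def check_stalled(job: dict, max_idle: timedelta = timedelta(minutes=30)) -> Optional[str]:
    running_for = datetime.now(timezone.utc) - job["started_at"]
    if job["rows_loaded"] == 0 and running_for > max_idle:
        return f"ALERT: {job['name']} running {running_for} with no rows loaded"
    return None

job = {
    "name": "load_customers",
    "started_at": datetime.now(timezone.utc) - timedelta(hours=2),
    "rows_loaded": 0,
}
print(check_stalled(job) or "OK")
```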


Conclusion


We have learned a lot about successful application data migration projects over the last several years. However, that has largely been focused on the basics. Now that success rates have climbed from something less than 20% to a figure that is over 60%, it is time to consider how to improve that figure still further. In this paper we have considered other technologies that figure into the migration mix. Not all of these will be applicable to every project, but many of them will be and all of them should potentially be in your kitbag.

Even leaving aside the last two tools discussed in this paper, there aren't many companies that provide data integration, data profiling, data cleansing, data archival, data masking, data replication for zero downtime migrations and test data management: you can count them on the fingers of one hand and still have fingers left over. If we add in Data Validation and Proactive Monitoring then Informatica has a breadth of portfolio that is, as far as we know, unmatched in the industry.

Further Information

Further information about this subject is available from http://www.BloorResearch.com/update/2165

About the author

Philip Howard, Research Director - Data Management

Philip started in the computer industry way back in 1973 and has variously worked as a systems analyst, programmer and salesperson, as well as in marketing and product management, for a variety of companies including GEC Marconi, GPT, Philips Data Systems, Raytheon and NCR.

After a quarter of a century of not being his own boss Philip set up his own company in 1992 and his first client was Bloor Research (then ButlerBloor), with Philip working for the company as an associate analyst. His relationship with Bloor Research has continued since that time and he is now Research Director focused on Data Management.

Data management refers to the management, movement, governance and storage of data and involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration (including ETL, data migration and data federation), data quality, master data management, metadata management and log and event management. Philip also tracks spreadsheet management and complex event processing.

In addition to the numerous reports Philip has written on behalf of Bloor Research, he also contributes regularly to IT-Director.com and IT-Analysis.com and was previously editor of both "Application Development News" and "Operating System News" on behalf of Cambridge Market Intelligence (CMI). He has also contributed to various magazines and written a number of reports published by companies such as CMI and The Financial Times. Philip speaks regularly at conferences and other events throughout Europe and North America.

Away from work, Philip’s primary leisure activities are canal boats, skiing, playing Bridge (at which he is a Life Master), dining out and walking Benji the dog.

Bloor Research overview

Bloor Research is one of Europe's leading IT research, analysis and consultancy organisations. We explain how to bring greater Agility to corporate IT systems through the effective governance, management and leverage of Information. We have built a reputation for 'telling the right story' with independent, intelligent, well-articulated communications content and publications on all aspects of the ICT industry. We believe the objective of telling the right story is to:

• Describe the technology in the context of its business value and the other systems and processes it interacts with.

• Understand how new and innovative technologies fit in with existing ICT investments.

• Look at the whole market and explain all the solutions available and how they can be more effectively evaluated.

• Filter “noise” and make it easier to find the additional information or news that supports both investment and implementation.

• Ensure all our content is available through the most appropriate channel.

Founded in 1989, we have spent over two decades distributing research and analysis to IT user and vendor organisations throughout the world via online subscriptions, tailored research services, events and consultancy projects. We are committed to turning our knowledge into business value for you.


Copyright & disclaimer

This document is copyright © 2013 Bloor Research. No part of this publication may be reproduced by any method whatsoever without the prior consent of Bloor Research.

Due to the nature of this material, numerous hardware and software products have been mentioned by name. In the majority, if not all, of the cases, these product names are claimed as trademarks by the companies that manufacture the products. It is not Bloor Research's intent to claim these names or trademarks as our own. Likewise, company logos, graphics or screen shots have been reproduced with the consent of the owner and are subject to that owner's copyright.

Whilst every care has been taken in the preparation of this document to ensure that the information is correct, the publishers cannot accept responsibility for any errors or omissions.

2nd Floor, 145–157 St John Street

LONDON, EC1V 4PY, United Kingdom

Tel: +44 (0)207 043 9750 Fax: +44 (0)207 043 9748

Web: www.BloorResearch.com email: [email protected]