Content Migrations: Getting from A to B

Post on 08-May-2015




Content Migrations: A Field Guide

• Author of Website Migration Handbook v2

• First large migration: World Bank (1,000+ subsites)

• Consults to large and medium organizations

• David guides complex website transformations.

Deane Barker

• Working in content management since 1996

• Founding partner in Blend Interactive

• Board member of Content Management Professionals

Planning vs. Technical

• The planning process encompasses the entire scope of your migration effort

• The technical process is just one very critical part of this process

Agenda

• David will discuss the larger planning process

  – Break

• Deane will follow with a discussion about the specific technical challenges

  – End at 4:00 p.m.

  – Deane and David will be available for discussion until 5:00 p.m.

Ask Questions

Getting from A to B

It’s painful.

[The End]

Requirements for Transfer

• You know

  – …what is being moved

  – …how it has to change on the way over

  – …how it fits back together on the other side

Agenda

• Original Content vs. Derived Content

• Content Geography

• The Four Tasks of Content Transfer

• Automated vs. Manual Import

• The Automated Import Process

• QA Automation

Original Content vs. Derived Content

Some HTML has to be moved.

Some HTML will be generated by your new system as content is imported.

Index Pages vs. Content Pages

Many pages on your new site are not rendered via content, but via development.

Before you begin transfer, make sure you know which pages are derived and you have made plans to generate those in the new system.

Content Geography

Content has different levels of “geography”

Some content is very specifically placed, while other content is automatically organized.

Home

Products

Product A

Product B

About

History

Press Release

Highly-geographical content is much harder to migrate.

You have to migrate both the content and the placement.

Pop Quiz: Why are blogs so easy to migrate?

No geography.

Lots of derived index pages.

Hierarchical content requires you to determine and transfer structure

Home

Products

Product A

Product B

About

History

Stub Mapping

Existing

Home

Products

Product A

Product B

About

History

New

The Path to Stub Mapping

• “We need to codify the new website structure…”

• “…let’s just store this in the new CMS…”

• “…and let’s store the old URL, just for reference…”

• “…and…can we just use that old URL to transfer the content?”
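The stub mapping idea above can be pictured with a minimal Python sketch. The `Stub` record, the `walk` helper, and all URLs here are hypothetical, made up for illustration: each stub holds a place in the new structure and remembers the old URL its content will be pulled from.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stub:
    """An empty placeholder page in the new site tree (hypothetical model)."""
    title: str
    old_url: Optional[str] = None          # where the content lives today, if anywhere
    children: list = field(default_factory=list)


def walk(stub, path=""):
    """Yield (new_path, old_url) for every stub in the tree."""
    new_path = f"{path}/{stub.title.lower().replace(' ', '-')}"
    yield new_path, stub.old_url
    for child in stub.children:
        yield from walk(child, new_path)


# Illustrative tree; the old URLs are invented for the example.
site = Stub("Home", "/index.html", [
    Stub("Products", "/prod/list.asp", [
        Stub("Product A", "/prod/item.asp?id=1"),
        Stub("Product B", "/prod/item.asp?id=2"),
    ]),
    Stub("About", None, [                  # new page with no old equivalent
        Stub("History", "/company/history.htm"),
    ]),
])

# New location -> old URL to transfer from; stubs without an old URL are skipped.
transfer_plan = {new: old for new, old in walk(site) if old}
```

Stubs with no old URL (here, "About") stay empty and need content written for them rather than transferred.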

The Four Tasks of Content Transfer

The Four Tasks

• Extract

• Transform

• Import

• Normalize

• We can generalize about the first two

  – Extract and transform are platform-agnostic

#1: Extract

• Get content out of the existing system

• Break content into its necessary components

• Store in a neutral format

  – XML, usually
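As a rough illustration of the extract step, here is a small sketch that reads one published HTML page, breaks it into a title and a body, and stores the result as XML. It assumes the old site wraps its main content in `<div id="content">`; that wrapper, the sample page, and the `item`/`title`/`body`/`old-url` names are all hypothetical. It uses only the Python standard library and keeps text only, dropping markup.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class PageParser(HTMLParser):
    """Collects the <title> text and the text inside <div id="content">."""

    def __init__(self):
        super().__init__()
        self.title, self.body = "", ""
        self._in_title = False
        self._depth = 0                    # > 0 while inside the content div

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "div" and (self._depth or ("id", "content") in attrs):
            self._depth += 1               # track nested divs inside the content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._depth:
            self.body += data


def extract(old_url, html):
    """Turn one published page into a neutral XML record."""
    parser = PageParser()
    parser.feed(html)
    item = ET.Element("item", {"old-url": old_url})
    ET.SubElement(item, "title").text = parser.title.strip()
    ET.SubElement(item, "body").text = parser.body.strip()
    return item


sample = """<html><head><title> History </title></head>
<body><div id="nav">Home | About</div>
<div id="content"><p>Our story so far.</p></div></body></html>"""

record = extract("/company/history.htm", sample)
xml_text = ET.tostring(record, encoding="unicode")
```

Navigation and other page chrome outside the content wrapper is ignored, which is the point of breaking the page into components before storing it.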

Migrating out of a CMS is a lot easier than the alternative.

CMS enforces at least some consistency.

Are you going to extract from the repository level or the publication level?

Repository vs. Publication Extraction

Repository

HTML

Processing

You may need to make changes to your old site to make extraction easier or more complete.

You do not have to wait for anything to do this.

You can start extraction on the very day you decide to migrate your website.

#2: Transform

• Modify extracted content

• Fix legacy problems with the content

• Adapt content to fit the new architecture

• Neutralize idiosyncrasies in the content
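A transform step might look something like the sketch below. The specific fixes (stripping `<font>` tags and inline `style` attributes, renaming content types) are assumptions chosen for illustration, not transformations named in the deck, and `TYPE_MAP` with its type names is hypothetical.

```python
import re

FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)
INLINE_STYLE = re.compile(r'\s+style="[^"]*"', re.IGNORECASE)

# Old content type name -> name in the new architecture (hypothetical values).
TYPE_MAP = {"pressrel": "press-release", "prod": "product"}


def transform(record):
    """Return a cleaned copy of one extracted record (a plain dict)."""
    body = FONT_TAG.sub("", record["body"])      # fix a legacy markup problem
    body = INLINE_STYLE.sub("", body)            # neutralize per-page styling
    return {
        **record,
        "body": body.strip(),
        "type": TYPE_MAP.get(record["type"], record["type"]),
    }


legacy = {
    "old_url": "/news/pr-12.htm",
    "type": "pressrel",
    "body": '<p style="color:red"><font face="Arial">New product.</font></p> ',
}
clean = transform(legacy)
```

Because the function works on the neutral record rather than on either CMS, it stays platform-agnostic and can be re-run on every migration cycle.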

Content Transformation

Common Transformations


#3: Import

• Move post-transformed content from a neutral format into the new system

• This is different for every CMS

• This capability should be part of the evaluation process

#4: Normalize

• Fix problems that are only “fixable” once content is in its new home

• Ex:

  – Relationship reconstruction

  – URL resolution

  – Navigation reconstruction

Content relationships can introduce chicken-and-egg problems.
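One common way around the chicken-and-egg problem is a two-pass import: create every item first, then reconstruct relationships once all targets exist. The sketch below assumes this approach; `FakeCMS` is an entirely hypothetical stand-in for the new system's import API, and the record fields are invented for the example.

```python
import itertools


class FakeCMS:
    """Stand-in for the new system's import API (entirely hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.items = {}

    def create(self, title):
        new_id = next(self._ids)
        self.items[new_id] = {"title": title, "related": []}
        return new_id

    def set_related(self, item_id, related_ids):
        self.items[item_id]["related"] = list(related_ids)


def import_with_relationships(cms, records):
    # Pass 1: create every item, remembering old key -> new id.
    id_map = {r["old_key"]: cms.create(r["title"]) for r in records}
    # Pass 2: every target now exists, so references can be resolved.
    unresolved = []
    for r in records:
        targets = [id_map[k] for k in r["related"] if k in id_map]
        unresolved += [(r["old_key"], k) for k in r["related"] if k not in id_map]
        cms.set_related(id_map[r["old_key"]], targets)
    return id_map, unresolved


records = [
    {"old_key": "a", "title": "Product A", "related": ["b"]},
    {"old_key": "b", "title": "Product B", "related": ["a", "z"]},  # "z" was never migrated
]
cms = FakeCMS()
id_map, unresolved = import_with_relationships(cms, records)
```

References that point at content which was never migrated come back in `unresolved` so they can be reviewed by hand.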

How will URLs change on the new platform?

If your content is interlinked, how are you going to keep all those links valid?

Embedded URLs

Embedded URL Resolution

• If you have embedded URLs, they are now broken.

• How do you “re-connect” these URLs to the correct content?

• Usually performed as some kind of batch job.

  – You rarely get 100% accuracy.

  – Prepare to catch the remainder in QA.

Always store the old URL for a migrated page of content.

How it Works

• Iterate over every piece of content…

• …then iterate over every single property looking for anything that might contain links…

• …then iterate over all those links looking for the new content holding that old link…

• …then correct the link.
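The loop described above could be sketched as follows. This is an illustration under several assumptions: items are plain dicts with hypothetical `old_url`, `new_url`, and `properties` fields, links appear as `href="..."` attributes, and internal links are root-relative (start with `/`).

```python
import re

HREF = re.compile(r'href="([^"]+)"')


def resolve_links(items):
    """Rewrite embedded old URLs in place; return links that could not be matched."""
    by_old_url = {item["old_url"]: item for item in items}
    unmatched = []
    for item in items:                                        # every piece of content
        for name, value in item["properties"].items():        # every property
            if not isinstance(value, str):
                continue                                      # cannot contain links

            def fix(match):
                url = match.group(1)
                if not url.startswith("/"):
                    return match.group(0)                     # external link, leave alone
                target = by_old_url.get(url)                  # content holding that old URL
                if target is None:
                    unmatched.append((item["new_url"], url))
                    return match.group(0)                     # leave for QA
                return f'href="{target["new_url"]}"'

            item["properties"][name] = HREF.sub(fix, value)   # correct the link
    return unmatched


items = [
    {"old_url": "/prod/item.asp?id=1", "new_url": "/products/product-a",
     "properties": {"body": '<a href="/prod/item.asp?id=2">B</a> <a href="/old/gone.htm">x</a>',
                    "weight": 3}},
    {"old_url": "/prod/item.asp?id=2", "new_url": "/products/product-b",
     "properties": {"body": '<a href="/prod/item.asp?id=1">A</a>'}},
]
unmatched = resolve_links(items)
```

The returned `unmatched` list is the remainder that a batch job will not fix and that QA has to catch.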

Once migrated, use the old URL to do a lookup in your 404 handler.
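A framework-neutral sketch of that lookup is shown below. The `OLD_TO_NEW` table, the `handle_not_found` function, and the URLs are hypothetical; a real handler would query wherever the new CMS stores the old URL and would plug into the web server's own 404 hook.

```python
from urllib.parse import urlsplit

# Old URL -> new URL, as stored during migration (illustrative values).
OLD_TO_NEW = {
    "/company/history.htm": "/about/history",
    "/prod/item.asp?id=1": "/products/product-a",
}


def handle_not_found(requested_url):
    """Return (status, headers): a permanent redirect if the old URL is known."""
    parts = urlsplit(requested_url)
    key = parts.path + (f"?{parts.query}" if parts.query else "")
    # Try the full path plus query first, then the bare path.
    new_url = OLD_TO_NEW.get(key) or OLD_TO_NEW.get(parts.path)
    if new_url:
        return 301, {"Location": new_url}
    return 404, {}
```

For example, `handle_not_found("/prod/item.asp?id=1")` returns a 301 pointing at `/products/product-a`, while an unknown path still produces a normal 404.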

If you can preserve binary file URLs, do so. Your new CMS will likely make this easier.

Depending on volume, menu reconstruction might be a manual process.

Automated vs. Manual Import

What is the actual mechanism of movement?

Copy-and-paste?

Automated?

When Copy-and-Paste Works

• When you don’t have a lot of content

• When you have access to cheap labor

• When your content is highly geographic

• When you cannot automate transformation

• When you have enough resources for sufficient QA

When Automated Migration Works

• When you have large volumes of content

• When your content is not highly-geographic

• When you have sufficient technology and/or development resources

You don’t have to use the same method for your entire project.

The Automated Migration Process

Automated Migration Tools

• Great answer to the Transfer phase

• Less of an answer to everything else

• They still have to be configured and tested

The Promise:

You will be able to develop a script that will reduce your migration to a button-click.

The Promise:

You will run this script, need to do nothing else, then launch your new website.

The Value-Add

• A scripting environment

• Tested tools for:

  – Extraction

  – Transformation

  – Import (maybe…)

• Professional services

$$$$

Automated Migration Process

• Develop automated migration script

  – Configure

  – Execute

  – Evaluate

  – (Repeat)

• Accept a cycle “as good as is reasonable”

• Perform necessary manual editing

• Re-do changes during content freeze

• Launch

Automated migrations are highly iterative.

Configure-Execute-Evaluate

Automated Migration Cycle

Configure

Execute

Evaluate

Manual Editing

Iterate again…

Launch

Weeks? Months? Days? Minutes?

“As good as is reasonable…”

Once you accept the output of a migration cycle, you are in a content freeze.

Handling a Content Freeze

• Don’t change any content on the existing site

• Track changes so they can be re-changed on the new site
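If edits to the existing site cannot be avoided entirely, one possible way to track them is to fingerprint the old site when the migration cycle is accepted and compare again before launch. This is a sketch of that idea, not a method from the deck; the function names and sample pages are hypothetical.

```python
import hashlib


def fingerprint(pages):
    """Map each URL to a SHA-256 digest of its content."""
    return {url: hashlib.sha256(body.encode("utf-8")).hexdigest()
            for url, body in pages.items()}


def changes_since(snapshot, current_pages):
    """Compare the old site now against the snapshot taken at freeze time."""
    current = fingerprint(current_pages)
    shared = current.keys() & snapshot.keys()
    return {
        "added": sorted(current.keys() - snapshot.keys()),
        "removed": sorted(snapshot.keys() - current.keys()),
        "edited": sorted(u for u in shared if current[u] != snapshot[u]),
    }


# Snapshot taken when the migration cycle was accepted (illustrative pages).
at_freeze = fingerprint({"/a.htm": "Alpha", "/b.htm": "Beta"})

# The old site some days later: one edit, one new page, one deletion.
report = changes_since(at_freeze, {"/a.htm": "Alpha v2", "/c.htm": "Gamma"})
```

Each entry in the report is a change that has to be repeated by hand on the new site before launch.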

QA Automations

Ideally, track the QA process inside the CMS itself.

• WEB: http://gadgetopia.com

• TWITTER: @gadgetopia

• EMAIL: deane@blendinteractive.com