Technical appraisal and change impact analysis - IDCC17 workshop

GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]

Simon Waddington (King’s College London)

Technical appraisal and change impact analysis

Transcript of Technical appraisal and change impact analysis - IDCC17 workshop

Page 1: Technical appraisal and change impact analysis - IDCC17 workshop


Page 2: Technical appraisal and change impact analysis - IDCC17 workshop

Technical appraisal

Appraisal
◦ Aims to determine which data should be kept by an organisation
◦ Traditionally performed prior to transfer to an archive
◦ Guided by policies based on defined criteria

Technical appraisal
◦ Evaluation of the (ongoing) feasibility of preserving the digital objects
◦ Answers the question “can we preserve?”

Page 3: Technical appraisal and change impact analysis - IDCC17 workshop

Complex digital objects

Simple digital objects
◦ E.g. files, software applications, operating systems
◦ Include hardware specification

Complex digital objects
◦ Digital objects made by combining a number of simple digital objects

Dependency
◦ Relationships between components of a complex digital object
◦ Functional relationship

Page 4: Technical appraisal and change impact analysis - IDCC17 workshop

Examples of complex digital objects

Digital video artwork (diagram labels): digital video, video codec, container, media player, operating system, computer

Science experiment object (diagram labels): database, scripting language, image file, document file, image viewer, document viewer
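The digital video example above can be written down as a small dependency graph. The sketch below is a minimal illustration, assuming that each component depends on the one beneath it in the stack; the slide does not state the direction of the relationships, and the names `DEPENDS_ON` and `all_dependencies` are hypothetical.

```python
# Hypothetical dependency map for the digital video example.
# The direction of each edge is an assumption made for illustration.
DEPENDS_ON = {
    "digital video": ["video codec", "container"],
    "video codec": ["media player"],
    "container": ["media player"],
    "media player": ["operating system"],
    "operating system": ["computer"],
    "computer": [],
}


def all_dependencies(component, graph):
    """Return every component that `component` relies on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(component, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


print(sorted(all_dependencies("digital video", DEPENDS_ON)))
```

Walking the graph this way lists everything that must remain available for the complex object to stay usable.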

Page 5: Technical appraisal and change impact analysis - IDCC17 workshop

Change and reuse

Complex digital objects subject to changing external environment
◦ Technical appraisal required on an ongoing basis to support long term reuse

Reusability implies complex digital objects may need to be adapted
◦ Potential adaptations termed recovery options
◦ Significant properties – specify what features should be maintained

Main risk considered is availability
◦ Obsolescence
◦ Hardware failure

Page 6: Technical appraisal and change impact analysis - IDCC17 workshop

Authenticity

Is this the Flying Scotsman?
◦ Cost of the restoration £4.5 million from 2006–2016

Page 7: Technical appraisal and change impact analysis - IDCC17 workshop

Media case study

Digital video artwork
◦ Comprises videos and their surrounding technical environment
◦ Video codec, audio codec, subtitles, container, media player, operating system, computer, display

Mary – digital art conservator
◦ Supports acquisition decisions
◦ Maintains artworks for exhibition
◦ Has limited technical knowledge of video
◦ Has no control over the technologies used by artists

Artworks are required for ongoing display
◦ Adapt artwork to current technical environment
◦ Maintain viewing experience rather than use of specific technologies
◦ Potentially exist in multiple versions

Artworks may be maintained indefinitely

Sow Farm by John Gerrard

Page 8: Technical appraisal and change impact analysis - IDCC17 workshop

Science case study

Space science experiment
◦ Raw data captured by instrument, stored in database
◦ Scripts written by scientists to process raw data
◦ Image files and documents generated by scripts

Steve – space science data manager
◦ Responsible for maintaining data from multiple experiments
◦ Little or no control over the technologies used by scientists
◦ Large volumes of experiments to deal with

Examples
◦ Earth observation, solar measurements, material science, cell biology
◦ Often time-related and expensive/impossible to replicate

Reuse – continuing over long timeframes
◦ Compare performance of different instruments
◦ Compare processing techniques
◦ Determine long term trends e.g. in solar activity
◦ Deal with errors and anomalies

Page 9: Technical appraisal and change impact analysis - IDCC17 workshop

Risk assessment process

What are the external risks to a complex digital object?

What are the proximity and impact of those risks and what are the recovery options?

Implementation of the chosen recovery option

Page 10: Technical appraisal and change impact analysis - IDCC17 workshop

Mary’s manual approach

Maintain inventory of artworks and components
◦ Video formats, players, operating systems etc.

Monitoring the external environment
◦ Aka preservation watch
◦ Monitors websites and external news sources
◦ Networks with fellow conservators

Technical analysis
◦ Records technical specifications of components
◦ Learns from practical experience of testing

Page 11: Technical appraisal and change impact analysis - IDCC17 workshop

Problems for Mary

External monitoring is time-consuming and unreliable
◦ E.g. QuickTime formats

Hard to plan forward
◦ Sudden unavailability of a component is hard to predict rigorously
◦ May imply a large amount of work if a technology is used in many artworks

Compatibility of components
◦ Based on human experience rather than a systematic model

Difficulty in determining recovery options
◦ Time-consuming analysis and testing of many options

Page 12: Technical appraisal and change impact analysis - IDCC17 workshop

Problems for Steve

Large variety of scripting languages and formats used by scientists
◦ No control of the technologies used

Unable to warn scientists that their experiments may need to be updated to maintain reusability

Can’t support scientists who want to rerun a particular experiment
◦ E.g. provide information on website

Unfamiliar with older technologies

Page 13: Technical appraisal and change impact analysis - IDCC17 workshop

Normalisation and freezing

Normalisation
◦ Convert objects to one or more “long-lived” formats
◦ Performed systematically on all objects at acquisition

Problems
◦ Objects may be discarded before they require any adaptation
◦ Objects may already be sufficiently “future proof”
◦ May imply major re-engineering when only minor changes would suffice
◦ Could increase risks if wrong choices are made

Freezing
◦ E.g. virtualisation
◦ Software licensing, security and compliance issues
◦ May be impossible to source suitable hardware
◦ May not be acceptable to users e.g. scientists

Page 14: Technical appraisal and change impact analysis - IDCC17 workshop

PERICLES Appraisal Tool

Automated tool to assist in appraisal

Main features
◦ Automated harvesting of environmental data and trend analysis
◦ Pre-built domain models for digital video and space science experiments
◦ Collection-level risk, proximity and impact analysis
◦ Component-level risk, proximity and impact analysis
◦ Object-level analysis and determination of recovery options

Storage
◦ Tool creates a registry of objects
◦ Objects themselves are not stored in the tool

Page 15: Technical appraisal and change impact analysis - IDCC17 workshop

Applied in industries such as aviation

Determine availability of hardware components

Reliability engineering approach

Standardised lifecycle model for a technology
◦ Units shipped against time
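One way to read a "units shipped against time" curve is to summarise it by its peak and spread. The following is a minimal sketch, assuming a roughly bell-shaped lifecycle curve; that shape, the 2.5-sigma threshold, the function names and the shipment figures are all illustrative assumptions and are not taken from the PERICLES tool.

```python
import math


def lifecycle_summary(years, units):
    """Estimate peak year and spread of a lifecycle curve by weighted moments."""
    total = sum(units)
    mean = sum(y * u for y, u in zip(years, units)) / total
    variance = sum(u * (y - mean) ** 2 for y, u in zip(years, units)) / total
    return mean, math.sqrt(variance)


def estimated_obsolescence_year(years, units, k=2.5):
    """Assumed rule of thumb: a technology nears obsolescence k sigmas after its peak."""
    mean, sigma = lifecycle_summary(years, units)
    return mean + k * sigma


# Invented shipment figures, for illustration only.
years = [2008, 2009, 2010, 2011, 2012, 2013, 2014]
units = [1, 4, 9, 12, 9, 4, 1]

peak, spread = lifecycle_summary(years, units)
print(f"peak ~{peak:.1f}, spread ~{spread:.2f} years")
print(f"estimated obsolescence ~{estimated_obsolescence_year(years, units):.1f}")
```

The moment-based estimate avoids any curve-fitting library; a real analysis would need to cope with incomplete and noisy series.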

Page 16: Technical appraisal and change impact analysis - IDCC17 workshop

Analysis of external environment

Compute lifecycle curve from harvested data
◦ Software repositories e.g. commits and downloads
◦ Search engines
◦ Wikipedia
◦ Usage tracking data
◦ Social networks

Confidence measure
◦ Correlate results across different data sources

Calibration
◦ Compare results with known dates e.g. operating systems

Validation
◦ Operating systems have known end of support dates
◦ Predict start date from incomplete time series
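The confidence measure relies on agreement between data sources. As a rough sketch, assuming Pearson correlation between two yearly activity series stands in for that measure (the slides do not name the statistic, and both series are invented):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented yearly counts for one technology from two hypothetical sources.
repository_commits = [120, 150, 140, 90, 60, 30]
search_interest = [80, 95, 90, 70, 45, 20]

confidence = pearson(repository_commits, search_interest)
print(f"agreement between sources: {confidence:.2f}")
```

A value close to 1 suggests the two sources tell the same story about the technology's lifecycle; a low or negative value would be a reason to distrust the fitted curve.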

Page 17: Technical appraisal and change impact analysis - IDCC17 workshop

“Push forward” principle

Figure: timeline (2012–2024) with one row per component (video codec, container, media player, operating system, computer); legend: current obsolescence, recovery option 1, recovery option 2, recovery option 3
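Read as a calculation, "push forward" can be sketched as follows: an object stays usable only until its earliest component becomes obsolete, and a recovery option replaces components so that this date moves later. This is one interpretation of the figure, and every year and name below is invented.

```python
# Invented obsolescence years per component (assumption, not project data).
current = {
    "video codec": 2019,
    "container": 2021,
    "media player": 2017,
    "operating system": 2018,
    "computer": 2020,
}

# Each hypothetical recovery option replaces some components with
# alternatives that have later predicted obsolescence years.
recovery_options = {
    "option 1": {"media player": 2022},
    "option 2": {"media player": 2022, "operating system": 2023},
    "option 3": {"media player": 2024, "operating system": 2024, "video codec": 2024},
}


def horizon(components):
    """The object is usable until its earliest component becomes obsolete."""
    return min(components.values())


def apply_option(components, replacements):
    """Return a new component map with the replacements applied."""
    updated = dict(components)
    updated.update(replacements)
    return updated


for name, replacements in recovery_options.items():
    pushed = horizon(apply_option(current, replacements))
    print(f"{name}: horizon moves from {horizon(current)} to {pushed}")
```

Options that replace more components push the horizon further, which is why each option corresponds to a different point on the timeline.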

Page 18: Technical appraisal and change impact analysis - IDCC17 workshop

Ecosystem model

Representation of the entities and dependencies
◦ OWL ontology
◦ Scope – decision about what to leave in and what to leave out

Layered model
◦ Domain-independent ontology (Linked Resource Model) to describe change
◦ Domain-dependent ontology – describes e.g. video components

Inherits from existing domain ontologies (e.g. CIDOC-CRM)

Modular
◦ Supports reuse in different applications
◦ Ontology design patterns

Page 19: Technical appraisal and change impact analysis - IDCC17 workshop

Compatibility relations

Describes the compatibility between instances
◦ E.g. media player X and video codec Y

Does not guarantee compatibility
◦ Recoverability options require testing and validation
◦ Enables alternatives to be excluded

Features
◦ Supports full and partial compatibility
◦ Instances added by hand – currently command line tool
◦ Needs to be updated over time
◦ Two prebuilt ontologies provided
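A compatibility relation of this kind can be sketched as a lookup table used to rule out alternatives. The sketch assumes three levels (full, partial, none) and invented entries; the tool holds these relations in an ontology, not in a Python dictionary, and treating unrecorded pairs as incompatible is a choice made here for illustration.

```python
from enum import Enum


class Compatibility(Enum):
    NONE = 0
    PARTIAL = 1
    FULL = 2


# Hypothetical, hand-entered relations between instances.
COMPATIBILITY = {
    ("player X", "codec Y"): Compatibility.FULL,
    ("player X", "codec Z"): Compatibility.PARTIAL,
    ("player W", "codec Y"): Compatibility.NONE,
}


def candidates(codec, players, minimum=Compatibility.PARTIAL):
    """Keep players whose recorded compatibility with `codec` meets `minimum`.

    Pairs with no recorded relation are excluded. A recorded relation is
    only a filter: surviving candidates still need testing and validation.
    """
    kept = []
    for player in players:
        level = COMPATIBILITY.get((player, codec), Compatibility.NONE)
        if level.value >= minimum.value:
            kept.append(player)
    return kept


print(candidates("codec Y", ["player X", "player W", "player V"]))
```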

Page 20: Technical appraisal and change impact analysis - IDCC17 workshop

Transformation relations

Reflects the cost of transforming entities of the same type
◦ E.g. change media player from MPlayer to Xine

Currently built by hand using command line tool

Needs to be adapted to specific context and updated over time
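Transformation relations attach a cost to swapping one entity for another of the same type. A minimal sketch, assuming costs are plain numbers in arbitrary units; the figures and the name "player Q" are invented.

```python
# Hypothetical transformation costs between media players, in arbitrary units.
# Keys are (from, to); a missing pair means no transformation is recorded.
TRANSFORMATION_COST = {
    ("MPlayer", "Xine"): 3.0,
    ("MPlayer", "player Q"): 5.5,
    ("Xine", "MPlayer"): 2.0,
}


def cheapest_replacement(current, costs):
    """Return (replacement, cost) with the lowest recorded cost, or None."""
    options = [
        (target, cost)
        for (source, target), cost in costs.items()
        if source == current
    ]
    if not options:
        return None
    return min(options, key=lambda pair: pair[1])


print(cheapest_replacement("MPlayer", TRANSFORMATION_COST))
```

Because the costs depend on local skills and infrastructure, the table would need to be maintained per organisation, as the slide notes.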

Page 21: Technical appraisal and change impact analysis - IDCC17 workshop

Bayesian networks

Use ontology to populate a probabilistic graphical model
◦ States are components in complex digital object

Exhaustive analysis very costly
◦ Apply a variation of Pearl’s Belief Propagation Algorithm
◦ Based on efficient message passing

Generate recovery options
◦ Correspond to different temporal constraints
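To illustrate the message-passing idea, here is a minimal sketch of sum-product belief propagation on a three-node chain, checked against exhaustive enumeration. It is not the variation of Pearl's algorithm used in the tool; the network structure and all probabilities are invented.

```python
from itertools import product

# Chain: operating system -> media player -> video playable.
# Each variable is binary: index 0 = unavailable, 1 = available.
p_os = [0.2, 0.8]                      # P(os)
p_player_given_os = [[0.9, 0.1],       # P(player | os=0)
                     [0.3, 0.7]]       # P(player | os=1)
p_video_given_player = [[1.0, 0.0],    # P(video | player=0)
                        [0.1, 0.9]]    # P(video | player=1)


def forward_message(prior, conditional):
    """Sum out the parent: message[child] = sum over parent of prior * P(child | parent)."""
    return [
        sum(prior[parent] * conditional[parent][child] for parent in range(2))
        for child in range(2)
    ]


def marginal_by_message_passing():
    """Pass messages along the chain; cost grows linearly with chain length."""
    player_belief = forward_message(p_os, p_player_given_os)
    return forward_message(player_belief, p_video_given_player)


def marginal_by_enumeration():
    """Exhaustive sum over all joint states; cost grows exponentially."""
    result = [0.0, 0.0]
    for os_state, player, video in product(range(2), repeat=3):
        result[video] += (
            p_os[os_state]
            * p_player_given_os[os_state][player]
            * p_video_given_player[player][video]
        )
    return result


print(marginal_by_message_passing())
print(marginal_by_enumeration())
```

Both functions give the same marginal for "video playable", but the message-passing version never builds the full joint table, which is the reason for preferring it on larger models.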

Page 22: Technical appraisal and change impact analysis - IDCC17 workshop

Architecture of tool

Based on web services

Java – UI framework

Analysis components in Python and R

Triple store
◦ Fuseki or PERICLES ERMR

Page 23: Technical appraisal and change impact analysis - IDCC17 workshop

Distributed storage

The technical appraisal tool is not a repository or archive

Central point is the ERMR (Entity Registry Model Repository)

Objects (composed of files, software, hardware descriptions)
◦ Retained across multiple storage systems
◦ Those storage systems may or may not be repositories or archives
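The separation between registry and storage can be sketched as a record that only points at where each component lives. This is a hypothetical data structure, not the ERMR schema; the class name, identifiers and URIs are all invented.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """Describes an object; the object's bytes live in external storage."""
    object_id: str
    # Maps component name -> list of locations in external storage systems.
    locations: dict = field(default_factory=dict)

    def add_location(self, component, uri):
        self.locations.setdefault(component, []).append(uri)


registry = {}

entry = RegistryEntry("experiment-001")
entry.add_location("raw data", "storage-a://datasets/raw-001")
entry.add_location("raw data", "storage-b://mirror/raw-001")
entry.add_location("processing script", "storage-c://scripts/process.py")
registry[entry.object_id] = entry

print(registry["experiment-001"].locations["raw data"])
```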

Page 24: Technical appraisal and change impact analysis - IDCC17 workshop

MICE Tool

Model Impact Change Explorer (MICE)
◦ Visualisation tool using D3 JavaScript library
◦ Enables users to evaluate how a potential change to a resource will impact the overall ecosystem
◦ Changes described via “deltas”
◦ Uses PERSiST, an intermediate component for semantic interpretation of the DVA ontology
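The idea of describing a change as a delta and then asking what it affects can be sketched with plain triples. This assumes a delta is simply a set of triples to remove and a set to add, and it uses a made-up `dependsOn` predicate; the actual LRM delta format and the interface of the intermediate semantic component are not shown in the slides.

```python
# Triples are (subject, predicate, object); all names are hypothetical.
model = {
    ("artwork", "dependsOn", "video file"),
    ("video file", "dependsOn", "codec A"),
    ("video file", "dependsOn", "player M"),
    ("player M", "dependsOn", "os 1"),
}

# An assumed delta: replace the operating system under the media player.
delta = {
    "remove": {("player M", "dependsOn", "os 1")},
    "add": {("player M", "dependsOn", "os 2")},
}


def apply_delta(triples, change):
    """Return a new triple set with the delta applied."""
    return (triples - change["remove"]) | change["add"]


def impacted(triples, change):
    """Resources that depend, directly or indirectly, on a changed subject."""
    changed = {s for s, _, _ in change["remove"] | change["add"]}
    result = set(changed)
    frontier = set(changed)
    while frontier:
        frontier = {
            s for s, p, o in triples
            if p == "dependsOn" and o in frontier and s not in result
        }
        result |= frontier
    return result


print(sorted(impacted(model, delta)))
print(sorted(apply_delta(model, delta)))
```

Computing the impacted set before applying the delta mirrors the accept or reject step: the user sees what would be affected and the model is only updated afterwards.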

Page 25: Technical appraisal and change impact analysis - IDCC17 workshop

MICE GUI

Page 26: Technical appraisal and change impact analysis - IDCC17 workshop

MICE-Appraisal Tool Integration

Figure: integration workflow diagram
◦ Components: Workflow Engine, PERSIsT API, Entity Registry Model Repository (ERMR), Technical Appraisal Tool
◦ Arrow labels: retrieves dependencies and impact; forwards change (LRM delta); visualises impact; accepts / rejects change; saves change; recovery options; inserts new media / selects recovery option; returns user’s decision; sends change (RDF triples); retrieves dependencies and costs; writes recovery options

Page 27: Technical appraisal and change impact analysis - IDCC17 workshop

Availability and licences

PERICLES Appraisal Tool
◦ Due for release in March 2017
◦ Release on GitHub

PERICLES MICE tool
◦ Available on GitHub at https://github.com/pericles-project/MICE

Licences
◦ Apache License Version 2.0, January 2004
◦ http://www.apache.org/licenses/

Page 28: Technical appraisal and change impact analysis - IDCC17 workshop

Conclusions

Demonstrates automated decision support for technical appraisal

Data-driven approach to monitor environmental trends

Ecosystem model to capture technical information on dependencies

Integrated tools for presenting risk-impact analysis, impact visualisation and recoverability options