A novel preservation watch system

SCAPE Luís Faria [email protected] Scout Design and architecture of a novel preservation watch system

Description

Successful preservation of content requires sophisticated mechanisms for collecting, tracking and analyzing information about a multitude of relevant aspects. This is not limited to the content itself: it also covers available software, other organizations' content, usage statistics and trends, format risks, system operations and more. Such tracking requires a flexible system that supports evolution over time and provides an extensible, scalable platform. This presentation describes the system design of a novel approach to automated monitoring of preservation-related information. We discuss the challenges and information sources that need to be covered, and describe the architecture and data model of a novel preservation watch system that is currently under development. We discuss how this system addresses critical information needs for informed preservation management and outline the next steps.

Transcript of A novel preservation watch system

Page 1: A novel preservation watch system

SCAPE

Luís Faria [email protected]

Scout: Design and architecture of a novel preservation watch system

Page 2: A novel preservation watch system

Outline

• Preservation monitoring

    • Why and what is needed

    • State of the art

• A novel approach

    • Methodology

    • Architecture

• How can you participate?

Page 3: A novel preservation watch system

Preservation monitoring

Page 4: A novel preservation watch system

Why do we need monitoring?

Repository

Format obsolescence

Emerging technology

Consumer trends

New standards

Organisation mission

Bit rot

Resource capability

System availability

Security breach

Economic limitations

Social and political factors

Producer trends

Organisation policies

Page 5: A novel preservation watch system

Why do we need monitoring?

Repository

Format obsolescence

Emerging technology

Consumer trends

New standards

Organisation mission

Bit rot

Resource capability

System availability

Security breach

Economic limitations

Social and political factors

Producer trends

Organisation policies

Risks

Opportunities

Page 6: A novel preservation watch system

State of the Art

• Digital Format Registries

• AONS

• Technology watch reports

Page 7: A novel preservation watch system

State of the Art

• Digital Format Registries

    • Lack of coverage

    • Statically-defined generic risks

    • Lack of structure in risks

    • Focus on format obsolescence

• AONS

    • Total dependency on format registries

• Technology watch reports

    • Not machine-readable

Page 8: A novel preservation watch system

Survey on: Risk Assessment

[Chart with two segments, 60% and 40%. Legend: "Yes, but manual and ad hoc"; "None"]

Page 9: A novel preservation watch system

What is needed?

• We need data!

    • From anywhere and everywhere

    • Sharing

• Usability & Scalability

    • Structured data

    • Controlled vocabulary

Page 10: A novel preservation watch system

A novel approach

Scout

Page 11: A novel preservation watch system

?

Page 12: A novel preservation watch system

Scout

Format

• License

• MIME type

• Name

• PRONOM ID

Tool

• License

• Renders

• Name

• Version
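
The two entities on this slide can be read as a small structured data model. The following is a minimal sketch of that reading, not Scout's actual schema: the class and field names follow the slide labels, while the choice to model "Renders" as a set of PRONOM IDs, the helper `tools_rendering`, and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Format:
    """A file format with the properties listed on the slide."""
    name: str
    mime_type: str
    pronom_id: Optional[str] = None
    license: Optional[str] = None


@dataclass(frozen=True)
class Tool:
    """A software tool and the formats it can render."""
    name: str
    version: str
    license: Optional[str] = None
    # Assumption: "Renders" is modelled as a set of PRONOM IDs.
    renders: frozenset = field(default_factory=frozenset)


def tools_rendering(tools, fmt):
    """Return the tools whose `renders` set contains the format's PRONOM ID."""
    return [tool for tool in tools if fmt.pronom_id in tool.renders]


# Illustrative values only; "ExampleViewer" is a made-up tool name.
pdf = Format(name="PDF 1.4", mime_type="application/pdf", pronom_id="fmt/18")
viewer = Tool(name="ExampleViewer", version="1.0", renders=frozenset({"fmt/18"}))
```

With structured records like these, a question such as "Are there any tools that can render format X?" becomes a simple lookup, for example `tools_rendering([viewer], pdf)`.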

Page 15: A novel preservation watch system

PRONOM

Scout

Format

• License

• MIME type

• Name

• PRONOM ID

Tool

• License

• Renders

• Name

• Version

Page 19: A novel preservation watch system

Methodology

• Survey practitioners on information to be monitored

• Create a structured data model

• Find sources of information (automate where possible)

• Define notification triggers

• Monitor the sources of information frequently

• Assess triggers frequently and send notifications
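
The monitoring step of this methodology could be sketched roughly as follows. This is an illustration under stated assumptions rather than Scout's implementation: the function names `collect` and `latest`, the idea of a source adaptor as a zero-argument callable, and the dictionary used as a knowledge base are all hypothetical.

```python
from datetime import datetime, timezone


def collect(sources, knowledge_base, now=None):
    """Poll every source adaptor once and record what it reports.

    `sources` maps a source name to a zero-argument callable that returns
    a dict of {property: value}. Every value is appended together with its
    timestamp and source name, so earlier measurements are kept as history.
    """
    now = now or datetime.now(timezone.utc)
    for name, fetch in sources.items():
        for prop, value in fetch().items():
            knowledge_base.setdefault(prop, []).append((now, name, value))
    return knowledge_base


def latest(knowledge_base, prop):
    """Return the most recently recorded value of a property, or None."""
    history = knowledge_base.get(prop)
    return history[-1][2] if history else None
```

Calling `collect` on a schedule gives the frequent monitoring described above; triggers can then be assessed against `latest(...)` values or against the full history.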

Page 20: A novel preservation watch system

Example questions

• Are there any tools that can render the format X?

• Is my repository the only one that has format Y?

• Are my preservation plans still valid?

• Are my repository policies being enforced?

Page 21: A novel preservation watch system

Information Sources

• Format registries & software catalogues

• Digital repositories & web archives

• Organizational objectives

• Experiments

• Simulation

• Human knowledge

Page 22: A novel preservation watch system

C3PO: Content profile tool

Characterization reports:

• Aggregation

• Analysis

• Representative datasets

https://github.com/peshkira/c3po

Petar Petrov <[email protected]>
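
To make the aggregation step concrete, here is a minimal sketch of turning per-file characterization reports into a format distribution. It does not use C3PO and does not reflect its output schema: the report shape (a dict with a "format" key) and the function name `content_profile` are assumptions for illustration.

```python
from collections import Counter


def content_profile(reports):
    """Aggregate per-file characterization reports into a format distribution.

    Each report is assumed to be a dict with at least a "format" key.
    Returns, per format, the number of files and their share of the total.
    """
    counts = Counter(report["format"] for report in reports)
    total = sum(counts.values())
    return {
        fmt: {"count": n, "share": n / total}
        for fmt, n in counts.items()
    }
```

A profile of this kind is small enough to share with a watch service even when the collection itself cannot leave the repository.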

Page 23: A novel preservation watch system

Repository events adaptor

[Diagram: the repositories RODA, Rosetta and eSciDoc, each with its own Report API (RODA Report API, Rosetta Report API, eSciDoc Report API), connected to Scout's repository adaptor via OAI-PMH]

• OAI-PMH with PREMIS

• Normalize events

• Fine-grain events

• History

• Example events

    • Ingest started/ended

    • Representation downloaded

    • Plan executed
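
Normalizing events means mapping each repository's own event names onto one shared vocabulary. The sketch below shows the idea only: the raw event names, the normalized names, the record fields and the function `normalize_event` are all hypothetical, and it does not parse OAI-PMH or PREMIS.

```python
# Hypothetical repository-specific event names mapped to a shared vocabulary.
EVENT_VOCABULARY = {
    "ingest.start": "ingest_started",
    "IngestBegin": "ingest_started",
    "ingest.end": "ingest_ended",
    "IngestComplete": "ingest_ended",
    "download": "representation_downloaded",
    "plan.run": "plan_executed",
}


def normalize_event(raw):
    """Map one raw event record onto the shared vocabulary.

    Unknown event types are kept and labelled "unknown" instead of being
    dropped, so the event history stays complete. The original type is
    preserved alongside the normalized one.
    """
    return {
        "repository": raw["repository"],
        "type": EVENT_VOCABULARY.get(raw["type"], "unknown"),
        "original_type": raw["type"],
        "timestamp": raw["timestamp"],
    }
```

Once events from different repositories share one vocabulary, the same trigger can be assessed against all of them.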

Page 24: A novel preservation watch system

Define triggers

• Notify me when there are tools that can render the format X.
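
A trigger like the one above can be thought of as a named condition over the collected data plus a message to send when the condition holds. This is a sketch under that assumption, not Scout's trigger format: the `Trigger` class, the `assess` function and the data key `tools_rendering_X` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    """A named condition over collected data, plus the message to send."""
    name: str
    condition: Callable[[dict], bool]
    message: str


def assess(triggers, data):
    """Return the messages of all triggers whose condition holds for `data`."""
    return [t.message for t in triggers if t.condition(data)]


# The slide's example: notify when at least one tool can render format X.
render_trigger = Trigger(
    name="tools-for-format-X",
    condition=lambda data: len(data.get("tools_rendering_X", [])) > 0,
    message="There are tools that can render format X.",
)
```

Running `assess([render_trigger], data)` after each monitoring pass yields the notifications that are due at that moment.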

Page 25: A novel preservation watch system

Receive notifications

Email

HTTP Push API

There are tools that can render format X.
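
The two delivery channels could be prepared as sketched below using only the Python standard library. Nothing here reflects Scout's real interfaces: the subject line, the JSON field name `notification`, and the function names are assumptions, and the code only builds the message and the request without sending either.

```python
import json
from email.message import EmailMessage
from urllib.request import Request


def as_email(text, sender, recipient):
    """Build (but do not send) an email carrying the notification text."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Scout notification"  # hypothetical subject line
    msg.set_content(text)
    return msg


def as_http_push(text, url):
    """Build (but do not send) an HTTP POST request with a JSON body."""
    body = json.dumps({"notification": text}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending would be a separate step, for instance with `smtplib.SMTP.send_message` for the email and `urllib.request.urlopen` for the push request.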

Page 26: A novel preservation watch system

Interfaces

Web page

REST API

Page 27: A novel preservation watch system

How to be a part of Scout

• Join the surveys

    • Send your email to me <[email protected]>

• Integrate your content

    • Send your content profile with C3PO

    • Send repository events with the Report API (soon)

• Contribute information (soon)

    • Use the Scout form for manual input of knowledge