D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference...

51
www.forgetit-project.eu ForgetIT Concise Preservation by Combining Managed Forgetting and Contextualized Remembering Grant Agreement No. 600826 Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial Model Deliverable Leader F. Gallo (EURIX) Quality Assessor B. Logoglu (TT) Dissemination level PU Delivery date in Annex I M15 (April 2014) Actual delivery date 2015-01-30 Revisions 12 Status First Release Keywords: Preserve-or-Forget Reference Model, OAIS

Transcript of D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference...

Page 1: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

www.forgetit-project.eu

ForgetITConcise Preservation by Combining Managed Forgetting

and Contextualized Remembering

Grant Agreement No. 600826

Deliverable D8.2

Work-package WP8: The Preserve-or-Forget ReferenceModel and Framework

Deliverable D8.2: The Preserve-or-Forget ReferenceModel Initial Model

Deliverable Leader F. Gallo (EURIX)Quality Assessor B. Logoglu (TT)Dissemination level PUDelivery date in Annex I M15 (April 2014)Actual delivery date 2015-01-30Revisions 12Status First ReleaseKeywords: Preserve-or-Forget Reference Model, OAIS

Page 2: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

Disclaimer

This document contains material, which is under copyright of individual or several ForgetITconsortium parties, and no copying or distributing, in any form or by any means, is allowedwithout the prior written agreement of the owner of the property rights.

The commercial use of any information contained in this document may require a licensefrom the proprietor of that information.

Neither the ForgetIT consortium as a whole, nor individual parties of the ForgetIT consor-tium warrant that the information contained in this document is suitable for use, nor thatthe use of the information is free from risk, and accepts no liability for loss or damagesuffered by any person using this information.

This document reflects only the authors’ view. The European Community is not liable forany use that may be made of the information contained herein.

c© 2015 Participants in the ForgetIT Project

Page 2 (of 51) www.forgetit-project.eu

Page 3: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

Revision History

Version Major changes Authorsv0.01 Created ToC, assigned contributions EURIXv0.02 Added preliminary text for introduc-

tion, foundations of reference model,functional and information model

L3S, LTU, EURIX

v0.03 Completed foundations of referencemodel, added mapping to the archi-tecture

LUH, EURIX

v0.04 Created diagrams and added descrip-tion of functional entities

LUH, LTU

v0.05 Added mapping to OAIS, updatedother Sections

LTU, EURIX

v0.06 Completed description of model lay-ers and workflows

LUH, LTU, EURIX

v0.07 Added section about extensions of in-formation management systems

LUH

v0.08 First complete version for commentsfrom all partners

EURIX

v0.09 Implemented comments from all part-ners, version ready for internal QA

EURIX

v0.10 Implemented comments from internalQA

EURIX

v0.11 Implemented final comments from allpartners, first release version

EURIX

v0.12 Final checks and clean-up, first re-lease version for submission

LUH, EURIX

List of Authors

Partner Acronym Authors

LUH Claudia NiedereeLTU Ingemar Andersson, Jorgen NilssonIBM Doron ChenDFKI Heiko MausUSFD Mark GreenwoodUEDIN Robert LogieEURIX Francesco Gallo

c© ForgetIT Page 3 (of 51)

Page 4: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

Table of Contents

Table of Contents 4

Executive Summary 6

1 Introduction 7

1.1 Purpose of the PoF Reference Model . . . . . . . . . . . . . . . . . . . . . 7

1.2 Target Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3 Structure of the Deliverable . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Foundations of the PoF Reference Model 10

2.1 Integrative: Bridging the Gap between Information and Preservation Man-agement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2 Value-driven: Acting upon Short-term and Long-term Information Value . . 11

2.3 Brain-inspired: What can we Learn from Human Forgetting and Remem-bering? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.4 Forgetful: Focus on the Important Things . . . . . . . . . . . . . . . . . . . 14

2.5 Evolution-aware: Embracing the Long-term Perspective . . . . . . . . . . . 15

3 Preserve-or-Forget Reference Model 17

3.1 Functional Model: Layers, Workflows, Functional Entities . . . . . . . . . . 17

3.1.1 Core Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.1.2 Remember & Forget Layer . . . . . . . . . . . . . . . . . . . . . . . 21

3.1.3 Evolution Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2 Information Model: Core Elements . . . . . . . . . . . . . . . . . . . . . . . 28

4 Mapping to ForgetIT Architecture 30

4.1 PoF Framework Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.2 Relationship between PoF Reference Model and PoF Architecture . . . . . 30

4.3 Model Implementation in PoF Architecture . . . . . . . . . . . . . . . . . . . 32

Page 4 (of 51) www.forgetit-project.eu

Page 5: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

5 Information Management Systems Extensions 37

5.1 Extensions of the Active System . . . . . . . . . . . . . . . . . . . . . . . . 37

5.1.1 Supporting Information Exchanges . . . . . . . . . . . . . . . . . . . 37

5.1.2 Information Value Evidences . . . . . . . . . . . . . . . . . . . . . . 37

5.1.3 Supporting Contextualization . . . . . . . . . . . . . . . . . . . . . . 38

5.2 Creating Benefit in the Active System . . . . . . . . . . . . . . . . . . . . . 38

5.2.1 Forgetful Information Presentation . . . . . . . . . . . . . . . . . . . 39

5.2.2 Forgetful and Archive-aware Search . . . . . . . . . . . . . . . . . . 39

5.2.3 Creating Awareness for Content Value . . . . . . . . . . . . . . . . . 39

5.3 Preservation Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6 Mapping to OAIS and Preservation Management 41

6.1 The OAIS Reference Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

6.2 Relationship with OAIS Reference Model . . . . . . . . . . . . . . . . . . . 42

6.3 Relationship with PAIMAS Reference Model . . . . . . . . . . . . . . . . . . 44

6.4 Evolution and Value creation in the Preservation Store . . . . . . . . . . . . 45

7 Summary and Future Work 47

7.1 Assessment of Performance Indicators . . . . . . . . . . . . . . . . . . . . . 48

7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

References 50

c© ForgetIT Page 5 (of 51)

Page 6: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

Executive Summary

This document describes the initial version of the Preserve-or-Forget (PoF) ReferenceModel. The Reference Model aims to encapsulate the core ideas of the ForgetIT ap-proach into a a re-usable model. Therefore, it is inspired by the core principles of thePreserve-or-Forget approach: synergetic preservation, managed forgetting and contextu-alized remembering.

Based on this, we have identified five characteristics for a PoF Reference Model, whichconsiders Active System and Digital Preservation System (DPS) as a joint ecosystem:integrative, value-driven, brain-inspired, forgetful and evolution-aware. Those character-istics have driven the design of the functional part of the PoF Reference Model and theassociated Information Model. The Functional Model is made up of three layer: CoreLayer, Remember & Forget Layer and Evolution Layer. For each layer we describe themain functional entities and the representative workflows.

This deliverable is not limited to the PoF Reference Model. It also presents mappings andinteractions of the model with other building blocks of an intelligent DPS. Firstly, the PoFFramework developed in WP8 is compliant to the model. Actually its design was inspiredby the experiences gathered in the creation of the framework. For clarifying the relation-ship, this deliverable contains a mapping of the PoF Reference Model functional entitiesto the relevant architecture components. Furthermore, the relationship with OAIS is alsodiscussed: with its focus on synergetic preservation, the PoF Reference Model comple-ments and goes beyond the OAIS Reference Model by stressing the aspects bringingthe Active System and the DPS closer together. Finally, it also discusses the extensionsrequired in any Active System for following the approach outlined by the PoF ReferenceModel.

The model presented in this deliverable is based on the work done for the PoF Middle-ware in WP8, by the concepts developed in WP3-WP7 as well as by the foundationalwork in WP2 and WP5. Furthermore, the PoF Reference Model builds upon the lessonslearned so far in the ForgetIT project - including the experiences from the work with theinterdisciplinary partners and the application partners.

The PoF Reference Model serves as conceptual guideline for further integration process:the current version is still considered preliminary and will be further developed in theremaining time of the project. A final version of the PoF Reference Model will be reportedin deliverable D8.5.

Page 6 (of 51) www.forgetit-project.eu

Page 7: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

1 Introduction

In this deliverable we present a model for the novel approach to intelligent preservationmanagement taken by the ForgetIT project. Since ForgetIT stresses the smooth interac-tion between information management and preservation management, the Preserve-or-Forget (PoF) Reference Model described here pays special attention to the functionality,which bridges between the Active System and the Digital Preservation System (DPS)(see deliverable D8.1[ForgetIT(2014b)]). This includes the selection of content for preser-vation, the transfer of content between the systems, contextualization for easing long-terminterpretation, processing during preservation time in reaction to changes, as well as ac-cess to the joint information space populated both by preserved content and content inactive use.

The PoF Reference Model presented in this deliverable is based on the work done forthe PoF Middleware in WP8, the components and especially the concepts developed inWP3-WP7 as well as the foundational work in WP2 and WP5. Furthermore, the modelbuilds upon the lessons learned so far in the ForgetIT project - including the experiencesfrom the work with the interdisciplinary partners and with the application partners. ThePoF Reference Model aims to encapsulate the core ideas of the ForgetIT approach into aa re-usable model.

The PoF Reference Model serves as conceptual guideline for further integration processand will be further refined during the project. The refined and extended version of themodel will be presented in deliverable D8.5 [ForgetIT(2016)].

1.1 Purpose of the PoF Reference Model

It is the idea of the ForgetIT project to follow a forgetful, focused approach to digital preser-vation, which is inspired by human forgetting and remembering. Its goal is to ease theadoption of preservation technology especially in the personal and organizational contextand to ensure that important content is kept safe, useful, and understandable on the longrun. For this purpose, concepts, technologies, and an entire framework (the PoF Frame-work) have been developed in the ForgetIT project so far. Furthermore, the frameworkhas been implemented in a first version in the form of the PoF Middleware (see deliverableD8.3 [ForgetIT(2014c)]).

While the architecture and the implemented PoF Middleware ease adoption on the tech-nology level, the PoF Reference Model leverages the ideas developed so far in the For-getIT project to a conceptual level: required conceptual functionalities and relevant con-cepts are collected in a systematic way and are related to each other.

Similar to the OAIS Reference Model [CCSDS(2012)], the PoF Reference Model definesthe terminology and concepts for the approach, which can be used as a basis for theimplementation of the ForgetIT approach as well as for the further discussion and devel-

c© ForgetIT Page 7 (of 51)

Page 8: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

opment of a forgetful approach to preservation.

The PoF Reference Model is also intended as the basis for the implementation of theForgetIT approach in other ways than the one given by the PoF Middleware and in thecontext of other types of Active System and/or DPS, as they are considered for the PoFMiddleware implementation. For fostering this aspect we also stress the idea of consider-ing information management and preservation management as a joint ecosystem for thePoF Reference Model.

It is worth noting that the ForgetIT model does not intend to replace the OAIS model. Asmentioned above, it has a different focus than the OAIS model and can interact with OAIScompatible approaches as it is also outlined in Section 6 of this deliverable.

The PoF Reference Model especially targets personal preservation settings as well asorganizational preservation settings, which are not covered by legal regulations such asdeposit laws. The focus is not on supporting memory institutions, although parts of theForgetIT ideas (e.g., contextualization) might be applicable in this area as well. The se-lected settings (personal and non-mandatory organizational preservation) can especiallybenefit from the ForgetIT approach, since a) there is a big gap to preservation adoption inthose fields, b) there are explicit preservation choices to be made, and c) there is a needfor automation of the processes, in order to reduce the amount of investment required forpreservation.

For the representation of the preliminary model described here we provide simple graph-ical representations as starting point for deeper analysis. If necessary, a more formal ap-proach will be used for the final version of the model, using UML notation and machine-processable formats (e.g. ontology or XML schema specification), supporting the im-plementation of the PoF Framework. Such information will be reported in deliverableD8.5 [ForgetIT(2016)].

1.2 Target Audience

The target audience of this deliverable are practitioners and developers in the area of con-tent and information management, who consider to get started with content preservation.The deliverable is also targeting practitioners, developers, and researchers in the area ofpreservation management.

In order to serve this diverse audience, the deliverable considers the presented PoF Ref-erence Model from different perspectives. In addition to the presentation of the modelitself in terms of functional model and information model, the deliverable also presents amapping to the OAIS model and the interaction of the PoF Reference Model with a con-tent management system. Furthermore, the presented mapping to the PoF architectureis also meant to ease adoption for system developers.

Page 8 (of 51) www.forgetit-project.eu

Page 9: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

1.3 Structure of the Deliverable

The rest of this deliverable is structured as described in the following. In Section 2,we discuss the foundations of the PoF Reference Model, covering the characteristicsof the model as well as the requirements, which did influence its development. In Sec-tion 3, we present the PoF Reference Model for an integrated information and preser-vation management system, which considers this combination as a joint ecosystem. Inmore detail, we describe a functional model and an information model. In Section 4,we relate the PoF Reference Model to the reference architecture of the PoF Framework.This architecture, which is based on the system architecture presented in deliverablesD8.1 [ForgetIT(2014b)] and D8.3[ForgetIT(2014c)], maps the joint ecosystem of buildingblocks from the reference model on an architecture with three main parts: the Active Sys-tem, the Digital Preservation System (DPS) and the PoF Middleware. In the second part ofthe deliverable, we relate the PoF Reference Model to existing models and approaches,in order to ease adoption. This includes a discussion of the extensions required in aninformation system for becoming part of an information and preservation managementecosystem (Section 5) and a discussion on how to map the relevant parts of the model onan OAIS based DPS as well as suggested extensions to such a system (Section 6). Weconclude the deliverable with a summary of the main insights, with ideas for future workand an assessment of the results compared to success indicators reported in the projectproposal.

c© ForgetIT Page 9 (of 51)

Page 10: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

2 Foundations of the PoF Reference Model

Five main characteristics have been identified for the proposed PoF Reference Model,supporting the ForgetIT approach for a sustainable and smooth transition between infor-mation and preservation management. The model is:

1. integrative - bridging the gap between information and preservation managementfor easing the adoption of preservation technologies;

2. value-driven - acting upon short-term and long-term information value based oncareful multi-faceted information value assessment;

3. brain-inspired - learning from the way humans forget and remember for a bettermore focused management of digital memories;

4. forgetful - using the idea of forgetting in the digital memory for staying focused andsupporting preservation decisions;

5. evolution-aware - embracing the long-term perspective by dealing with change onvarious levels.

Those partially related characteristics and their facets as well as the requirements andideas, which inspired those characteristics, are discussed in more details below. Theyprovide the foundations for the PoF Reference Model discussed in Section 3.

2.1 Integrative: Bridging the Gap between Information and Preser-vation Management

It is the aim of the ForgetIT approach to preservation to create a smooth transition be-tween active information use and preservation of information, which so far are some quiteseparate worlds. For this purpose, the ForgetIT Reference Model should be integrative,bringing the Active System and the DPS closer together. Further details about founda-tions of synergetic preservation can be found in deliverable D5.1 [?] and following WP5deliverables.

However, due to the inherent long-term perspective of preservation-related solutions itis not the aim to build a strongly integrated, monolithic system. In the long run, it hasto be foreseen that the Active System used as well as the employed DPS will change.Therefore, the idea is a flexible integration, which enables smooth bi-directional transitionof information between the Active System and the DPS and, at the same time, is alsoprepared for major changes in the overall environment. [?]

A core part of integration is to enable the smooth transition of content to be preservedinto the DPS and the senseful reactivation of content back into the Active System aftera -possibly very long - period into the DPS. An integrative solution should also embrace

Page 10 (of 51) www.forgetit-project.eu

Page 11: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

the idea of a joint information space, where the information in the DPS stays conceptuallyaccessible, e.g., visible in search results, even if the content is only available in the DPS.

One part of achieving a smooth transition is to act as a pre-ingest system (or pre-access)and prepare information packages for delivery in either way. With highly automated pro-cedures for preparation of these packages according to agreements or requirements, thequality of the information packages becomes more consistent, which alleviate the burdenof information package handling on both sides. [ForgetIT(2014a)]

In addition, an integrative system should also support the decisions on what to preservefor easing the integration of preservation into the information management workflows,finally aiming for an integrated information and preservation management workflow. Theautomated selection process (appraisal) will be discussed in more detail in the context ofmanaged forgetting (see Section 3.1).

2.2 Value-driven: Acting upon Short-term and Long-term Informa-tion Value

One of the core ideas of the ForgetIT Reference Model is to deviate from the general keepit all model, which makes the implicit assumption that all information has the same valuewith respect to being kept. In general the value of information is multifaceted and canbe considered from different perspectives, e.g., the short-term value for current activitiesvs. long-term value of a resource for an organization. For the combined information andpreservation management system, short-term value and long-term value of informationhas to be considered separately, since it is driven by different factors. For a review ofshort-term and long-term memory and forgetting see deliverables D2.2 [?] and D2.3 [?].

Short-term value The short-term value refers to the value of content for the current focusof activity, e.g., documents used for a task at hand are of high short-term value. Inshort-term value of information, we will see a high dynamics in the information value(due to changing interests and tasks) and a high influence on interaction-basedevidences on the information value. In terms of the human brain, this is roughlycomparable to the working memory (see Section 2.3), although human workingmemory has an even higher change frequency. Further details about foundationsof managed forgetting can also be found in deliverable D3.1 [?] and following WP3deliverables.

Identifying the short-term value of a document is of high interest for creating imme-diate benefit in information management, e.g. by de-cluttering the desktop, whichis one of the goals of the ForgetIT project. The idea here is to give higher priorityor visibility to resources with high short-term value. In the ForgetIT project the termMemory Buoyancy has been coined for this purpose. It is inspired by the idea ofresources of decreasing importance to the user are sinking away from the user.

Long-term value The long-term value of a resource is obviously relevant in the context of

c© ForgetIT Page 11 (of 51)

Page 12: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

preservation. It refers to the value a resource has on the long run. Such long-termvalue can be used to decide about the investment to be made into preservationfor the respective resource [?]. Long-term value is expected to be more difficultto compute, since it includes estimating future use of resources [?]. Furthermore,long-term value is driven by (at least partially) different factors than short-term value.It is expected that more objective values such as diversity, coverage and quality willplay a more important role here (see deliverable D9.3 [?] for further details aboutthe dimensions used for resource value). In the ForgetIT project, we have coinedthe term Preservation Value for the long-term value of a resource.

2.3 Brain-inspired: What can we Learn from Human Forgetting andRemembering?

Humans are very effective in forgetting irrelevant details and things no longer relevant.This enables them to focus on the relevant things and to efficiently make decisions in theircurrent life situations. This ability of the human brain inspired the idea of considering aform of forgetting for digital memories in the ForgetIT project. It is investigated, if similarmechanisms in digital memory might be helpful in better coping with the general situa-tion of information overload, i.e., helping to focus on the really relevant digital resources.However, if we just copy human remembering and forgetting, the digital memory wouldjust forget the same things as the human. Clearly, this is not the goal. Rather, the methodsin digital memory should learn from and complement the human memory: it is desirablethat a computer focuses on ”remembering” those things that the human might forget, butwhich are still useful or required later.

The idea of complementing human memory suggests to take an embracing perspectiveon human and digital memory, considering them as a joint system. In such a system,the two parts are not considered in isolation: their interactions and mutual influence aretaken into account as well. Figure 1 shows models of the human memory and a digitalsystem as well as interactions (joint system perspective). Further information can be findin deliverables D2.2 [?] and D2.3 [ForgetIT(2015a)]. In particular, the high-level model ofthe functioning of the human brain described above is depicted in Figure 1. The modelrepresented in Figure 1 is focused on the synergy between human and digital memory,and derived from the text of deliverables D2.2 [?] and D2.3 [ForgetIT(2015a)]. The humanmemory (on the left) and the digital memory (on the right) together contribute to a form ofvirtual memory a human can rely on.

On the side of the human memory three main type of memory are distinguished: workingmemory, episodic memory and semantic memory. Together with the currently activatedepisodic and semantic memory, the verbal short term memory (things just heard) andthe visual short term memory (things just seen) form the working memory, which framesthe current situation. Knowledge is activated on demand from the semantic and episodicmemory according to current needs via the so-called executive functions.

Page 12 (of 51) www.forgetit-project.eu

Page 13: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

Figure 1: Joint Perspective on Forgetful, Interacting Human and Digital Memory, Derivedfrom Material in D2.2 [?] and D2.3 [ForgetIT(2015a)].

Perception is one driver for such activation. It is worth noting that perceived signals donot directly become part of the verbal or visual short term memory, which are constantlyupdated, but they are rather filtered and interpreted by things already in the memory formaking sense of the perceived signals.

Similarly, we also foresee a working memory within the digital memory in our model. Thisis composed of the digital resources currently relevant, e.g., used or relevant for currenttasks or activities. The idea here is to clearly distinguish those resources from the rest ofthe digital resources and to keep them as close as possible to the user and easily acces-sible as possible (see also Memory Buoyancy). In an automated digital working memorysignals from resource usage, pattern of usage and change as well as relationships be-tween resources will be used to update the digital working memory. For this purpose,in our model, we introduce managed forgetting functions, which control the transitionsbetween the different parts of the digital memory.

Together, the working memory and the digital working memory form the virtual workingmemory. Clearly, there is an influence between both of them. In the ideal case, the digital

c© ForgetIT Page 13 (of 51)

Page 14: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

working memory would show to the user just all the information that the user needs in thecurrent situation, but does not have in her working memory. Note, that it is also possiblethat there is an influence of the way the digital memory works on the human (working)memory (joint system perspective). For example, with the easy storage of phone numbersin mobile phones, humans no longer have to remember phone numbers.

Managed forgetting functions are also used to identify content that is of long-term value(see Section 2.2) and should, therefore, be preserved. In Figure 1, we distinguish (a) infor-mation management for re-use, as it is, e.g., done on a desktop computer or a server and(b) the system for archival and preservation (on top of the Figure). When content is trans-ferred to Archival and Preservation, it makes sense to add context information to it (con-textualization). This prepares the content for re-contextualization, which is required , whenpreserved content is brought back at a (much) later time. The idea of re-contextualizationis to connect the re-activated content with the current environment or to - at least - makeit understandable in the current environment. The idea of re-contextualization as an ac-tive situation-dependent process of bringing back things ”stored” in the memory is againinspired from human memory: when we as humans remember things this is also a re-construction process, which depends upon the current situation. Further details aboutfoundations of contextualized remembering can be found in deliverable D6.1 [?] and fol-lowing WP6 deliverables.

Episodic memory is mainly a detailed storage of events. It is typically subject to fastforgetting as well as blurring between the memory of similar events due to interference.Here, digital memory complementing human memory, e.g. via photos, can serve as areminder of things that are forgotten, but that one might remember or refresh in a laterpoint in time, e.g., reminiscing about past events (see also the discussion in deliverableD3.1 [?]). For this purpose it is crucial to select the adequate content to preserve, so thatit can help remembering, e.g. considering coverage and diversity of preserved content.

Semantic memory is a more conceptual storage of memory, which stresses on patterns,abstractions and lessons learned. Here, the strongest interaction between human mem-ory and digital memory is that the organization of digital resources does or should reflectthe conceptualization of the world of the user, which is linked to her semantic memory.A more explicit modeling of the conceptualization of the world and a richer annotation ofresource with this knowledge in Information Management for Re-use and in the Archivaland Preservation (see Figure 1), as it is, for example, proposed by the Semantic Desk-top approach (see WP9), can ease navigation and search of the user and thus re-findingthings in the digital memory.

2.4 Forgetful: Focus on the Important Things

The ForgetIT project introduces the idea of an forgetful approach to information andpreservation management as an alternative to the dominating keep it all approach. Theforgetful approach opts for conscious decisions about what is important and thus shouldbe kept (and preserved) replacing the often random form of forgetting (or losing) informa-

Page 14 (of 51) www.forgetit-project.eu

Page 15: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

tion as it can be often found with the keep-it-all approach (e.g. disk crashes, obsolescenceof formats and technology, etc.).

Since preservation comes at a cost [?] [?] [?], it is important to make conscious decisionsabout what to preserve or how much to invest in the preservation of which part of theinformation space. For this need a forgetful approach is a good fit.

A forgetful approach is based on Information Value assessment, i.e. computing and pre-dicting the value of information resources. Value for different purpose can be considered.In the context of this Reference Model, short-term and long-term values are important(see also Section 2.2). Effective information value assessment, especially for long-terminformation value, is a complex task involving a variety of parameters and heuristics.Based on such value, preservation decisions can be taken. On a high level, these de-cisions could include choice of preservation provider and/or service as well as decisionsabout redundancy and transformation.

2.5 Evolution-aware: Embracing the Long-term Perspective

Since we are targeting integrated information and preservation management systems,we are operating in a long-lived context, covering a time perspective of several decades.Even things that are considered relatively stable in the current setting of an informationsystem - such as the type or class of content management system in use - will changeover time. For being prepared for sustainable operation it is important to be prepared forsuch changes.

It is one of the core ideas the ForgetIT approach to keep the important information acces-sible and usable even in case of large changes in the setting and context of operation.For incorporating evolution-awareness into our Reference Model, several types of evolu-tion with different impacts on the reference model have to be considered:

1. Changes in conceptual model of the Active System: this could be due, for ex-ample, to changes in the organizational ontology underlying the content structuringas well as processes described in the content. This creates a semantic gap be-tween the archived content (relying on the old implicit or explicit ontology) and theactive content (structure by new ontology). This gap has to bridged, at latest whenpreserved content is brought back into the active environment, in order to enablecorrect interpretation of the re-activated content;

2. Active system evolution and exchange (Migration): the used Active System mightbe subject to major changes or might even be completely replaced by another sys-tem, if we look at time frames of several decades. In spite of such changes thecontent should stay accessible and usable;

3. DPS evolution or exchange: in the same way, the chosen DPS might evolve orcould be exchanged over time. This implies the migration of content into a new

c© ForgetIT Page 15 (of 51)

Page 16: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

DPS. In the ideal case this should have as little impact on the Active System aspossible;

4. Change in best-practices and technology: formats as well as employed tech-nologies might become obsolete over time. This requires the identification of suchchanges as well as adequate actions to react to those changes, such as formattransformations.

The last item in the list above is a classical issue of any DPS. It is, therefore, not coveredin too much detail in our Reference Model , since we focus on the things that go beyondcurrent best practices in digital preservation.

Page 16 (of 51) www.forgetit-project.eu

Page 17: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

3 Preserve-or-Forget Reference Model

Following the ForgetIT approach, we are aiming for a smooth transition between active in-formation use in the respective information management system (environment), which wecall Active System, and the Digital Preservation System (DPS) (the PoF Framework archi-tecture is discussed in deliverable D8.1 [ForgetIT(2014b)]). Therefore, in the Preserve-or-Forget (PoF) Reference Model we are considering active information use and preservationas part of a joint ecosystem, which stresses the smooth transitions and the synergetic in-teractions rather than the system borders. This also is a core distinction from the OAISReference Model [CCSDS(2012)], which is restricted to the DPS.

In Section 4, we discuss the mapping of the PoF Reference Model to the ForgetIT ar-chitecture. A possible mapping of this joint ecosystem onto three separate systems, i.e.the Active System (in the form of adapters and/or system extensions), the DPS (typicallyOAIS based) and the PoF Middleware, which couples both system in a flexible way is de-scribed in next Sections (see Section 4 for the mapping to the PoF architecture, Section 6for the relationship with OAIS and Section 5 for the extensions of the Active System).

In describing the PoF Reference Model, we took inspiration from the way the OAIS Ref-erence Model is structured. In this Section, we describe the functional view of the PoFReference Model (functional model) and provide some preliminary ideas for the informa-tion model of the PoF Reference Model.

3.1 Functional Model: Layers, Workflows, Functional Entities

As in the case of the OAIS model, the function view of the PoF Reference Model, orthe functional model for short, considers the main functional entities of the proposedreference model. Furthermore, we also describe the main workflows in the model andhow the functional entities contribute to those workflows. Again the stress is on the parts,which connect the two types of systems, Active System and DPS, with each other.

The proposed PoF Functional Model is made up of three layers, namely the Core Layer,the Remember & Forget Layer and the Evolution Layer:

• the Core Layer considers the basic functionalities required for connecting the ActiveSystem and the DPS;

• building upon this layer, the Remember & Forget Layer introduces the brain-inspiredand forgetful aspects into the PoF Reference Model implementing more advancedfunctionalities for the preservation preparation and the re-activation workflow;

• finally, the Evolution Layer, is responsible for all types of functionalities dealing withlong-term change and evolution such as implementing the idea of contextualizedremembering.

c© ForgetIT Page 17 (of 51)

Page 18: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

The different workflows and functional entities in the PoF Functional Model are associatedto the three model layers above, as summarized in Table 1 and depicted in Figure 2. Thedescription of the layers, workflows and functional entities is provided in the next Sections.

Figure 2: High Level Functional View of the PoF Reference Model

An overview of the PoF Functional Model components (layers, workflows, functional enti-ties) is depicted in Figure 2): within each layer box the relevant entities and workflows areshown. In the following Sections we provide a more detailed representation of each work-flow, with the steps associated to each process and the involved entities. It is worth notingthat Figure 2 already makes some assumptions about the functionalities implemented inthe Active System and, especially, the DPS: those functionalities, which are parts of oneof the respective systems, are not explicitly listed in the PoF Reference Model. For ourpurpose, we assume a OAIS compliant DPS implementing functionalities such as Ingest,Data Management, Preservation Planning, Archival Storage and Access of preservedcontent (see [CCSDS(2012)]).

The three layers are used in the following to describe the functional view of the PoFReference Model in more detail.

Page 18 (of 51) www.forgetit-project.eu

Page 19: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

Model Layers Workflows Functional EntitiesCore Layer Preservation Preparation (basic)

Re-activation (basic)ID ManagementExchange Support

Remember & ForgetLayer

Preservation PreparationRe-activation

Content Value AssessmentManaged Forgetting & AppraisalDe-contextualizationContextualizationRe-contextualizationSearch & NavigationMetadata Management

Evolution Layer Situation ChangeSetting ChangeSystem Change

Evolution MonitoringContext Evolution ManagementContent Value Re-assessmentContext-aware Preservation Management

Table 1: PoF Reference Model Components: Layers, Workflows and Functional Entities.

3.1.1 Core Layer

The Core Layer embraces basic forms of two workflows connecting the Active Systemand the DPS, the Preservation Preparation workflow and the Re-activation workflow,and includes two functionalities, ID Management and Exchange Support (see Figure 3).The workflows and the functional entities are described in the following.

Figure 3: Core Layer of the PoF Reference Model

The Preservation Preparation workflow takes care of transferring content to be preservedbetween the Active System and the DPS. On an abstract level, this workflow - in its basicform - consists of five steps (see also Figure 4) (1) select the content to be archived, (2)

c© ForgetIT Page 19 (of 51)

Page 20: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

provide the content to the archival process (3) enrich the content with context for preser-vation (4) package the content according to the expectations of the DPS (5) transfer thecontent into the DPS. In terms of preservation terminology, the preservation preparationworkflow can be considered a pre-ingest workflow, which leads into the ingest functional-ity of the DPS. This interaction is discussed in more detail in Section 6. Furthermore, it isworth noticing that the enrich functionality is available on the Core Layer only in its basicform (e.g. to add technical information such as file format to the content to be archived).More advanced enrich functionality is discussed for the Remember & Forget Layer.

Figure 4: Preservation Preparation Workflow (Basic) in the Core Layer

The Re-activation workflow takes care of enabling the Active System to retrieve and re-activate content, which has been transferred to the DPS. Again on an abstract level,this workflow - in its basic form - consists of five steps (see also Figure 5): (1) requestthe content to be retrieved from the PDS (here via its identifier), (2) search requestedcontent thus translating the request into archival ID(s) (in the basic workflow just usingthe ID Manager) (3) fetch the respective content from the DPS (4) prepare the contentfor delivery to the Active System, (4) transfer content to the Active System. The link tothe DPS is for the Core Layer mainly in the fetch activity, which interacts by the accessfunctionality offered by the DPS (see also Section 6).

As in the case of the Preservation Preparation workflow, some of the steps in the Re-activation workflow are only included in a very basic form on the Core Layer and areextended with more advanced functionalities on the Remember & Forget Layer.

In support of these two core workflows in their basic form the Core Layer includes twotypes of further functionality, namely ID Management and Exchange Support:

ID Management The ID Management is mainly responsible for mapping between theIDs of the resources in the Active System and IDs that are used in the DPS foridentifying (and locating) the respective content resources. Since several versionsof the same resource can be put in the DPS, the ID Management also has to takecare of resource versions and their mappings to archive IDs.

Page 20 (of 51) www.forgetit-project.eu

Page 21: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

Figure 5: Re-activation Workflow (Basic) in the Core Layer

Exchange Support Exchange support is responsible for enabling the exchange of con-tent and metadata between the Active System and the DPS. It adapts and maintainsprotocols for this purpose. The Exchange Support handles both outgoing informa-tion and incoming packages that should be put back into active use. The ExchangeSupport can be considered as a client-side communication adapter and can, forexample, be implemented in the form of a repository.

3.1.2 Remember & Forget Layer

The Remember & Forget Layer introduces brain-inspired functionality into the PoF Ref-erence Model, which targets the concepts of managed forgetting and contextualized re-membering. For this purpose, the Remember & Forget Layer extends the two workflowsPreservation Preparation and Re-activation from the Core Layer with further, more ad-vanced functionalities: Content Value Assessment, Managed Forgetting & Appraisal,De-contextualization, Contextualization, Re-contextualization and Search & Navi-gation, which are all described in the following. All of those listed functionalities createadditional metadata, which have to be managed in a systematic way. Therefore, the Re-member & Forget Layer also contains a functional entity for Metadata Management.

In more detail, the Preservation Preparation workflow, which still consists of the five stepsselect, provide, enrich, package and transfer (see Figure 6), now uses the additionalfunctionalities of Content Value Assessment and Managed Forgetting in the phase ofselecting content for preservation:

Content Value Assessment Understanding the value of content is in the core of con-tent appraisal for preservation and managed forgetting. Content value assessment

c© ForgetIT Page 21 (of 51)

Page 22: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

Figure 6: Preservation Preparation Workflow in the Remember & Forget Layer

aims to determine the value of a resource. This value may change over time andthere are different value dimensions, which reflect the value considering differentpurposes or perspectives and which may influence each other. There is, for exam-ple a value dimension reflecting current importance, e.g. Memory Buoyancy (MB),and a dimension reflecting the long term importance or relevance of a resource, thePreservation Value (PV). For assessing content value, the content value assess-ment component takes evidences from the Active System, e.g. about informationuse, content creation, and further knowledge about the role of resources in theActive System. Content value can be used as a basis for making preservation deci-sions, e.g. if a resource should be preserved or how much should be invested in thepreservation of a resource. Content value can also be used in the Active System,e.g. for especially highlighting resources with high content value.

Managed Forgetting & Appraisal With the dramatic growth of the amount of content,nowadays it becomes more and more important to make conscious decisions aboutpreservation. Clear decisions on what to put into the DPS and explicit content ap-praisal have always been part of the processes of an archive [?], although not alwaysas much in personal archiving [?]. The component for appraisal and managed for-getting aims to help in automating such decisions, a need that has been identifiedearlier [?], for both personal archiving as well as organizational settings. This isencapsulated in the concept of managed forgetting, which uses the results of con-tent value assessment for deciding about preservation and forgetting actions. Theeffects of managed forgetting functionality is not restricted to the preservation func-tionality. It can also be used in the Active System for improved information access.

Furthermore, the workflow steps provide and enrich are extended with De-Contextualizationand Contextualization functionality, respectively:

De-Contextualization De-Contextualization refers to the extraction of an object from itsActive System context in preparation of packaging it for archiving. Decoupling the

Page 22 (of 51) www.forgetit-project.eu

Page 23: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

object under preservation from its Active System context is non-trivial, since it hasto be decided how much of its current context has to be taken for its future contex-tualization and where a cutting can and should be made. De-contextualization andcontextualization (see below) are conceptually closely related.

Contextualization Contextualization consists in providing sufficient additional informa-tion for the content to be preserved, in order to allow archived items to be fully andcorrectly interpreted at some undefined future date. This entity is responsible fordefining and assigning the appropriate context to content to be archived. Contextu-alization can leverage other processes (similarity analysis, concept detection) to ex-plicate context. Contextualization provides the basis for the management of contextevolution over time (see Evolution Layer in Section 3.1.3) and Re-contextualization(see Re-activation workflow in Figure 7).

The Preservation Preparation workflow is linked to the Pre-Ingest functionality as it is de-scribed for Preservation Systems, e.g. in the OAIS model. In order to facilitate easy(seamless) ingest into the DPS and make sure that the packages contain metadataneeded for both the DPS as well as for access, pre-ingest aids the Active Systems aswell as the DPS systems adhering to standard protocols and metadata. This also meansthat the Pre-ingest function puts up some requirements on the Active Systems to followcertain protocols (which can/should be domain specific). Our Preservation Preparationworkflow, from the perspective of the preservation system serves as a rich pre-ingestfunction.

In general, before the DPS process of transferring digital collections from Active Systemto DPS, a submission agreement has to be established between the participating orga-nizations, following a standard approach, as described in the Producer-Archive InterfaceMethodology Abstract Standards (PAIMAS) [?].

The agreement should contain accurate information about package content, structure,and metadata. It should also include requirements for security and privacy mechanismsat transfer and storage. There should be stated in the agreement if there is a need formigration at ingest. Other examples is a specification to what extent metadata should beobtained and generated during the pre-ingest process and if there is specific demand onstorage. In the case of the PoF, this agreement is mainly reduced to an agreement on in-terfaces between the Active System and the Remember & Forget Layer. In addition, thereare however, still needs for agreements e.g. about security and privacy requirements.

For the Re-activation workflow, two types of additional functionality are used, with therespect to the Core Layer, namely Re-contextualization and Search & Navigation:

Re-contextualization The purpose of the Re-contextualization functional entity is to sup-port the interpretation of a content object at the time of access (which might be aconsiderable time after archival). Re-contextualization occurs when a documentis retrieved from the DPS at some future date. Once retrieved from the DPS and

c© ForgetIT Page 23 (of 51)

Page 24: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

Figure 7: Re-activation Workflow in the Remember & Forget Layer

before it is put back into active use, the context information, which has been pro-vided by the contextualization functionality, stored together with the content objectand possibly updated or extended over time is retrieved. This context information isused for Re-contextualization, i.e. to relate the content object to the current usagecontext. Re-contextualization can also include the re-construction or extension ofcontext information for content archived with no or not sufficient original context.

Search & Navigation The Search & Navigation functional entity is responsible to enablefinding things that have been preserved. Various types of search and navigationwill be supported here. This includes search in the metadata, full-text search inthe content (or more general content-based search also including non-textual con-tent), search in the context information and in other types of annotation, exploratorysearch for understanding the archive content, etc. What is crucial for the integra-tive and forgetful approach we are following here, is (a) to manage the interactionbetween the search in the Active System and the search in the DPS and (b) tounderstand how the forgetful approach and search support interact. Since we arefollowing an integrative approach, it makes sense for (a) to consider the informationin the Active System and the DPS as a type of a joint virtual information space,which are both considered for search in the Active System. However, it might stillmake sense to differentiate the two types of content taking into account the costthat might be attached to accessing content from the DPS. Archive content might,for example, only be considered on demand or if nothing can be found in the Ac-tive System. Furthermore, content stemming from the preservation store might bemarked in result lists. For aspect (b), the influence of the forgetful approach, theresults of content value assessment, namely MB and PV, can be considered in re-sult ranking (or even indexing): this would prefer results with higher content valuebalancing content value and relevance as it is for example done in diversificationapproaches. Furthermore, this includes adequate filtering and ranking approaches

Page 24 (of 51) www.forgetit-project.eu

Page 25: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

for handling versioned archive content.

Finally, in both workflows metadata are generated and used for different purposes. Thismetadata is taken care of by the functional entity for Metadata Management:

Metadata Management The Matadata Management functional entity is responsible forthe different kinds of metadata that are created and exchanged by the aforemen-tioned functional entities. This includes a variety of different types of metadata suchas current and past values for memory buoyancy and preservation value, informa-tion on the context of a resource, information extracted from resources for furtherprocessing as well as indexing information for search and navigation. Some of themetadata, which is collected here just remains in this middle layer for supportingits operation. Other parts of the metadata such as the preservation value and thecontext information will be stored as part of the archived object in the preservationsystem. The Metadata Management of the Remember & Forget Layer clearly in-teracts with the respective components of the preservation system by (a) providinginput for enriching the metadata in the preservation system for improved preserva-tion management, (b) incorporating information coming from the preservation sys-tem (e.g. for the joint indexing) and (c) by - as mentioned before - storing some ofthe metadata as part of the resources to be preserved.

3.1.3 Evolution Layer

The Evolution Layer contains three new workflows: the Situation Change workflow isresponsible for monitoring semantic changes in the Active System and to propagate theminto the DPS, in order to keep the preserved content understandable; the Setting Changeworkflow deals with changes in practices, formats and technology in the environment ofthe preservation (setting); finally, the System Change workflow is responsible for situa-tions, where one of the involved systems changes. The first two workflows are describedin more detail below, whereas the last workflow is left to the revised version of the PoF Ref-erence Model which will be reported in deliverable D8.5 [ForgetIT(2016)]. In support of theaforementioned workflows, the Evolution Layer includes additional functionalities: Evolu-tion Monitoring, Context Evolution Management, Content Value Re-assessment andContext-aware Preservation Management.

The Situation Change workflow consists of four steps (as depicted in Figure 8): (1)change monitoring, (2) change assessment (assessment of detected changes), (3)change notification (notification of involved components as well as the DPS on rele-vant changes) and (4) change propagation, which performs different types of actionsdepending on the observed change and the chosen change propagation strategy.

For monitoring, assessing the changes and deciding about the consequences, in supportto the Situation Change workflow, the functionalities Evolution Monitoring (mainly part ofActive System), Context Evolution Management and Content Value Re-assessment havebeen introduced in the Evolution Layer:

c© ForgetIT Page 25 (of 51)

Page 26: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

Figure 8: Situation Change Workflow in the Evolution Layer

Evolution Monitoring The Evolution Monitoring functionality is required for the changemonitoring step in the Situation Change workflow. This is mainly performed in theActive System (extension), because this is the main place, where evolution in theconceptual model of the application become visible. Such changes might, for ex-ample, be changes in the ontology (implicitly) underlying the organization of theinformation triggered e.g. by a major re-organization. Evolution Monitoring has toobserve changes in the explicit representations of the conceptual model (such astaxonomies, information structures) in the Active System as well as more implicitsignals of context evolution (e.g. newly upcoming topics, tags getting out of use,department sites no longer updated).

Context Evolution Management The Context Evolution Management is responsible forkeeping up-to-date the context information which has been stored along with thearchived content. This may include the storage of further context information, incase of larger changes in the Active System (e.g. major restructuring of an organi-zation). This also includes keeping track of such larger changes, which have hap-pened in the Active System on a conceptual level. Such a change history can beused in Re-contextualization for making the content understandable in the changednew context.

Content Value Re-assessment The functionality Content Value Re-assessment servesa very similar purpose as the functionality Content Value Assessment described forthe Remember & Forget layer. It re-visits the value assessments originally providedby the Content Value Assessment based on observed evolution in context. It isconsidered separately in our model, since the resources that it works on are nowalready preserved, which has implications on the computation of the value (e.g. therole of usage data) as well as on where this functionality is performed. One option,which is followed in the ForgetIT project is to map it to in storage computation.

Page 26 (of 51) www.forgetit-project.eu

Page 27: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

The Setting Change workflow consists of four different phases with two different startingpoints (as depicted in Figure 9): the activity monitoring which (1.1) logs the bi-directionalcommunication between the Active System and DPS and (1.2) receives a change requestfrom a DPS, (2) the change assessment that detects and triggers change requests, (3)the change estimation of suitable change recommendations based on content value,purpose of use, and use statistics, (4) the change recommendation, namely the notifi-cation of recommended actions to DPS, which could be of different types, such as changeof content or change of physical and logical content structure.

Figure 9: Setting Change Workflow in the Evolution Layer

The Evolution Layer includes the functional entity Context-aware Preservation Manage-ment, supporting the workflow described above:

Context-aware Preservation Management The Context-aware Preservation Manage-ment functional entity extends OAIS Preservation Planning and OAIS Administrationby keeping track of Active Systems as well as the (several) DPS that are involved.The main idea here is that preserved information should be easy to seamlessly putback into active use in either the same system as it originates from, but even moreimportantly other information systems (of the same type). Even the same systemmight have evolved to newer versions with different standards, and maybe even adifferent information structure or ontology. This must also be tracked by the Context-aware Preservation Management.

The above functionality is closely linked to typical OAIS functionalities of the underlyingPreservation System, such as Preservation Planning and Administration. PreservationPlanning is an OAIS functional entity residing in the DPS, that is responsible for devel-oping preservation policies and requirements by monitoring technological changes and

c© ForgetIT Page 27 (of 51)

Page 28: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

needs and requirements from the user community. Further information is available inSection 6, where the mapping of the PoF Reference Model is discussed.

The functional entity (Preservation) Administration provides the ”spider in the net” ofthe DPS, handling contracts regarding e.g. submission agreements, and taking decisionson new policies and standards for the DPS. It is also in charge of managing softwareand hardware configurations and the overall DPS operations on a daily basis. The Ad-ministration functional entity receives input from all other functional entities in the OAISmodel. Part of this functionality needs to be externalized since the PoF Middleware (seeSection 4) needs to keep track of several systems and adapt functionality for the seam-less interaction/integration between Active Systems and DPS, and also to give input toseveral DPS. Finally, another OAIS functionality related to Context-aware PreservationManagement is Data Management, which manages OAIS internal data and providessearch metadata to the Access function. This function also manages internal integrity inthe DPS. Part of this functionality needs to be external to the DPS in order to facilitatePoF functionality, including support for using several different DPS.

3.2 Information Model: Core Elements

In this Section, we provide some preliminary ideas for the definition of an informationmodel as part of the PoF Reference Model, following the approach used in OAIS. Theinformation model again focuses on the needs of bridging the gap between the ActiveSystem and the DPS. In order to meet the requirements of a variety of systems on thesite of the Active System as well as on the site of the DPS, the information model hasbeen kept quite generic in this reference model.

The core elements of the information model are represented by the UML diagram de-picted in Figure 10, where we included the main types used to represent the objects ofpreservation.

The core element of the information model is an Archival Object (see Figure 10), whichis made up of a Content Object and a Context Object. Content Object andContext Object are linked to each other by a gives context link.

Both Content Object and Context Object are of type Archival Resource, whichis described by a Metadata type and is represented by an Information Object type.An Information Object can be a Single Object such as a photo, a file, or a conceptor it can be a Complex Object. An important type of Complex Object is a Collec-tion of objects that are to be archived together. A second type of Complex Object isan Object Graph of interlinked Information Object types.

Context is represented by either a Local Context or a World Context type. TheLocal Context has to be stored in the PDS, because it can, in future, not be easilyfound elsewhere. World Context is context information that is expected that also infuture can be found somewhere else and, thus, it might be sufficient to store only some

Page 28 (of 51) www.forgetit-project.eu

Page 29: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

Figure 10: Abstract Information Model of the PoF Reference Model

references to keep this context information. See deliverable D6.1 [?] for a more detaileddiscussion of this issue.

The information model drafted here will be further refined and described with all details inthe next version of the PoF Reference Model, in deliverable D8.5 [ForgetIT(2016)].

c© ForgetIT Page 29 (of 51)

Page 30: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

4 Mapping to ForgetIT Architecture

In the following, we describe how the PoF Reference Model can be mapped to the ar-chitecture of the PoF Framework, described in deliverables D8.1 [ForgetIT(2014b)] andD8.3 [ForgetIT(2014c)]. Please note that some organizational aspects (such as PoF Mid-dleware ownership) are not discussed here, they will be included in the final version of thePoF Reference Model, in deliverable D8.5 [ForgetIT(2016)].

4.1 PoF Framework Architecture

The architecture of the PoF Framework, described in detail in deliverable D8.1, is made upof three layers: Active Systems, Preserve-or-Forget (PoF) Middleware and Digital Preser-vation System (DPS), as depicted in Figure 11. The Active Systems represent user appli-cations, while the PoF Middleware is intended to enable seamless transition from ActiveSystems to the DPS (and vice versa) for the synergetic preservation, and to provide thenecessary functionality for supporting managed forgetting and contextualized remember-ing. The PoF Middleware provides also the integration framework for all componentsdeveloped in WP3-WP6. The DPS is composed by two sub-systems (Digital Reposi-tory and Preservation-aware Storage, including a Cloud Storage Service) and providesboth content management and typical archive functionalities required for the synergeticpreservation.

The deliverable D8.1 also contained a preliminary discussion about the role of the OAISReference Model in the overall architecture: according to the project proposal, since OAISnowadays is the most recognizable conceptualization of a digital preservation system, itwas considered as one of the building blocks of ForgetIT approach. However, the modeldescribed in Section 3 complements and supersedes this initial approach towards theOAIS model. We further discuss the relationship with OAIS in Section 6.

4.2 Relationship between PoF Reference Model and PoF Architec-ture

We provide the mapping between each functional entity in the PoF Reference Model andthe architectural components of the PoF Middleware in Table 2. The list of componentsis taken from deliverable D8.1. It is worth noting that for some components the mappingwith model entities is not one-to-one, because more than one component can participatein the implementation of a given functional entity of the model.

It is worth noting that the Scheduler component is not explicitly mentioned in Table 2,because it mainly provides process management functionalities supporting the differentworkflows across the three layers described before.

Page 30 (of 51) www.forgetit-project.eu

Page 31: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

Functional Entity Model Layers PoF Middleware Compo-nents

ID Management Core, Remember &Forget

ID Manager

Exchange Support Core, Remember &Forget

Collector, Archiver,Metadata Repository

Content Value Assessment Remember & Forget ForgettorManaged Forgetting & Appraisal Remember & Forget ForgettorContextualization Remember & Forget Contextualizer, Ex-

tractor, CondensatorDe-contextualization Remember & Forget ContextualizerRe-contextualization Remember & Forget Contextualizer,

ArchiverSearch & Navigation Remember & Forget NavigatorContext Evolution Management Evolution Context-aware

Preservation Man-ager, Contextualizer

Context-aware Preservation Man-agement

Evolution Context-awarePreservation Man-ager

Preservation Planning Evolution Context-awarePreservation Man-ager

Administration Evolution Context-awarePreservation Man-ager

Pre-ingest Evolution Archiver

Table 2: Mapping between PoF Reference Model Functional Entities and the PoF Middle-ware Components.

The PoF Functional Model described in Section 3.1 includes several workflows spanningacross the three model layers:

• the Preservation Preparation workflow (Figure 4 and Figure 6) for the Core Layerand for the Remember & Forget Layer;

• the Re-activation workflow (Figure 5 and Figure 7) for the Core Layer and for theRemember & Forget Layer;

• the Situation Change workflow (Figure 8) for the Evolution Layer;

• the Setting Change workflow (Figure 9) for the Evolution layer.

The different steps of such workflows involve different PoF Framework components, mainlyfor what concerns the PoF Middleware. We provide a representation of the activation of

c© ForgetIT Page 31 (of 51)

Page 32: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

the different components in each workflow in Figure 12, Figure 13, Figure 14 and Fig-ure 15, respectively.

It is worth noting that some components are involved only in one of the layers, e.g. theRemember & Forget Layer or the Evolution Layer (see Table 2) and that Figure 15 aboutthe Setting Change workflow includes also the Active System and Preservation Systemcomponents, while the others involve only the components within the PoF Middleware.

4.3 Model Implementation in PoF Architecture

The main challenges in implementing the PoF Reference Model are associated to (a)the integration of the components within the PoF Middleware, (b) the integration of theActive Systems and of the Preservation Systems with the PoF Middleware, and (c) theimplementation of some specific components which are crucial for the new ForgetIT ap-proach to digital preservation described in this model. Examples of such componentsare the Forgettor, the Contextualizer and the Context-aware PreservationManager, just to name a few.

The integration of the components in the PoF Middleware leverages the outcomes of WP5,detailed in deliverables D5.1 [?] and D5.2 [ForgetIT(2014a)], which described the foun-dations of synergetic preservation and a workflow model and prototype for the transitionbetween Active System and DPS. The communication among the components and thebusiness logic for the different workflows are implemented in the PoF Middleware usinga Message Oriented Middleware (MOM) approach [?], powered by a rule-based enginewhich activates the different components according to specific Enterprise Integration Pat-terns (EIP) [?], based on best practices in the field of Enterprise Application Integration(EAI). The choice of the most suitable solution is performed according to ForgetIT re-quirements and also taking into account the future adoption of the PoF Framework (seedeliverable D8.3). The adoption of any middleware technology requires some effort bysoftware architects and engineers before it can be used in a effective way. The integrationof the Active Systems and the Preservation Systems should be based on standard tech-nologies and robust APIs, to enable the integration with different preservation solutionsand user applications.

In the previous Section we described how the different components of the PoF Frame-work are related to the PoF Reference Model functional entities. The final release of thePoF Framework will be fully compliant to the model described here. The first releaseof the prototype described in deliverable D8.3 already integrates many components andimplements two priority workflows for basic synergetic preservation and managed forget-ting support, as described in Section 4.2 of deliverable D8.1. The second release of thePoF Framework, described in deliverable D8.4 [ForgetIT(2015b)], will be further improvedaccording to the model.

The main challenge in implementing PoF Framework components is related to the nov-elty of the approach and to the lack of similar technologies or applications. Examples

Page 32 (of 51) www.forgetit-project.eu

Page 33: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

of such components are the Context-aware Preservation Manager (synergeticpreservation), the Forgettor (managed forgetting) and the Contextualizer (contex-tualized remembering), just to name a few which are closely related to the core ForgetITprinciples. The development of such components is based on a iterative approach, withnew requirements identified within the project as a result of the analysis and discussionamong all partners, resulting in the identification of the main processes and the relevantscenarios to be implemented. Moreover, the development of the Active Systems for thepersonal and organizational scenarios has to provide a proof of the ForgetIT approachvalidity through the implementation of application pilots. Other components in the PoFMiddleware which have not been mentioned above are crucial for the implementationof the relevant workflows, providing the required input for preserve-or-forget decisions,dealing with the core preservation objects in the information model, managing metadata,logging processes, etc.

c© ForgetIT Page 33 (of 51)

Page 34: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

Figure11:

Architecture

Diagram

oftheP

reserve-or-Forget(PoF)Framew

ork.

Page 34 (of 51) www.forgetit-project.eu

Page 35: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

Figure 12: Mapping between the PoF Middleware Components and the Preservation Prepa-ration Workflow.

Figure 13: Mapping between the PoF Middleware Components and the Re-activation Work-flow.

c© ForgetIT Page 35 (of 51)

Page 36: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

Figure 14: Mapping between the PoF Framework Components and the Situation ChangeWorkflow.

Figure 15: Mapping between the PoF Framework Components and the Setting ChangeWorkflow.

Page 36 (of 51) www.forgetit-project.eu

Page 37: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

5 Information Management Systems Extensions

The goal of the ForgetIT approach is to keep the impact of introducing preservationinto the information management workflow as small as possible. Besides the long-termpreservation of selected content - which is the aim - the approach also has the potentialto introduce other more immediate benefits into active information management.

Both for the basic functionality of supporting preservation as well as for leveraging thebenefit enabled by the approach, some extensions are required in the Active System.These are discussed in more detail in Section 5.1, while Section 5.2 elaborates on thebenefit that can be created in the Active System. Finally, Section 5.3 focuses on preser-vation strategies, discussing different interactions of the Active System with the DPS.

5.1 Extensions of the Active System

Extensions to the Active System are required, where it has to interact with a DPS (possiblyvia a middleware as in the case of ForgetIT architecture) and where information has to beprovided for the targeted intelligent preservation processes. This includes the collectionof evidences for information value assessment and the collection of information in supportof contextualization.

5.1.1 Supporting Information Exchanges

A core functionality, which needs to be enabled for synergetic preservation is informationexchange between the Active System and the DPS. Information to be exchanged includesthe content to be preserved as well as metadata and context information describing thiscontent. Furthermore, it has to be possible to bring content from the DPS back into theActive System (see Re-activation in Section 3.1.1). Thus, bi-directional information ex-change has to be enabled. This can be enabled for example by a repository used by bothsides for making content available to the respective other system (plus possibly a notifi-cation channel). This approach is investigated in WP5 deliverables (see for example de-liverable D5.1 [?] and D5.2 [ForgetIT(2014a)]). The approach adopted in ForgetIT makesuse of a standard-based repository leveraging the content exchange standard CMIS [?],which enables the interaction with different Active Systems. Besides these asynchronouschannels, more synchronized forms of information exchange are also possible, such asdirect service call.

5.1.2 Information Value Evidences

The Managed Forgetting & Appraisal function described in Section 3.1.2 heavily dependson the idea of content value assessment, which is discussed mostly in WP3 deliverables

c© ForgetIT Page 37 (of 51)

Page 38: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

(see for example deliverable D3.3 [?]). This is true for assessing short term importanceas it is done for computing Memory Buoyancy (MB) as well as for assessing the long termvalue, named Preservation Value (PV).

For substantial content value assessment, evidences have to be collected from the ActiveSystem. For the short-term importance these are for example information about the us-age pattern of a resource as well as information about the relationship between resources.In order to provide such evidences, specific interfaces as well as protocols are required,which define which evidences are provided, in which format and in which frequency. Fur-thermore, methods for collecting such evidences have to be implemented in the ActiveSystem, if not yet available. In the opposite direction, the Active System might also profitfrom the computed content value information and use it for advanced functionalities andgenerate short-term benefits (see Section 5.2).

5.1.3 Supporting Contextualization

Context information added to the preserved content is meant to ease the interpretation ofsuch content in case of re-activation. The information contextualization is investigated inWP6 (see for example deliverables D6.1 [?]). Context information can be gained in differ-ent ways, since (a) it can be provided by the Active System at the time content is sent tothe DPS, (b) the PoF Middleware can automatically extract information from the providedcontent and other sources such as e.g. a domain specific ontology or external knowledgesources (e.g. Wikipedia) and (c) it can be a mix of the previous two approaches.

If this is possible in the considered Active System, harvesting context information which isalready explicated in the Active System (option (a) above) looks more promising. In thisway a richer and more quality-controlled form of context can be provided, with respectto what can be automatically extracted in the preservation process. For example in theSemantic Desktop (see WP9) content is already annotated using a ontology, i.e. the Per-sonal Information MOdel (PIMO). This annotation, obviously, is a good source of contextinformation for preservation.

However, option (a) also puts higher requirements on the Active System: (1) explicatedcontext has to be available (or it has to be explicated for this purpose) and (2) the ActiveSystem has to be extended with a functionality that is able to attach context informationto the content information sent for preservation.

5.2 Creating Benefit in the Active System

Investment in preservation is typically paying back on the long run only. In order to fosterthe adoption of preservation technology, it is one of the goal of the ForgetIT approach toalso create short term benefit in the Active Systems - as a kind of positive side-effect ofintroducing preservation technology. This Section summarizes some ideas on how suchside-effects can be created in the Active System based on the ForgetIT approach.

Page 38 (of 51) www.forgetit-project.eu

Page 39: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

5.2.1 Forgetful Information Presentation

The functionality for Managed Forgetting & Appraisal and the related Content Value As-sessment (see Section 3.1.2) can be used to distinguish between content, which is ofcurrent importance from other content (see discussion above for MB). The values of MBcan be used in the Active System to bring the content currently important ”closer to theuser”, e.g. by showing it on the desktop or in special lists, having it on mobile devices, orby preferring search results with high MB, the so-called forgetful search. The core idea isto ease access to the things that are currently important: this is related to one of the fivecharacteristics of the PoF Reference Model (brain-inspired), as discussed in Section 2.3where we describe the digital working memory. This would be very similar to the humanworking memory, thus helping the user to focus on current activities. Examples for imple-mentations of such forgetful information presentation can be found in the deliverables ofWP9, most notably in deliverable D9.3 [?].

5.2.2 Forgetful and Archive-aware Search

Search is one of the core content access methods. Search & Navigation functionality(see Section 3.1.2) in the Active System can be affected in two ways by the introduc-tion of preservation technology, as described in the following. The most obvious way isto smoothly integrate the archived content into the Active System search functionality,i.e. archive-aware search. This idea has already been discussed in more detail in Sec-tion 3.1.2, within the Active System extensions in support of archive-aware search haveto be implemented. As a second way of modifying search in order to benefit from theForgetIT approach, forgetful search (see above) can be introduced. The idea here is totake MB into account in the ranking function, thus preferring resources of relevance forthe current task in the search result list.

5.2.3 Creating Awareness for Content Value

On a more conceptual level, the idea of content value assessment as it is used for com-puting MB and PV in managed forgetting can also be used to raise the awareness forthe value of content assets. An improved understanding of the value of content, which isbased on a variety of factors such as investment, usage, popularity, etc. can become animportant building block for next generation Content Management System (CMS) appli-cations. This idea is discussed in more detail in the deliverables of WP10.

5.3 Preservation Strategies

When preservation is introduced into the content management life-cycle of an Active Sys-tem, a variety of decisions have to be taken defining the preservation strategy to be used.

c© ForgetIT Page 39 (of 51)

Page 40: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

This includes decisions about when to preserve and about the granularity of preserva-tion. Furthermore, the interaction between resource versioning and the preservation of aresource has to be defined.

Preservation actions can be triggered in different ways. They can for example be activatedby the content management life-cycle: resources might be considered for preservation,when they go out of active use (low MB) or already upon creation or import into the system(e.g. for very valuable resources).

Furthermore, preservation can be scheduled, for example, by queuing all resources abovea predefined PV threshold for preservation on a regular basis. Finally, it is of coursealso possible to manually trigger preservation actions for individual resources or resourcecollections.

It depends on the type of resources considered as well as on the level of control theuser wants (or needs) over the preservation process, which of the options are best suitedfor an Active System under consideration. The options chosen influence the way thepreservation process is integrated into the Active System beyond enabling the transfer ofcontent to be archived.

Besides deciding when to preserve, it is also necessary to decide what to preserve. Thiscan be considered along two related dimensions. First, it is possible to either preserveindividual resources or entire collections of resources (or other types of complex objectssuch as sets of related resources) as one unit of archival. Second, resources can bepreserved in isolation or together with context, which describes them. This second point isclosely related to the work on contextualization in WP6 and the results affect the definitionof the archival objects, the basic units of the PoF Information Model (see Section 3.2).

The choices with respect to granularity of preservation has consequences for the transferprotocols between Active System and DPS. In addition, it might require methods for se-lecting (extracting/collecting) relevant context information for a resource to be preserved.

An interesting further aspect of the preservation strategy is to think about the model ofco-existence between the copy of the resource in the archive and the resource in the Ac-tive System (if the strategy allows for such a co-existence). This has some implications,when the resource is changed in the Active System, after a copy has been archived. Typ-ically, newer versions will not overwrite the archived version. Rather, the changed versionwill be archived as a new version. There are, however, decisions to be made about, ifand when the updated version is put (automatically) into the archive. As already statedabove, this will provide input to the definition of the PoF Information Model in deliverableD8.5 [ForgetIT(2016)].

Page 40 (of 51) www.forgetit-project.eu

Page 41: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

6 Mapping to OAIS and Preservation Management

6.1 The OAIS Reference Model

The Reference Model for an Open Archival Information System (OAIS) is a specifica-tion of the Consultative Committee for Space Data Systems (CCSDS), describing anarchive, consisting of an organization of people and systems, that has accepted theresponsibility to preserve information and make it available for a Designated Commu-nity [CCSDS(2012)]. OAIS specification includes a functional model and an informationmodel. The OAIS functional model is depicted in Figure 16 . OAIS functional entitiesinclude Ingest, Access, Data Management, Administration, Preservation Planning andArchival Storage.

Figure 16: OAIS Functional Entities [CCSDS(2012)].

A brief description of the purpose of these functional entities follows, for a full descriptionplease refer to the CCSDS Recommended Practice 650.0-M-2 [CCSDS(2012)]. Startingfrom the information producer side in Figure 16, the Producer sends in a Submission In-formation Package (SIP) to the Ingest function. This function is responsible for receivingpackages and re-packaging them into an Archival Information Package (AIP). The AIPshave to be complete in order to facilitate preservation, while the SIP can lack some infor-mation that already is in the archive, or that only the archive can add. The AIP is thensent to Archival Storage, and descriptive information is sent to Data Management. TheArchival Storage function handles permanent storage of AIPs including error checking andmedia replacement procedures. The Data Management function is responsible for main-taining both descriptive information about the archive holdings, as well as system related

c© ForgetIT Page 41 (of 51)

Page 42: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

data and other administrative data needed for managing the archive. The Access functionthen uses the Data Management function to provide query results to the Consumer, andrequest the actual AIP(s) from the Archival Storage function in order to prepare a Dissem-ination Information Package (DIP) in accordance with what the Consumer requested. Inorder to be prepared for changes in requirements from the environment, the Preserva-tion Planning function monitors the relevant user communities and technology changesand recommends different actions to be taken. The recipient of these recommendations,and main responsible function for establishing policies and standards in the archive is theAdministration function. This function also handles negotiations with information produc-ers and consumers, and configuration management of the system. In the next Section,relationships between these entities and the PoF entities will be described.

6.2 Relationship with OAIS Reference Model

In Table 3, we describe the relationship between the functional entities of the two models,the PoF Reference Model and the OAIS Reference Model. For OAIS we refer to the lastapproved OAIS specification [CCSDS(2012)].

Functional Entity OAIS FunctionalEntity

Relation

Exchange support,Contextualization,Re-contextualization

Ingest (p.4-5),Access (p.4-16)

Supports smooth bi-directional transfer ofarchival objects from Active System toDPS and vice versa. Creates submis-sion information packages according toDPS submission agreements. The pack-ages are adjusted to metadata and struc-ture of information packages requirements.The Contextualization entity adds use con-text metadata to submission packages. Atre-activation of archival objects the Re-contextualization entity adds information ifthere is a lack of sufficient context informa-tion. Exchange support is also responsibleof restructuring of archival objects receivedfrom DPS on advice from Context-awarePreservation Management.

Context-awarePreservationManagement

Administration(p.4-11)

Verifies that the content in submissions isaccording to DPS agreement. Supportsthe Audit Submission function that veri-fies that archival object packages meet thespecifications of submission agreementsas part of Administration functional entity.

Page 42 (of 51) www.forgetit-project.eu

Page 43: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

Context-awarePreservationManagement

Preservationplanning (p.4-14),Administration(p.4-11)

Activity monitoring; collects information re-lated to exchange of archival objects be-tween Active System and DPS. Informa-tion related to use of content such as fileformat, system information, physical andlogical content structure. Information sub-mitted in the archival object transferred tothe DPS intended as input to PreservationPlanning and Administration entities.

Context-awarePreservationManagement

Preservationplanning (p.4-14),Administration(p.4-11)

Statistical computation of collected infor-mation from bi-directional exchange ofarchival objects between Active Systemand DPS. Information such as use offile format(s), system information, physicaland logical content structure. The moni-tor of many-to-many relation between Ac-tive Systems and DPSs provides a feder-ated Technology watch function supportingthe Preservation planning and Administra-tion functional entities.

Context-awarePreservationManagement

Administration(p.4-11)

Supporting preservation decisions on howmuch should be invested in the preserva-tion of a resource. This is done by match-making of digital objects content value andpurpose of use to configuration of preser-vation services. Provides input to ManageSystem Configuration function in the Ad-ministration functional entity.

Search & Navigation Access (p.4-16),Federation ofarchives (p.6-4),Distributed Ac-cess Aid (p.6-5)

The PoF framework acts as a federationof DPSs in providing search capabilities foran Active System. The search service pro-vides the ability to search in a combina-tion of content from both the Active Sys-tem and DPS(s), as a joint virtual informa-tion space. The ranking of search resultsfrom the DPS(s) is based on the definitionof Content Value for each archival object.

Managed Forgetting Administration(p.4-11)

The results from content value assess-ments are input to the Administration en-tity in the DPS. This information is basis forupdate actions in the DPS.

Table 3: Mapping between PoF Reference Model and OAIS Functional Entities in the OAISSpecification (CCSDS Magenta Book) [CCSDS(2012)].

c© ForgetIT Page 43 (of 51)

Page 44: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

6.3 Relationship with PAIMAS Reference Model

The Producer-Archive Interface Methodology Abstract Standard (PAIMAS) [?] providesmethod description supporting the interaction between Producer and Archive. PAIMAS isdivided in four different phases: Preliminary Phase, Formal Definition Phase,Transfer Phase, and Validation Phase as depicted in Figure 17.

Figure 17: PAIMAS Main Phases [?].

The PoF Reference Model is responsible for defining a submission agreement containingthe digital information objects and metadata to be submitted from the information Pro-ducer to an Archive and for effectively packaging the objects in the form of SubmissionInformation Packages (SIPs). There can be several ”submission agreements” betweena Producer and an Archive, and these different agreements cover different and indepen-dent sets of information. Previous agreements can be used to guide the completion of anew submission agreement. The PoF method is influenced by the approach defined inthe recommended standard for Producer-Archive Interface Specification (PAIS) [?]. PAISis a standard method for formally defining the digital information objects to be transferredby an information Producer to an Archive and applies specifically to the implementationof the main part of the Formal Definition Phase and the Transfer Phase in PAIMAS [?].PAIS provides an abstract syntax for describing how these data will be aggregated intopackages for transmission and one concrete implementation for the packages based onthe XML Formatted Data Unit (XFDU) [?] standard.

Page 44 (of 51) www.forgetit-project.eu

Page 45: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

The Context-aware Preservation Management entity supports the Preliminary Phaseby identification of suitable DPS for preserving the significant properties of Producersinformation objects within acceptable cost limits associated with the storage needs. Thisis the basis for deciding the feasibility of a PoF solution according to the Producer archivalneeds.

In the Formal Definition Phase, the PoF components support the following activities:

• Agree upon the content and structure of the collection to be transferred from AS toPoF.

• Define expected file formats, expected instances, migration needs, and maximumvolume of each transferred collection.

• Agree on descriptive metadata and its structure.

• Define level of provenance support by PoF.

• Define if and to what extent metadata should be added by Active System and PoFcomponents.

• Adapt the SIP content, structure and use of metadata to receiving DPS.

• Define the communication operations as the use of interoperability services andsecurity conditions.

• Agree on procedures for triggering a transfer, rejection, re-transfer, and transfercollection acceptance.

• Determine storage service configuration based on identified conditions as locationrestrictions, security, performance, and cost.

The output from this phase ends in a submission agreement stated in a Manifest used astemplate for configuration of the PoF process.

6.4 Evolution and Value creation in the Preservation Store

Storage plays a key role in any preservation system. Nowadays, much of the worlds dataresides in storage clouds, both public and private. Storage clouds are highly distributedand scalable, and they offer high availability, high performance, and huge capacities atlow costs. However, storage services tend to focus on storage and data services, whilebeing agnostic to the content of the data they service. To increase the value of stored in-formation over time, there is a need for the storage services to have some understandingof the content of the data and perform automatic processes on it, some of them period-ically. For instance, in order to maintain the logical preservation of the stored data, thestorage service could periodically check whether it needs to perform transformations onthe data from obsolete formats to updated formats.

c© ForgetIT Page 45 (of 51)

Page 46: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

Data analytics is an umbrella term covering a variety of techniques and approaches in dif-ferent domains for inspecting, transforming, and modeling data in order to discover hiddenvalue, such as information which is not easily or directly accessible. Data analytics con-tinues to gain momentum and several open source platforms are available. The approachdescribed above, where storage services perform processes close to data, enables anal-ysis of big data in the cloud. Cloud analytics is offered by all big data players, includingIBM, Google and Amazon, just to name a few.

The Preservation Store in ForgetIT architecture is based on cloud, as discussed in deliv-erables D8.1 [ForgetIT(2014b)] and D7.1 [?]. The DPS is made up of a Digital Repositoryand a Cloud-based Preservation-aware Storage System, as described in Section 4. InForgetIT, a new mechanism based on storlets [?] will be investigated, to implement longterm digital preservation in the cloud within the DPS [?, ?]. Storlets can help increasingthe value we get out of storage and the speed at which we can access what we need, ina nutshell: getting more value from data, much faster. A storlet is code that runs in thestorage cloud. The main idea is to run the code close to where the data is stored, so asto minimize costs of data transfer when operating in the cloud. Storlets are typically usedto transform the data, filter the data, or analyze the data, all in the storage cloud. Storletsare an ideal way to introduce new services. Storlets could also be used for automatedcreation of data-object metadata. This metadata could be used to facilitate the retrievalof preserved information back into active use, even if a long time has passed since it wasarchived. Additional information can be found in WP7 deliverables.

Page 46 (of 51) www.forgetit-project.eu

Page 47: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

7 Summary and Future Work

In this deliverable we described the first version of the PoF Reference Model, based on theForgetIT novel approach to digital preservation inspired by human forgetting and remem-bering. The model described here aims to provide the basic terminology and conceptsinspired by core ForgetIT principles: synergetic preservation, managed forgetting andcontextualized remembering. The model also supports the implementation of a forgetfulapproach to preservation in the PoF Framework.

The foundations of the PoF Reference Model (Section 2) are identified by five charac-teristics: integrative, value-driven, brain-inspired, forgetful and evolution-aware. Eachcharacteristic has been discussed and the relationship to the relevant work package hasbeen also clarified. Following the OAIS specification approach [CCSDS(2012)], we havefirst defined a Functional Model (Section 3.1) , with several functional entities and work-flows. Some preliminary ideas for the ForgetIT Information Model (Section 3.2) are alsoincluded, although further analysis is required, as discussed in the following.

The functional part of the PoF Reference Model is made up of three layers: Core Layer,Remember & Forget Layer and Evolution Layer. For each layer we described the mainfeatures and the identified functional entities and for each entity we described the rolewithin that layer. Different workflows have been also defined, for preservation preparation,re-activation, situation change and setting change. For each workflow we described theinvolved model entities. Such workflows will be further evaluated and refined in the finalversion of the model.

The ForgetIT model does not intend to replace the OAIS model, but rather to go beyondto support the novel approach of the project. In the document we explained how thePoF Reference Model can interact with OAIS compatible approaches and then describedthe relationship between the functional entities of the two models (Section 6), the PoFReference Model and the OAIS Reference Model.

The PoF Framework integrates the results of the different WPs and the first integrated pro-totype described in deliverable D8.3 [ForgetIT(2014c)] is strongly inspired by the conceptsof the PoF Reference Model. The new releases of the framework will further implementthe model and will be used for validating it. In the document we provided a mappingbetween the functional entities and the PoF Framework architecture (Section 4), basedon the components described in D8.1 [ForgetIT(2014b)] . Moreover, for each aforemen-tioned workflow we described the role of the framework components, mainly for whatconcerns the PoF Middleware, which includes the most relevant part of the new ForgetITapproach. In this way we have shown that the ForgetIT architecture is compliant to thePoF Reference Model.

Finally, we included some preliminary considerations related to the extension of the cur-rent information management systems and to the possible impact of introducing thepreservation into the information management workflows (Section 5). This is relatedto one of the core ForgetIT principles, namely the synergetic preservation: enabling a

c© ForgetIT Page 47 (of 51)

Page 48: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

smooth transition between active use and preservation by making intelligent preservationprocesses an integral part of the content life-cycle in information management and by de-veloping solutions for smooth bi-directional transitions. We described possible extensionsof the Active System to enable interaction with the DPS, describing information exchange,evidences for information value and contextualization support. This is also related to cre-ating immediate benefit from the preservation, not only on the long run, and can drive thepreservation strategies in terms of choosing the preservation objects and managing withthe co-existence of the resources in the DPS and in the Active System.

In the following Sections we briefly discuss the assessment of the results presented hereaccording to the WP8 performance indicators and then describe the plan for future work.

7.1 Assessment of Performance Indicators

The expected WP8 outcomes, reported in the project proposal, are:

• the Preserve-or-Forget (PoF) Reference Model

• the PoF Framework

The results described here for the first framework prototype refer to outcome 1, for whichthe following success/progress indicators have been identified in the project proposal:

1. availability of reference models and degree of adoption in the different contexts cov-ered by each work package

2. availability of adequate ”formats” for content and metadata

The model described here represents the first release of the PoF Reference Model. Theresults achieved so far are compatible with the expected progress and success indicatorsfor WP8, although further investigation is requested within the project towards the finalmodel release, as described below.

Indicator 1: Reference Model

The PoF Reference Model has been created from scratch collecting the inputs from allwork packages, taking into account the recommendations after the first project review,mainly for what concerns the role of OAIS in the PoF Framework and the need to gobeyond such model to support the novel ForgetIT approach to preservation. The modelreflects the common knowledge and the new ideas of all partners to represent the For-getIT approach in a coherent framework based on a reference model. The referencemodel includes a functional model and an information model. The functional model hasbeen described in detail and can be considered almost complete. Further details couldbe still added in the future, but the three proposed layers cover all aspects of ForgetIT ap-proach to digital preservation. The information model is still under definition and requires

Page 48 (of 51) www.forgetit-project.eu

Page 49: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

further investigation within the project to propose a complete and coherent definition ofthe preservation objects, their interaction and evolution within the PoF Framework, al-though the building blocks of the information objects have already been identified andsuitable technologies have been identified for the first framework prototype in D8.3. ThePoF Framework architecture described in D8.1 is compliant to the model, although somemodifications are still possible when the information model is available.

Indicator 2: Content and Metadata Formats

The information model is still under definition. When ready, it will be also used to definerequirements on content and metadata formats required to represent the preservationobjects discussed in this document. Nevertheless, the project has already released thefirst integrated prototype, described in D8.3, and has identified several content and meta-data formats and the associated technologies to properly represent the core conceptsof the information model described here. For example the adoption of the CMIS stan-dard enables the content and metadata exchange between the user applications and theframework, while the standards used for content packaging and archival (such as METS,DublinCore, PREMIS and others) are suitable to support the representation of contents,their exchange between user applications and preservation systems, their evolution overtime (see deliverables D8.1 and D8.3). The definition of the archival units and the mostsuitable data model in the PoF Framework requires further investigation.

7.2 Future Work

For the initial version of the PoF Reference Model, we focused mainly on the functionalmodel. The definition of a ForgetIT information model is also planned. In this new in-formation model we will provide a representation of the archival entities in the context ofForgetIT, identifying the most appropriate structure and describing the interaction and evo-lution of the information objects with the PoF Framework. The information model requiresfurther investigation within the project, in parallel with the development of the referenceplatform and the availability of the prototype deliverables from technical WPs.

The final release of the PoF Reference Model is expected at M36, and will be reported indeliverable D8.5 [ForgetIT(2016)]. The second release of the PoF Framework, describedin D8.4 [ForgetIT(2015b)], will integrate further project components and will provide addi-tional features based on the model described here, mainly for what concerns the devel-opment of the PoF Middleware, the integration with the Active Systems and the DPS andthe support for all the workflows described here.

c© ForgetIT Page 49 (of 51)

Page 50: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

ForgetIT Deliverable D8.2

References

[CCSDS(2012)] CCSDS 2012 CCSDS: Reference Model for an Open Archival Infor-mation System (OAIS) - Recommended Practice, CCSDS 650.0-M-2 (Magenta Book)Issue 2. Also available as ISO Standard 14721:2012. http://public.ccsds.org/publications/archive/650x0m2.pdf. June 2012. – Retrieved 29 August2014

[ForgetIT(2014a)] ForgetIT 2014a FORGETIT: Deliverable D5.2: Workflow Model andPrototype for Transition between Active System and AIS. February 2014

[ForgetIT(2014b)] ForgetIT 2014b FORGETIT: Deliverable D8.1: Integration Plan andArchitectural Approach (v2). August 2014

[ForgetIT(2014c)] ForgetIT 2014c FORGETIT: Deliverable D8.3: The Preserve-or-Forget Framework – First Release. August 2014

[ForgetIT(2015a)] ForgetIT 2015a FORGETIT: Deliverable D2.3: Foundations of For-getting and Remembering - Intermediary Report. January 2015

[ForgetIT(2015b)] ForgetIT 2015b FORGETIT: Deliverable D8.4: Preserve-or-ForgetFramework – Second Release. April 2015

[ForgetIT(2016)] ForgetIT 2016 FORGETIT: Deliverable D8.5: The Preserve-or-ForgetReference Model – Final Model. January 2016

Page 50 (of 51) www.forgetit-project.eu

Page 51: D8.2 - PoF Reference Model...Deliverable D8.2 Work-package WP8: The Preserve-or-Forget Reference Model and Framework Deliverable D8.2: The Preserve-or-Forget Reference Model Initial

Deliverable D8.2 ForgetIT

c© ForgetIT Page 51 (of 51)