Chapter 16 - Tools for Countering the Threats to Digital Preservation



    We begin with a brief recap of the points made in Chap. 5 about the broad threats to the preservation of our digitally encoded information. Then a number of components, both infrastructure and domain dependent, are discussed and the CASPAR implementations of these are introduced. Subsequent chapters build up the details of the infrastructure and tools which indicate how these solutions could be implemented and for which strong prototypes exist at the time of writing.

    The major threats and their solutions are as follows:

    Threat / Requirements for solution

    Threat: Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved.
    Requirements for solution: Ability to create and maintain adequate Representation Information.

    Threat: Non-maintainability of essential hardware, software or support environment may make the information inaccessible.
    Requirements for solution: Ability to share information about the availability of hardware and software and their replacements/substitutes.

    Threat: The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity.
    Requirements for solution: Ability to bring together evidence from diverse sources about the authenticity of a digital object.

    Threat: Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future.
    Requirements for solution: Ability to deal with digital rights correctly in a changing and evolving environment. Preservation-friendly rights or appropriate transfer of rights is necessary.

    Threat: Loss of ability to identify the location of data.
    Requirements for solution: An ID resolver system which is really persistent.

    Threat: The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future.
    Requirements for solution: Brokering of organisations to hold data and the ability to package together the information needed to transfer information between organisations ready for long term preservation.

    Threat: The ones we trust to look after the digital holdings may let us down.
    Requirements for solution: Certification process so that one can have confidence about whom to trust to preserve data holdings over the long term.

    D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_16, © Springer-Verlag Berlin Heidelberg 2011


    16.1 Key Preservation Components and Infrastructure

    When thinking about what tools and services are needed to help us with preservation, we should consider how to deal with all the types of digital objects discussed in Chap. 4, without having to tailor software for each. Identifying commonalities allows us to share the cost of such tools and services; remember that one of the key disincentives is cost.

    One way of doing this is to distinguish between those things which can be used

    for many different types of digitally encoded information and those things which

    are closely tied to the specific type of digital object. The former we shall refer to

    as domain independent while the latter we will refer to as domain dependent. We

    use the term domain (or sometimes discipline) because it is then easier to map

    to the real world, instead of having to think about different digital object types.

    Each domain will tend to use many different digital object types, and there will be overlaps, although often with a different focus; nevertheless people tend to think about their own domain of work rather than their digital object types.

    Since one tends to refer to tools which can be used for many different kinds

    of things as infrastructure we shall often use the term domain independent

    infrastructure or simply infrastructure.

    To make the distinction we can examine the issues from several angles, making

    use of the diagrams we have discussed earlier.

    Figure 16.1, which was briefly introduced in Sect. 6.5, and the associated table,

    pick out the main components, following the lifecycle of a piece of digitally encoded

    Fig. 16.1 CASPAR information flow architecture


    7. Describe Simple Object
    COMPONENT: Simple Object Virtualiser (see Sect. 17.3)
    The "simple" in this case refers to the type of Information Object; this is a non-trivial component involving descriptions of the Structure as well as the Semantics of relatively self-contained Objects.

    8. Describe Complex Object
    COMPONENT: Complex Object Virtualiser (see Sect. 17.3)
    A Complex Object is one that can be described in terms of several, possibly a large number of, both Simple Objects and Complex Objects and their inter-relations. In particular, the Complex Object Virtualiser has to cope with multiple partial copies of the datasets forming the referred Object and with the management of the lower-level objects in a fully distributed environment.

    9. Some of the Objects are created on-demand. It may be true to say that most Information is created in this way.
    COMPONENT: On-Demand Object Virtualiser (see Sect. 17.3)
    An On-Demand Object is one (Simple or Complex) that can be referred to by the available knowledge and can be instantiated on request.

    10. Produce OAIS Preservation Description Information (PDI)
    COMPONENT: Preservation Description Information Toolbox (see Sects. 17.7, 17.8, 17.11, and 17.9)
    PDI includes Fixity, Provenance, Reference and Context information. The PDI Toolbox has a number of sub-components that address each of these and guide the user to produce the most complete PDI possible. Knowledge capture techniques should also be applicable here.

    Having created the Virtualisation information (which includes and extends the OAIS concept

    of Representation Information), it must be stored in an accessible location.

    11. Store DRM/Virtualisation Information/Representation Information in Registry
    COMPONENT: Registry
    We take the general case that it is stored in an external registry in order to allow the possibility of enhancing the Representation Information etc. to cope with changes in the technologies, Designated Communities etc.
    The alternative of storing this metadata with the data object is possible but would not address long-term preservation because:
    - the RepInfo cannot be complete;
    - the RepInfo cannot easily be updated;
    - the RepInfo has to be repeated for each object or copy of the object and consistency cannot easily be maintained;
    - the effort of updating the RepInfo cannot easily be shared, in particular when the originator is no longer available.

    In addition to Representation Information, there may also be keys, public and private, for

    encryption etc. that need to be available over the long term.


    12. Store Keys, public and private
    COMPONENT: Key Store
    Public keys can be stored in any convenient location that is accessible to users. However, for long-term preservation these keys must be guaranteed to be available, as must the appropriate encryption or digest algorithm.
    The same applies for private keys, which must be held in escrow for some agreed period, with adequate security.

    The collection of information adequate for preservation is a key concept in OAIS: the Archival Information Package.

    13. Construct the AIP
    COMPONENT: Archival Information Packager (see Sect. 17.10)
    The AIP is a logical construct, and key to preservation in the OAIS Reference Model. The AI Packager logically binds together the information required to preserve the Content Information so that it is suitable for long-term preservation. However, this should not be regarded as a static construct, since, as has been stressed, preservation is a dynamic process. The AI Packager works with the Preservation Orchestration Manager.

    Having the AIP, this must now be securely stored.

    14. Store the AIP data object securely for the long term.
    COMPONENT: Preservation Data Object (PDO) (see Sect. 17.6)
    Digital storage comes in many different forms, and the hardware and software technology is constantly evolving. The PDO virtualises the storage at the level of a data object; in this way, it extends the current virtualised storage, which allows transparent access to distributed data. The PDO hides the details of the storage system, the collection management etc., all of which can cause a great deal of trouble when migrating, as hardware and software technology changes.
    One way of looking at this is to view it as an implementation of the OAIS Archival Storage functional element. As such, it allows the development of a market of interchangeable Archival Storage elements for a variety of archives.

    Now we come to the period when the data object is stored for many years, in principle indefinitely.

    During this time the originators of the data pass away; hardware and software become obsolete and are replaced; the organisation that hosts the repository evolves, merges, perhaps terminates (but hands on its data holdings); the community of users, their tools, their underlying Knowledge Base change out of all recognition.

    In the background, something must keep the information alive, in the same way as the body's autonomic nervous system keeps the body alive, namely by triggering breathing, heartbeat etc. Note that the autonomic nervous system does not actually do the breathing etc., but provides the trigger. This is what must be arranged.


    15. Notify the repository when changes must be made
    COMPONENTS: RepInfo Gap Manager (see Sect. 17.4); Preservation Orchestration Manager (see Sect. 17.5)
    This will provide a number of notification services to alert repositories, which have registered appropriately, of the probable need to take action to ensure the preservation of their holdings. This action could range from the need for migration to new formats to the obsolescence of hardware to the availability of relevant Representation Information. In addition, brokering services and workflow control processes will be available to assist data holders to access services, for example to transform data or to hand over holdings to longer-lived repositories.
    Activities include advising on preservation strategies, providing support for Preservation Planning in repositories, and sharing Representation Information.
    Without this type of background activity, preservation is at risk from neglect. Clearly, larger organisations may not need this, but, even in the largest and best run organisations, individual preservation projects may be funded on a relatively short-term basis.
    This infrastructure must itself be persistent.

    The lifecycle followed by Fig. 16.1 and its table is that of a piece of digitally encoded information as it is ingested into an OAIS system and subsequently retrieved for use at some time in the future. The table mentions some components which will be introduced in later sections.

    16.2 Discipline Independent Aspects

    Building on the previous section we can now look at the OAIS Functional Model (Fig. 16.2) from another viewpoint, and try to make the distinction between the domain dependent and domain independent parts.

    The OAIS Functional Entities in the Functional Model (repeated here for convenience from Fig. 6.8) can be used to group the domain independent concepts and components.

    16.2.1 Preservation Planning

    16.2.1.1 Registries of Representation Information

    The Registry/Repository concept was introduced in Sect. 7.1.3. The term Registry/Repository is used, rather than simply Registry, in order to stress the fact that the concept embodies the holding (in the Repository) of a significant amount of digital information (the Representation Information) rather than simply pointers to external resources.

    Fig. 16.2 OAIS functional model

    The prime functions of a Registry/Repository are:

    - Given an identifier of a piece of Representation Information (RepInfo), return that piece of Representation Information to the requestor. This Representation Information will in general be an opaque binary object as far as the Registry/Repository is concerned.
    - Allow searching of the holdings of the Registry in order to enable the re-use of existing RepInfo.
    - To facilitate this searching, each piece of RepInfo should be classified under one or more Classification Schemes, and have a searchable text description of the RepInfo.
    - Each piece of RepInfo should itself have a pointer to its own RepInfo, and also details of its PDI.

    The Registry/Repository should itself be an OAIS which can be certified for long-

    term preservation of information.

    The Registry/Repository functionality is domain independent because pieces

    of the Representation Information are, as far as the Registry/Repository is

    concerned, opaque binary objects.
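
    The prime functions listed above can be sketched in a few lines. This is only an illustration under stated assumptions, not the CASPAR Registry/Repository: the names RepInfoRecord, RepInfoRegistry, deposit, retrieve and search, and the sample identifiers, are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RepInfoRecord:
    """One piece of RepInfo as the registry sees it: an opaque payload plus labels."""
    payload: bytes                                # opaque binary object, never interpreted here
    description: str                              # searchable text description
    classifications: set = field(default_factory=set)
    own_repinfo_id: Optional[str] = None          # pointer to the RepInfo for this RepInfo
    pdi: dict = field(default_factory=dict)       # details of its PDI


class RepInfoRegistry:
    """Hypothetical in-memory registry; a real one would itself be an OAIS."""

    def __init__(self):
        self._records = {}

    def deposit(self, identifier, record):
        self._records[identifier] = record

    def retrieve(self, identifier):
        # Given an identifier, return the RepInfo unchanged to the requestor.
        return self._records[identifier].payload

    def search(self, text=None, classification=None):
        # Search uses only the description and classifications, not the payload.
        hits = []
        for identifier, record in self._records.items():
            if text is not None and text.lower() not in record.description.lower():
                continue
            if classification is not None and classification not in record.classifications:
                continue
            hits.append(identifier)
        return sorted(hits)


registry = RepInfoRegistry()
registry.deposit(
    "ri:fits-standard",
    RepInfoRecord(b"%PDF-...", "FITS standard document", {"structure"},
                  own_repinfo_id="ri:pdf-standard"),
)
registry.deposit(
    "ri:fits-dictionary",
    RepInfoRecord(b"...", "FITS keyword dictionary", {"semantics"}),
)
```

    Because the search never inspects the payload, the same code serves any domain, which is the sense in which the functionality is domain independent.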


    Of course any piece of Representation Information could be domain specific, but that content is not relevant to the Registry/Repository. It is important to note that there may be multiple ways to describe something. For example, Structure-type Representation Information may come in the form of an EAST description, a DRB description or a DFDL description. All these are valid, and each of these in turn will have its own Representation Information.

    In addition, it is possible that two archives may have identical copies of a piece of

    data but may provide entirely separate pieces of Representation Information. This

    is in many ways a duplication of effort. However the Registry/Repository will be

    entirely unaware of this duplication since (1) it does not have a link back to the data,

    as this would not be maintainable and (2) the pieces of Representation Information

    are opaque binary objects as far as it is concerned.

    A separate, value-added service may be developed by analysing the links between data and Representation Information, in a way analogous to the ranking algorithm used by Google. Such a service would enable one to say, for example, that 99% of all archives use CPIDYYY as the Representation Information for a certain type of data. Such a statistic may influence others to use that particular piece of Representation Information rather than some other, competing, Representation Information.

    New versions of Representation Information may be created from time to time, to improve usability or accuracy. The versioning must be controlled, and it will prove useful to distinguish between a unique identifier for a particular version and a logical identifier for all versions of the Representation Information. Using the logical identifier should return the latest (and presumably the best) version, which will change as new versions are created, whereas using the unique identifier, or, equivalently, providing a specific version number, should always provide that specific piece of Representation Information.
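
    The distinction between logical and unique identifiers can be illustrated with a small sketch. The "#vN" suffix used here to form a unique identifier is purely an assumption for the example, as are the class and method names; the text does not prescribe any identifier syntax.

```python
class VersionedRepInfoStore:
    """Keeps every version of a piece of RepInfo; versions are never overwritten."""

    def __init__(self):
        self._versions = {}  # logical identifier -> list of payloads, oldest first

    def add_version(self, logical_id, payload):
        versions = self._versions.setdefault(logical_id, [])
        versions.append(payload)
        # Hypothetical convention: unique identifier = logical identifier + "#v" + number.
        return f"{logical_id}#v{len(versions)}"

    def resolve(self, identifier):
        logical_id, separator, version = identifier.partition("#v")
        versions = self._versions[logical_id]
        if separator:
            # Unique identifier: always the same, specific version.
            return versions[int(version) - 1]
        # Logical identifier: whatever is currently the latest version.
        return versions[-1]


store = VersionedRepInfoStore()
first = store.add_version("ri:fits-dictionary", b"draft description")
second = store.add_version("ri:fits-dictionary", b"corrected description")
```

    After the second call, resolving the logical identifier yields the corrected description, while resolving the first unique identifier still yields the draft.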

    Representation Information may be cached, that is to say copies may, for convenience, be kept in a variety of locations, including packaged with the Data Object. Caching is a well-known optimisation technique, and appropriate steps must be taken to ensure that the cached copies are identical with the original. The task is made easier because a particular piece of Representation Information is never changed; instead, as discussed above, a new version is created.
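
    One way to check that a cached copy is identical with the original is to compare digests. The sketch below assumes that the registry publishes a SHA-256 digest for each version-specific identifier; that assumption, and the names RepInfoCache, put and get, are not taken from the text.

```python
import hashlib


def digest(payload: bytes) -> str:
    """SHA-256 hex digest; the choice of algorithm is an assumption of this sketch."""
    return hashlib.sha256(payload).hexdigest()


class RepInfoCache:
    """Cache keyed by version-specific identifier, so entries never go stale."""

    def __init__(self):
        self._entries = {}  # unique identifier -> (payload, digest published by registry)

    def put(self, unique_id, payload, registry_digest):
        if digest(payload) != registry_digest:
            raise ValueError(f"copy of {unique_id} does not match the registry digest")
        self._entries[unique_id] = (payload, registry_digest)

    def get(self, unique_id):
        payload, expected = self._entries[unique_id]
        if digest(payload) != expected:
            # The cached bytes were damaged after caching; discard them.
            del self._entries[unique_id]
            raise ValueError(f"cached copy of {unique_id} is corrupt")
        return payload


cache = RepInfoCache()
original = b"FITS keyword dictionary, version 1"
cache.put("ri:fits-dictionary#v1", original, digest(original))

try:
    cache.put("ri:fits-dictionary#v1", b"tampered copy", digest(original))
    rejected = False
except ValueError:
    rejected = True
```

    Since a given version is never modified, the cache needs no expiry or invalidation logic; only integrity has to be checked.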

    16.2.1.2 Orchestration

    The Orchestration component has to:

    - allow individuals to register their interests and expertise
    - collect information from (anonymous or registered) individuals about changes in software, hardware, environment or Knowledge Base of any Designated Community. This information will be passed on to the RepInfo Gap Manager component.
    - receive information from the RepInfo Gap Manager component about a gap which has been identified
    - send requests to appropriate registered users, based on their interests and expertise, for the creation of required Representation Information

    The Orchestration functionality is domain independent in that it needs no embedded domain specific knowledge in order to match keywords specifying gaps to people, although clearly some domain specific thesauri could help give a wider set of relevant matches.
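
    The keyword matching described above can be sketched as plain set intersection. The function name match_experts, the registration data and the ranking by number of shared keywords are assumptions made for illustration only.

```python
def match_experts(gap_keywords, registrations):
    """Return registered users whose interests overlap the gap keywords, best match first.

    gap_keywords:  keywords describing the gap reported by the RepInfo Gap Manager
    registrations: user name -> list of registered interests and expertise
    """
    wanted = {keyword.lower() for keyword in gap_keywords}
    scored = []
    for user, interests in registrations.items():
        overlap = wanted & {interest.lower() for interest in interests}
        if overlap:
            scored.append((len(overlap), user))
    # More shared keywords first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scored]


registrations = {
    "alice": ["FITS", "astronomy"],
    "bob": ["PDF", "document formats"],
    "carol": ["astronomy", "FITS", "dictionary"],
}
to_notify = match_experts(["FITS", "dictionary"], registrations)
```

    Nothing in the function knows what "FITS" means; a domain specific thesaurus could be added by expanding gap_keywords with synonyms before matching.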

    16.2.1.3 RepInfo Gap Manager

    The RepInfo Gap Manager component embodies a small but essential application of Knowledge Management techniques to preservation. Its main purpose is to assist in identifying gaps which have arisen as a result of changes in hardware, software, environment and Knowledge Bases of Designated Communities. This has been discussed extensively in Chap. 8.

    The changes are notified by human participants in the preservation process. The RepInfo Gap Manager knows of the existing dependencies between pieces of Representation Information, working closely with one or more instances of the Registry/Repository. The labels in the Registry/Repository capture those dependencies. The changes imply that gaps in the Representation Information network will have arisen, which must be filled. Human participants must be alerted and requested to provide new Representation Information to fill those gaps. The human participation may not always be necessary; the RepInfo Gap Manager may be able to bring in Representation Information from another, existing, source to fill the gap, although this would have to be checked by humans.

    As an example of these gaps we can look at the dependencies in the Representation Information about a piece of astronomical data (repeated for convenience from Sect. 6.3.1). FITS is a standard data format that is used in astronomy. To understand a FITS file one needs to understand the FITS standard, which is in turn described in a PDF document. To understand the keywords contained in a FITS file one needs to be able to understand the FITS dictionary (that explains the usage of keywords). Figure 16.3 illustrates these dependencies.

    At some particular point in time the Dictionary may be part of the Knowledge Base of the Designated Community (i.e. astronomers). However, there may come a time when this particular type of Dictionary begins to fall from general use. A gap in the Representation Information net will begin to appear, which must be filled. In most cases some human participant will have to create the additional piece of Representation Information that is required. However, it may be the case that some separate Representation Information Network uses the same Dictionary and provides Representation Information for the Dictionary. The RepInfo Gap Manager may be able to deduce that the latter can be re-used in the astronomical case.


    Fig. 16.3 FITS file dependencies (diagram; node labels: FITS FILE, FITS DICTIONARY, FITS STANDARD, PDF SOFTWARE, JAVA VM, PDF STANDARD, FITS JAVA SOFTWARE, DICTIONARY SPECIFICATION, XML SPECIFICATION, UNICODE SPECIFICATION, DDL DESCRIPTION, DDL DEFINITION, DDL SOFTWARE)

    The RepInfo Gap Manager manipulates symbols and identifiers and does not

    require embedded domain specific knowledge.
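
    Gap detection of this kind amounts to walking a dependency network of identifiers. The following sketch is an illustration only: the function find_gaps is hypothetical, the network contains just the dependencies stated in the prose (the figure has more nodes), and treating "described in a PDF document" as a dependency on the PDF standard is an assumption.

```python
def find_gaps(start, network, knowledge_base):
    """Return the items needed to understand `start` that nothing currently explains.

    network:        item -> list of RepInfo items needed to understand it
    knowledge_base: items the Designated Community understands unaided
    """
    gaps, seen, stack = set(), set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for needed in network.get(node, []):
            if needed in knowledge_base:
                continue              # understood without further RepInfo
            if network.get(needed):
                stack.append(needed)  # further RepInfo is recorded; keep following it
            else:
                gaps.add(needed)      # not understood and nothing explains it: a gap
    return sorted(gaps)


network = {
    "FITS file": ["FITS standard", "FITS dictionary"],
    "FITS standard": ["PDF standard"],
}
today = {"PDF standard", "FITS dictionary"}  # astronomers still know the dictionary
later = {"PDF standard"}                     # the dictionary has fallen out of use

gaps_today = find_gaps("FITS file", network, today)
gaps_later = find_gaps("FITS file", network, later)

# Re-use RepInfo for the dictionary found in another network (assumed to exist).
borrowed = dict(network)
borrowed["FITS dictionary"] = ["Dictionary specification"]
gaps_after_reuse = find_gaps("FITS file", borrowed, later | {"Dictionary specification"})
```

    The function compares and follows labels only, which is why no astronomical knowledge is embedded in it.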

    16.2.2 Digital Object Storage

    The Digital Object Storage (or sometimes simply Storage) component takes care

    of the Digital Object and encapsulates:

    - The secure preservation of the bits which encode the information of interest. This of course applies to a primary Data Object, Representation Information, Preservation Description Information etc., the latter also being Data Objects. These individual stored objects form the simplest element in the storage system, and each need only be regarded as an opaque binary object, whose internal structure need not be known or understood by the Storage system, although the structure of the AIP, e.g. how to get the PDI object out of the AIP, will be known to it.
    - The association of Representation Information and PDI with the Content Information. This association may include having copies of the Representation Information or PDI kept within the Storage system. However, it is important to recognise that neither of these can be complete. For example, the Representation Information Network will change as the Knowledge Base of the Designated Community changes. Similarly, the Provenance information will include not just the technical information about copying but also descriptions of various real-world entities (e.g. persons, organisations and their attributes, roles and actions) whose social context is also associated with the data. Therefore both Representation Information and PDI will have to include a pointer out of the storage system.
    - The automatic maintenance of the technical provenance information, including details of what are essentially internal events such as copying, replication and refreshment of the objects.
    - The policies which the archive imposes on the stored objects (and the Representation Information, PDI etc. associated with the encoded instances of these policies), for example:
        - the number of backup copies, off-site and on-site, on-line and near-line, and replication
        - the access controls
        - the distribution of information among the individual pieces of virtualised storage
        - maintenance of namespaces
        - maintenance of collection level information
    - The ability to hand on the stored AIPs, and appropriate collection information, to another OAIS system, either because of technological change or because of organisational change, as the preserved information is passed on to the next in the chain of preservation.

    The Digital Object Storage concept is intrinsically domain independent.
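
    As a rough illustration of storage that treats objects as opaque bits while maintaining technical provenance automatically, consider the sketch below. The class ObjectStore, its methods, the event names and the use of SHA-256 are assumptions for this example and do not describe the CASPAR Preservation Data Object.

```python
import hashlib
from datetime import datetime, timezone


class ObjectStore:
    """Stores opaque bytes and logs internal events without interpreting the content."""

    def __init__(self, name):
        self.name = name
        self._objects = {}    # object identifier -> opaque bytes
        self.provenance = []  # technical provenance, appended automatically

    def _record(self, event, object_id):
        self.provenance.append({
            "event": event,
            "object": object_id,
            "store": self.name,
            "sha256": hashlib.sha256(self._objects[object_id]).hexdigest(),
            "when": datetime.now(timezone.utc).isoformat(),
        })

    def put(self, object_id, payload):
        self._objects[object_id] = bytes(payload)
        self._record("stored", object_id)

    def replicate_to(self, other, object_id):
        # Copy the bits to another store and log the event on both sides.
        other._objects[object_id] = self._objects[object_id]
        self._record("replicated-out", object_id)
        other._record("replicated-in", object_id)


primary = ObjectStore("on-site")
backup = ObjectStore("off-site")
primary.put("aip-001", b"opaque AIP bytes")
primary.replicate_to(backup, "aip-001")
```

    The matching digests in the two provenance logs give simple evidence that the replica carries the same bits; the non-technical provenance (people, organisations, roles) would still have to be held elsewhere and pointed to.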

    16.2.3 Ingest

    The INGEST functional entity in the OAIS Reference Model provides

    the services and functions to accept Submission Information Packages (SIPs) from Producers (or from internal elements under the OAIS Administration control) and prepare the contents for storage and management within the archive. Ingest functions include receiving SIPs, performing quality assurance on SIPs, generating an Archival Information Package (AIP) which complies with the archive's data formatting and documentation standards, extracting Descriptive Information from the AIPs for inclusion in the archive database, and coordinating updates to Archival Storage and Data Management.


    The OAIS Producer-Archive Interface Methodology Abstract Standard (PAIMAS [22]) seeks to identify, define and provide structure for the relationships and interactions between an information Producer and an Archive. It defines the methodology for the structure of actions that are required from the initial time of contact between the Producer and the Archive until the objects of information are received and validated by the Archive. These actions cover the first stage of the Ingest Process. It is expected that a specific standard or "community standard" would be created in order to take into account all of the specific features of the community in question.

    The Producer-Archive Interface Specification [23] aims to provide a standard

    method to formally define the digital information objects to be transferred by an

    information Producer to an Archive and for effectively transferring these objects in

    the form of SIPs.

    The general concepts and checklists provided by PAIMAS and PAIS provide

    domain independent views of the processes that are needed in INGEST.

    16.2.4 Access

    ACCESS is the OAIS functional entity which provides

    the services and functions that support Consumers in determining the existence, description, location and availability of information stored in the OAIS, and allowing Consumers to request and receive information products. Access functions include communicating with Consumers to receive requests, applying controls to limit access to specially protected information, coordinating the execution of requests to successful completion, generating responses (Dissemination Information Packages, result sets, reports) and delivering the responses to Consumers.

    Looking at existing archives one sees a very great variety of ACCESS-type functions. Indeed it is probably true to say that this, the user-facing part of an archive's work, is the area in which the archive will seek to brand its services. Clearly the access services have a certain degree of standardisation to allow interoperability, examples of which include provision of Web pages, harvesting, and FTP services. Nevertheless each archive will seek to provide a richer set of branded ordering, searching and data provision services, and thus there are limits to the type of domain independent services which might be offered to any archive.

    Areas in which we might hope for some discipline independence are Access

    Control and specialised Finding Aids based on PDI, and these are considered

    next.


    16.2.4.1 Access Control/DRM/Trust

    Access Control, Trust and Digital Rights Management must attempt to withstand changes in:

    - individuals, and their roles and even their existence
    - organisations
    - legal systems, including new rights, new types of events and new obligations
    - security systems such as certificates and passwords

    A digital object may be deposited in an archive with one particular system of Access

    controls and DRM, but may (in fact certainly will) be used under a completely

    different access control system.

    While DRM systems could be made specific to domains, the requirement for

    survivability to change will tend to require a significant independence from

    domain considerations.

    16.2.4.2 Finding Aids Based on PDI

    A Finding Aid is defined in OAIS as a software program or document that allows

    Consumers to search for and identify Archival Information Packages of interest.

    If the Consumer does not know a priori what specific holdings of the

    OAIS are of interest, the Consumer will establish a Search Session with the

    OAIS. During this Search Session the Consumer will use the OAIS Finding

    Aids that operate on Descriptive Information, or in some cases on the AIPs

    themselves, to identify and investigate potential holdings of interest. This may

    be accomplished by the submission of queries and the return of result sets to

    the Consumer.

    OAIS provides terminology for the information which is used by the Finding

    Aids, for example Descriptive Information, Associated Descriptions and

    Collection Descriptions. However further specification of this information is not

    provided by OAIS, in part because of the great variety of types of information which

    could be involved.

    A type of Finding Aid which could have some discipline independent aspects is based on standardised PDI components, and in particular discipline independent aspects of Provenance.


    16.2.5 Data Management

    The DATA MANAGEMENT functional entity in OAIS is the entity that contains the services and functions for populating, maintaining, and accessing a wide variety of information. Some examples of this information are catalogs and inventories on what may be retrieved from Archival Storage, processing algorithms that may be run on retrieved data, Consumer access statistics, Consumer billing, Event Based Orders, security controls, and OAIS schedules, policies, and procedures.

    Descriptive Information, mentioned above, is the set of information, consisting primarily of Package Descriptions, which is provided to Data Management to support the finding, ordering, and retrieving of OAIS information holdings by Consumers.

    While in general this type of information is extremely diverse, there are some inventory activities which seem particularly basic and which require relatively straightforward collection of information.

    This domain independent type of Descriptive Information, used by the Data

    Management entity, is the simple catalogue of which Content Information is in

    which Archival Information Package.
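
    A minimal sketch of such a catalogue in Python follows. It assumes that both
    Content Information and Archival Information Packages are referred to by simple
    string identifiers, and that each piece of Content Information is held in exactly
    one package. The class and method names (PackageCatalogue, register,
    find_package, list_contents) are hypothetical and are not taken from OAIS or
    from the CASPAR software.

```python
from collections import defaultdict


class PackageCatalogue:
    """Illustrative catalogue recording which Content Information is in which AIP."""

    def __init__(self):
        # content identifier -> AIP identifier
        self._aip_by_content = {}
        # AIP identifier -> set of content identifiers
        self._contents_by_aip = defaultdict(set)

    def register(self, content_id, aip_id):
        """Record that a piece of Content Information is held in an AIP."""
        previous = self._aip_by_content.get(content_id)
        if previous is not None and previous != aip_id:
            # Assumption of this sketch: one package per piece of content.
            raise ValueError(f"{content_id} is already catalogued in {previous}")
        self._aip_by_content[content_id] = aip_id
        self._contents_by_aip[aip_id].add(content_id)

    def find_package(self, content_id):
        """Return the AIP identifier holding the content, or None if unknown."""
        return self._aip_by_content.get(content_id)

    def list_contents(self, aip_id):
        """Return the sorted content identifiers catalogued for an AIP."""
        return sorted(self._contents_by_aip.get(aip_id, ()))
```

    Keeping both directions of the mapping makes the two basic inventory questions
    (where is this content, and what does this package hold) equally cheap to answer.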

    16.3 Discipline Dependence: Toolboxes/Libraries

    As noted in the description of the Registry/Repository, individual pieces of

    Representation Information are opaque binary objects to it. However the

    Representation Information must contain specific information about some specific

    data objects, and must include discipline dependence.

    The discipline specificity is captured using a variety of tools and techniques; the

    umbrella term toolbox includes all of these. Chapter 7 provides an overview of

    the types of Representation Information.

    Discipline specificity is also needed for parts of the Preservation Description

    Information (PDI), and an umbrella toolbox is needed here also. PDI is discussed in

    more detail in Chap. 10.

    The term toolbox should not be interpreted as a Graphical User Interface (GUI);

    rather, it is just an umbrella term which could include, for example, many GUIs,

    software libraries, processes and procedures. There are a number of technologies which

    appear in many different guises.

    16.4 Key Infrastructure Components

    Based on the OAIS Reference and Functional Models, CASPAR has defined the

    basic infrastructure for providing digital preservation services, called the CASPAR

    Foundation, which is composed of 11 Key Components built on top of a service

    oriented Framework. The CASPAR Framework guarantees portability and

    interoperability (i.e. compliance with the WS-I open standard) with existing systems and

    platforms.

    Fig. 16.4 CASPAR key components overview

    As shown in Fig. 16.4, the CASPAR Foundation provides a set of components fully

    conformant with the OAIS Information Model by managing key concepts

    such as:

    Representation Information and Designated Community

    Preservation Description Information

    Information Packaging

    The key components identified in the CASPAR Architecture (Fig. 16.5) may be

    grouped in 6 main facade blocks:

    1. Information Package Management

    2. Information Access

    3. Designated Community and Knowledge Management

    4. Communication Management

    5. Security Management

    6. Provenance Management

    16.5 Information Package Management

    As shown in Fig. 16.6, the block supports Data Producers in the following main

    steps:

    1. Ingest Content Information

    2. Create Information Package, by also adding

       a. Representation Information

       b. Descriptive Information

       c. Preservation Description Information

    3. Check Information Package

    4. Store Information Package for the long term

    Fig. 16.5 CASPAR architecture layers

    Fig. 16.6 Information package management

    Those features are defined in three OAIS functional blocks: Ingest, Data

    Management and Archival Storage.

    The main component of Information Package Management is CASPAR

    Packaging, which cooperates with the (i) Representation Information Toolkit,

    (ii) Representation Information Registry, (iii) Virtualisation, (iv) Preservation DataStores and (v) Finding Manager.
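
    The four steps listed for this block (ingest, create, check, store) can be
    sketched in Python as below. This is an illustration only, not the CASPAR
    Packaging interface: the InformationPackage class, the function names and the
    use of a plain dictionary as the store are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    """Hypothetical in-memory stand-in for an Information Package."""
    content: bytes
    representation_info: str = ""
    descriptive_info: str = ""
    pdi: dict = field(default_factory=dict)


def ingest(content: bytes) -> bytes:
    """Step 1: accept Content Information from a Producer."""
    if not content:
        raise ValueError("empty Content Information")
    return content


def create_package(content, representation_info, descriptive_info, pdi):
    """Step 2: wrap the content with its accompanying information."""
    return InformationPackage(content, representation_info, descriptive_info, dict(pdi))


def check_package(package: InformationPackage) -> list:
    """Step 3: return the names of any missing parts (empty list means complete)."""
    missing = []
    if not package.representation_info:
        missing.append("Representation Information")
    if not package.descriptive_info:
        missing.append("Descriptive Information")
    if not package.pdi:
        missing.append("Preservation Description Information")
    return missing


def store_package(package: InformationPackage, store: dict, package_id: str) -> None:
    """Step 4: place a complete package in a store (here, a plain dict)."""
    problems = check_package(package)
    if problems:
        raise ValueError("incomplete package, missing: " + ", ".join(problems))
    store[package_id] = package
```

    In this sketch the check is repeated at storage time, so that an incomplete
    package cannot reach the store even if the caller skips step 3.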

    16.6 Information Access

    As shown in Fig. 16.7, the block supports Data Consumers in the following main

    steps:

    1. Search Content Information;

    2. Obtain Information Package and related Contents and Descriptions, also by

    considering the Designated Community Profile of the Consumer.

    Those features are defined in three OAIS functional blocks: Access, Data

    Management and Archival Storage.

    The main component of Information Access is the CASPAR Finding

    Manager, which cooperates with the (i) Knowledge Manager, (ii) Packaging and

    (iii) Preservation DataStores.

    Fig. 16.7 Information access

    16.7 Designated Community, Knowledge and Provenance Management

    As shown in Fig. 16.8, the blocks support actors in the following main steps:

    Fig. 16.8 Designated community, knowledge and provenance management

    1. Deal with Designated Community Profile and its own Knowledge Base;

    2. Identify and Provide Knowledge Gap for understanding a Content Information;

    3. Deal with Digital Rights;

    4. Guarantee Authenticity.

    Those features are defined in three OAIS functional blocks: Preservation Planning,

    Data Management and Access.

    The main components of Designated Community, Knowledge and Provenance

    Management are the CASPAR Knowledge Manager and Authenticity

    Manager, which cooperate with the (i) Digital Rights Manager, (ii) Preservation

    DataStores and (iii) Packaging.
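
    One simple way to picture the knowledge gap of step 2 is as a set difference:
    the concepts needed to understand some Content Information, minus the concepts
    already in the Designated Community's Knowledge Base. The Python sketch below
    follows that reading and also walks any dependencies of the missing concepts.
    It is not the CASPAR Knowledge Manager algorithm; the function name, its
    parameters and the concept names in the usage example are hypothetical, and the
    sketch assumes that a concept counted as known needs no further explanation.

```python
def knowledge_gap(required, known, depends_on=None):
    """Return the concepts needed to understand some content but not yet known.

    required   -- concepts directly needed for the Content Information
    known      -- concepts in the Designated Community's knowledge base
    depends_on -- optional mapping: concept -> concepts it in turn relies on
    """
    depends_on = depends_on or {}
    known = set(known)
    gap = set()
    pending = list(required)
    while pending:
        concept = pending.pop()
        if concept in known or concept in gap:
            continue  # already understood, or already recorded as missing
        gap.add(concept)
        # An unknown concept may itself rest on further concepts.
        pending.extend(depends_on.get(concept, ()))
    return gap
```

    For example, with the made-up dependencies {"image format": ["byte order"]} and a
    community that knows neither concept, knowledge_gap(["image format"], [],
    {"image format": ["byte order"]}) reports both as missing.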

    16.8 Communication Management

    As shown in Fig. 16.9, the block supports Data Preservers and Curators in the

    following main steps:

    1. Notify and Alert for Change Event impacting long term preservation;

    2. Trigger Preservation Process.

    Those features are defined in two OAIS functional blocks: Preservation Planning

    and Administration.

    The main component of Communication Management is the CASPAR

    Preservation Orchestration Manager, which cooperates with the (i) Knowledge

    Manager, (ii) Authenticity Manager and (iii) Representation Information Registry.
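
    The two steps of this block, notifying of a change event and triggering a
    preservation process, resemble a publish/subscribe arrangement. The Python
    sketch below illustrates that pattern under this assumption; ChangeEventBus and
    its methods are hypothetical names and do not describe the interface of the
    CASPAR Preservation Orchestration Manager.

```python
from collections import defaultdict


class ChangeEventBus:
    """Hypothetical publish/subscribe hub for preservation-related change events."""

    def __init__(self):
        # event type -> list of callables to invoke when that event occurs
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a preservation process to be triggered by an event type."""
        self._handlers[event_type].append(handler)

    def notify(self, event_type, detail):
        """Alert all subscribers of a change event; return their results in order."""
        return [handler(detail) for handler in list(self._handlers.get(event_type, ()))]


# Usage: a curator registers a process to run when a format is reported obsolete.
bus = ChangeEventBus()
bus.subscribe("format-obsolete", lambda detail: f"plan migration for {detail}")
results = bus.notify("format-obsolete", "format X")
```

    Decoupling the source of an alert from the process it triggers means new
    preservation processes can be added without changing the code that raises events.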

    Fig. 16.9 Communication management

    16.9 Security Management

    As shown in Fig. 16.10, the block supports actors in the following main steps:

    1. Deal with User Account, Role and Profile;

    2. Deal with Content Access Permissions;

    3. Deal with Digital Rights;

    4. Guarantee Authenticity.

    Fig. 16.10 Security management

    Those features are defined in three OAIS functional blocks: Preservation Planning,

    Data Management and Administration.

    The main component of Security Management is the CASPAR Data Access

    Manager and Security, which cooperates with the (i) Digital Rights Manager and

    (ii) Authenticity Manager.
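
    Steps 1 and 2 of this block (user accounts with roles, and content access
    permissions) can be illustrated with a small role-based check in Python. This is
    a sketch under stated assumptions, not the CASPAR Data Access Manager and
    Security component: the UserAccount class, the may_access function and the
    deny-by-default rule are all choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class UserAccount:
    """Hypothetical user record: a name plus the roles granted to it."""
    name: str
    roles: set = field(default_factory=set)


def may_access(user, content_id, permissions):
    """Return True if any of the user's roles is permitted to read the content.

    permissions maps a content identifier to the set of roles allowed to read it;
    content with no entry is treated as inaccessible (deny by default).
    """
    allowed_roles = permissions.get(content_id, set())
    return bool(user.roles & allowed_roles)
```

    Denying access when no permission entry exists is the more cautious default for an
    archive, since a missing entry is more likely an omission than an intention to share.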