A Logical Model for Digital Archives

A Logical Model for Digital Archives. Rathachai Chawuthai. Information Management, CSIM / AIT. Issued document 1.0


Page 1: A Logical Model for Digital Archives

A Logical Model for Digital Archives

Rathachai Chawuthai

Information Management, CSIM / AIT

Issued document 1.0

Page 2: A Logical Model for Digital Archives

2

Agenda

• 22nd Century
• Digital Preservation
• UCK
• Introduction
• Logical Model
• Prototype
• Related works

Page 3: A Logical Model for Digital Archives

3

22nd Century

Page 4: A Logical Model for Digital Archives

4

Example in 22nd Century

What is ?

Error: DVD unreadable

Error: No program can open file format .doc

[unreadable characters shown where the file content should be]

File is read-protected. Please key in password.

Page 5: A Logical Model for Digital Archives

5

Example in 22nd Century

Barack Obama

44th president of USA

Born 08/04/1961

When was he born?
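The ambiguity in the slide's question can be shown with a short sketch. It parses the same string under two date conventions. Which convention a future reader would assume is not stated on the slide, so both format strings here are assumptions chosen for illustration.

```python
from datetime import datetime

raw = "08/04/1961"

# Two plausible community conventions for reading the same characters.
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # month/day/year
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # day/month/year

print(as_month_first.isoformat())  # 1961-08-04
print(as_day_first.isoformat())    # 1961-04-08
```

The bits are preserved perfectly in both cases; only the reader's background knowledge decides which date is understood.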

Page 6: A Logical Model for Digital Archives

6

Digital Preservation

Page 7: A Logical Model for Digital Archives

7

• Digital preservation is the active management of digital information to ensure its accessibility over time.

• Digital preservation types
– Bit Preservation: the ability to produce a particular sequence of bits from storage media at any time.
– Data Preservation: the ability to render the produced bit stream and produce a meaningful output from it at any time.
– Information Preservation: the ability to understand the rendered digital object at any time.

Overview

Flouris (2007)
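As an illustration of the bit preservation level only, the sketch below checks whether a stored bit sequence is still the one that was deposited by comparing cryptographic digests. Checksum comparison is a common fixity technique and is an assumption here, not something the slides prescribe; the function names and sample bytes are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, recorded_digest: str) -> bool:
    """True if the bit stream still matches the digest recorded at deposit."""
    return fingerprint(data) == recorded_digest


original = b"A Logical Model for Digital Archives"
recorded_digest = fingerprint(original)          # stored alongside the object
corrupted = b"A Logical Model for Digital Archivez"

print(is_intact(original, recorded_digest))   # True
print(is_intact(corrupted, recorded_digest))  # False
```

A passing check says nothing about the data or information levels: the bits can be intact while no program can render them or no reader can interpret them.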

Page 8: A Logical Model for Digital Archives

8

OAIS

OCLC.org

[Figure: OAIS workflow. Labels: Producer, Ingest, Store, Manage, Query, Access, Disseminate, Consumer, Management; packages (pkg) pass between the stages.]

Workflow

Page 9: A Logical Model for Digital Archives

9

PREMIS

PREMIS from LOC.gov

• Information provided to support preservation management
– Creator, creation date-time
– File format
– Software / hardware environment and version
– Preservation activities, persons involved, and results
– Historical changes from preservation activities
– Decryption code
– Font, formatting, color, look & feel
– Rights and agreements

Overview

Page 10: A Logical Model for Digital Archives

10

Challenge

Flouris (2007)

Conceptual Level
• Information Preservation

Physical Level
• Data Preservation
• Bit Preservation

Page 11: A Logical Model for Digital Archives

11

Underlying Community Knowledge

Page 12: A Logical Model for Digital Archives

12

Steps towards a theory of information preservation

Giorgos Flouris

The Theory

Flouris (2007)

Page 13: A Logical Model for Digital Archives

13

• Designated Community (DC)
– A group of people who share the same knowledge

• Underlying Community Knowledge (UCK)
– Language
– Contextual knowledge
– Background knowledge
– Common sense

Underlying Community Knowledge (UCK)

Flouris (2007)

Page 14: A Logical Model for Digital Archives

14

Problem

Flouris (2007)

Producer (UCK 1): First name = “Rathachai”, Family name = “Chawuthai”

Write: Name : “Rathachai Chawuthai”

Consumer (UCK 2), Read: First name = “Chawuthai”, Family name = “Rathachai”

Page 15: A Logical Model for Digital Archives

15

Approach

Flouris (2007)

Producer (UCK 1): First name = “Rathachai”, Family name = “Chawuthai”

Write: Name : “Rathachai Chawuthai”

Delta (between UCK 1 and UCK 2)

Consumer (UCK 2), Read: First name = “Rathachai”, Family name = “Chawuthai”
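The problem and the delta-based approach on these two slides can be sketched as follows. This is a minimal illustration, assuming that each community's knowledge about names reduces to an ordering of name parts; `NameConvention` and `apply_delta` are hypothetical names, not part of Flouris's formalism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameConvention:
    """How one community orders the parts of a full name (illustrative model)."""
    order: tuple[str, str]

    def write(self, parts: dict[str, str]) -> str:
        return " ".join(parts[key] for key in self.order)

    def read(self, text: str) -> dict[str, str]:
        return dict(zip(self.order, text.split(" ")))


def apply_delta(text: str, producer: NameConvention, consumer: NameConvention) -> str:
    """Re-express a stored value from the producer's convention in the consumer's."""
    return consumer.write(producer.read(text))


uck1 = NameConvention(("first", "family"))   # producer's community
uck2 = NameConvention(("family", "first"))   # consumer's community

author = {"first": "Rathachai", "family": "Chawuthai"}
stored = uck1.write(author)                   # "Rathachai Chawuthai"

naive = uck2.read(stored)                     # misread: parts are swapped
bridged = uck2.read(apply_delta(stored, uck1, uck2))

print(naive == author)    # False
print(bridged == author)  # True
```

The stored string never changes; what makes the second read correct is knowing the difference between the two communities' conventions.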

Page 16: A Logical Model for Digital Archives

16

Motivation

UCK A: Name = First name + Last name

UCK B: Name = Family name + First name

? ?

Page 17: A Logical Model for Digital Archives

17

Introduction

Page 18: A Logical Model for Digital Archives

18

Everyone is able to understand digital information over time

Motivation

Page 19: A Logical Model for Digital Archives

19

Motivation

UCK A: Name = First name + Last name

UCK B: Name = Family name + First name

? ? Reference

Page 20: A Logical Model for Digital Archives

20

• To develop a theory for digital archives.
• To design an information model representing contextual knowledge.
• To develop a prototype system in order to test the theory.

Objectives

Page 21: A Logical Model for Digital Archives

21

• Develop a theory by extending Flouris's existing theory, “Steps towards a theory of information preservation” (Underlying Community Knowledge)

• Design a “Contextual Model” from the proposed theory, using linked metadata to model contextual knowledge
– Refers to the OAIS information model
– Integrates with PREMIS metadata

• Build an archival system
– Refers to the OAIS guideline
– Supports a case study of scientific research processes

Scopes

Page 22: A Logical Model for Digital Archives

22

Logical Model

Proposed Theory

Page 23: A Logical Model for Digital Archives

23

• The theory is to
– Have a reference contextual knowledge to identify differences between community knowledge
• The model is to
– Represent contextual knowledge
– Be a reference for all underlying community knowledge
– Identify differences between community knowledge
– Capture change or evolution of the reference knowledge itself
– Be able to link concepts among designated communities through the reference contextual knowledge

Goal

Page 24: A Logical Model for Digital Archives

24

Underlying Common Community Knowledge

A common contextual knowledge for all underlying community knowledge

UCCK

Page 25: A Logical Model for Digital Archives

25

UCCK

• C: a set of concepts, Ci ∈ C
• R: a set of relations, Ri ∈ R and Ri → C × C
• HC: the hierarchy of classes (concepts), HC ⊆ C × C
– For example, HC(C1, C2) means that C1 is a sub-concept of C2
• HR: the hierarchy of relations, HR ⊆ R × R
– For example, HR(R1, R2) means that R1 is a sub-relation of R2
• IC: a set of instances of C
• IR: a set of instances of R
• AO: a set of axioms (inference relations of logic)

Yildiz (2006)

[Figure: the UCCK components C, R, HC, HR, IC, IR, AO]
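One way to picture the components listed on this slide is as a plain data structure. The sketch below is an assumption about representation only: the field names, the sample concepts ("Thing", "CelestialBody") and the "partOf" relation are hypothetical, and the sub-concept test simply follows HC pairs transitively.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Container for the components on the slide (field names are illustrative)."""
    concepts: set[str] = field(default_factory=set)                        # C
    relations: dict[str, tuple[str, str]] = field(default_factory=dict)   # R: name -> (C, C)
    concept_hierarchy: set[tuple[str, str]] = field(default_factory=set)  # HC: (sub, super)
    relation_hierarchy: set[tuple[str, str]] = field(default_factory=set) # HR: (sub, super)
    concept_instances: dict[str, str] = field(default_factory=dict)       # IC: instance -> concept
    relation_instances: set[tuple[str, str, str]] = field(default_factory=set)  # IR
    axioms: list[str] = field(default_factory=list)                       # axioms

    def is_sub_concept(self, sub: str, sup: str) -> bool:
        """True if `sup` is reachable from `sub` through one or more HC pairs."""
        seen: set[str] = set()
        frontier = [sub]
        while frontier:
            current = frontier.pop()
            for child, parent in self.concept_hierarchy:
                if child == current and parent not in seen:
                    if parent == sup:
                        return True
                    seen.add(parent)
                    frontier.append(parent)
        return False


ucck = Ontology(
    concepts={"Thing", "CelestialBody", "Planet", "SolarSystem"},
    relations={"partOf": ("Planet", "SolarSystem")},
    concept_hierarchy={("Planet", "CelestialBody"), ("CelestialBody", "Thing")},
    concept_instances={"Pluto": "Planet"},
)

print(ucck.is_sub_concept("Planet", "Thing"))  # True
print(ucck.is_sub_concept("Thing", "Planet"))  # False
```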

Page 26: A Logical Model for Digital Archives

26

UCCK

[Figure: the UCCK (C, R, HC, HR, IC, IR, AO); UCK1 and UCK2 are each derived from the UCCK.]

Page 27: A Logical Model for Digital Archives

27

UCCK

UCK1 UCK2

Astrodroid#1234

Pluto

Page 28: A Logical Model for Digital Archives

28

UCCK

UCK1 UCK2

Pluto Astrodroid#1234

Page 29: A Logical Model for Digital Archives

29

UCCK

UCK1 UCK2

Astrodroid#1234 Planet

SolarSystem Pluto

Page 30: A Logical Model for Digital Archives

30

UCCK

UCK1 UCK2

UCCK

Pluto Astrodroid#1234

Page 31: A Logical Model for Digital Archives

31

UCCK

UCK1 UCK2

UCCK

Pluto Pluto Astrodroid#1234
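The idea running through the preceding diagrams, that two communities use different terms which resolve to one concept in the common reference knowledge, might be sketched like this. The concept identifier `ucck:c1` and the dictionary-based mapping are assumptions made for illustration; the slides do not specify how the link is stored.

```python
from typing import Optional

# Each community's vocabulary maps its local term to a shared UCCK concept id.
uck1_to_ucck = {"Pluto": "ucck:c1"}
uck2_to_ucck = {"Astrodroid#1234": "ucck:c1"}


def translate(term: str, source: dict[str, str], target: dict[str, str]) -> Optional[str]:
    """Find the target community's term for the concept that `term` denotes."""
    concept = source.get(term)
    if concept is None:
        return None  # the source community has no such term
    for local_term, target_concept in target.items():
        if target_concept == concept:
            return local_term
    return None  # the target community has no term for this concept


print(translate("Pluto", uck1_to_ucck, uck2_to_ucck))            # Astrodroid#1234
print(translate("Astrodroid#1234", uck2_to_ucck, uck1_to_ucck))  # Pluto
```

Neither community needs to know the other's vocabulary; each only maintains its own link to the reference knowledge.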

Page 32: A Logical Model for Digital Archives

32

UCCK

Past Future

UCK A UCK B

UCCK v.1

UCK C UCK D

UCCK v.2

Page 33: A Logical Model for Digital Archives

33

The Event Ontology

Raimond (2007)

Page 34: A Logical Model for Digital Archives

34

The Event Ontology

Raimond (2007)

Pluto Mike Brown

Astrodroid#1234

Year 2006

Changing Pluto

Prague
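The labels on this slide can be read as one recorded event that explains a change in the reference knowledge. The sketch below is loosely modeled on the Event Ontology's agent, factor, product, place and time properties. The assignment of the slide's labels to those roles is an interpretation, and `ChangeEvent` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """A recorded change in the reference knowledge (illustrative structure)."""
    label: str
    time: str
    place: str
    agents: tuple[str, ...]
    factors: tuple[str, ...]   # terms the event acted on
    products: tuple[str, ...]  # terms the event produced


def resolve(term: str, events: list[ChangeEvent]) -> str:
    """Follow factor -> product links in chronological order to the newest term."""
    for event in sorted(events, key=lambda e: e.time):
        if term in event.factors and event.products:
            term = event.products[0]
    return term


events = [
    ChangeEvent(
        label="Changing Pluto",
        time="2006",
        place="Prague",
        agents=("Mike Brown",),
        factors=("Pluto",),
        products=("Astrodroid#1234",),
    )
]

print(resolve("Pluto", events))  # Astrodroid#1234
print(resolve("Mars", events))   # Mars
```

Keeping the event itself, rather than only the new term, lets a later reader see when, where and by whom the knowledge changed.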

Page 35: A Logical Model for Digital Archives

35

Prototype

Page 36: A Logical Model for Digital Archives

36

As a Consumer

[Figure: Consumers use an Archival Information System, which is linked to two other Archival Information Systems.]

• Browse digital objects
• Search relevant digital objects across repositories
• Link to other related digital objects under contextual knowledge across systems
• Customize own designated community

Page 37: A Logical Model for Digital Archives

37

As an Archivist

Archival Information

System

Archivist

• Ingest digital objects
• Define links to other objects
• Manage metadata according to digital object’s type
• Manage contextual knowledge
• Manage relationships of documents from document process

Page 38: A Logical Model for Digital Archives

38

As an Administrator

Archival Information

System

Administrator

• Define metadata for each type of digital object

• Define underlying common community knowledge

• Define underlying community knowledge

• Define designated communities

Page 39: A Logical Model for Digital Archives

39

• The system should be able to:
– Manage a variety of types of digital objects and metadata
– Establish relationships among digital objects semantically
– Provide semantic search
– Provide contextual knowledge through linked metadata of digital objects for each designated community
– Store knowledge as an ontology graph

Requirements
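To make the "linked metadata" and "semantic search" requirements concrete, here is a minimal in-memory sketch that stores knowledge as subject-predicate-object triples and widens a search term with equivalent terms before matching. It is not the prototype's implementation; the predicate names (`mentions`, `sameConceptAs`) and document identifiers are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]

graph: list[Triple] = [
    ("doc:report-1", "mentions", "Pluto"),
    ("doc:report-2", "mentions", "Astrodroid#1234"),
    ("Pluto", "sameConceptAs", "Astrodroid#1234"),
]


def match(triples: list[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def semantic_search(triples: list[Triple], term: str) -> list[str]:
    """Find documents mentioning `term` or a term linked to the same concept."""
    equivalents = {term}
    # Single pass over equivalence links; enough for this small example.
    for subject, _, obj in match(triples, p="sameConceptAs"):
        if subject in equivalents or obj in equivalents:
            equivalents |= {subject, obj}
    return sorted(s for s, _, o in match(triples, p="mentions") if o in equivalents)


print(match(graph, p="mentions", o="Pluto"))  # only doc:report-1
print(semantic_search(graph, "Pluto"))        # both documents
```

A plain keyword match finds one document, while the search that consults the graph finds both.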

Page 40: A Logical Model for Digital Archives

40

[Figure: system architecture, by layer]
• Users: Consumer, Archivist, Administrator
• Archival Application: Digital Archive User Interface, Search Interface, Administration
• Archival Service Provider: Digital Archive Core Service, Semantic Search Service, Contextual Knowledge Mapping Service, UCCK Manager, UCK Manager
• Archival Data: Digital Object, Metadata, Knowledge Base

System architectures

Page 41: A Logical Model for Digital Archives

41

• Repository system
• Features
– Collects digital objects and their relations
– Collects metadata
– Collects ontology
– Supports versioning
• The only repository system that
– Supports semantic search
– Provides web services
• Works as back-end services

Fedora-Commons

Duraspace.org

Page 42: A Logical Model for Digital Archives

42

• Popular CMS
• Features
– Rich user management
– Rich content management
– Flexible for customized modules
• The only CMS that
– Supports a SPARQL endpoint
• Works as front-end service to end users

Drupal

Drupal.org

Page 43: A Logical Model for Digital Archives

43

• A Drupal module
• Features
– Provides an administration panel
– Provides fast search of the Fedora database
– Supports many metadata formats
– Supports many types of digital objects
• The only Drupal module that:
– Integrates with Fedora-Commons
– Works with the GSearch service (semantic search of Fedora-Commons)
• Works as front-end and administration services

Islandora

Islandora.ca

Page 44: A Logical Model for Digital Archives

44

[Figure: system architecture, by layer]
• Archival Application: Digital Archive User Interface, Search Interface, Administration
• Archival Service Provider: Digital Archive Core Service, Semantic Search Service, Contextual Knowledge Mapping Service, UCCK Manager, UCK Manager
• Archival Data: Digital Object, Metadata, Knowledge Base

System architectures

Page 45: A Logical Model for Digital Archives

45

[Figure: system architecture, by layer]
• Archival Application: Digital Archive User Interface, Search Interface, Administration
• Archival Service Provider: Digital Archive Core Service, Semantic Search Service, Contextual Knowledge Mapping Service, UCCK Manager, UCK Manager
• Archival Data: Digital Object, Metadata, Knowledge Base

System architectures

Page 46: A Logical Model for Digital Archives

46

Related works

Page 47: A Logical Model for Digital Archives

47

Related Works

Semantic

Linked documents in process

More data for each DC

Linked knowledge among digital archives across DC

CASPAR SHAMAN

A Logical Model for Digital Archives

Page 48: A Logical Model for Digital Archives

48

?

Page 49: A Logical Model for Digital Archives

49

References

• Weisz, T., 2007. The Kaifeng Stone Inscriptions Revisited
• Rhys-Lewis, J., 2000. Conservation and Preservation Activities in Archives and Libraries in Developing Countries: An Advisory Guideline on Policy and Planning
• Palfrey, J., Gasser, U. Born Digital: Understanding the First Generation of Digital Natives
• Li, Y., Banach, M., 2011. Institutional Repositories and Digital Preservation: Assessing Current Practices at Research Libraries
• Flouris, G., Meghini, C., 2007. Some preliminary ideas towards a theory of digital preservation
• CASPAR, 2005. Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval. EU-funded project (fp6-2005-ist-033572). http://www.casparpreserves.eu
• SHAMAN, 2008. Sustaining Heritage Access through Multivalent Archiving. EU-funded project (fp7-ict-216736). http://shaman-ip.eu/
• Lagoze, C., Payette, S., Shin, E., Wilper, C., 2006. Fedora: an architecture for complex objects and their relationships

Page 50: A Logical Model for Digital Archives

50

References

• Albani, S., 2010. The ESA Approach to Long-Term Data Preservation using CASPAR
• Borbinha, J., 2010. SHAMAN: Sustaining Heritage Access through Multivalent Archiving
• CCSDS, 2003. 650.0-B-1 Reference Model for an Open Archival Information System (OAIS). (ISO 14721:2003) http://public.ccsds.org/publications/archive/650x0b1.pdf
• PREMIS Working Group, 2004. PREservation Metadata: Implementation Strategies. http://www.loc.gov/standards/premis/
• Berners-Lee, T., 2001. The Semantic Web
• W3C, 2004. RDF/XML Syntax Specification. http://www.w3.org/TR/rdf-syntax-grammar/
• Yildiz, B., 2006. Ontology Evolution and Versioning: The state of the art
• Gustman, S., Soergel, D., Oard, D., Byrne, W., Picheny, M., Ramabhadran, B., Greenberg, D., 2002. Supporting access to large digital oral history archives
• Hayes, P., Eskridge, T. C., Saavedra, R., Reichherzer, T., Mehrotra, M., Bobrovnikoff, D., 2005. Collaborative knowledge capture in ontologies
• Raimond, Y., Abdallah, S., 2007. The Event Ontology

Page 51: A Logical Model for Digital Archives

51

BACK UP

Page 52: A Logical Model for Digital Archives

52

• Preservation policy
– Use well-known file formats, such as .pdf, .xml, .tiff, .jpg, .avi, etc.
• Preservation strategies
– Secure storage system, software migration, emulation, media refreshment, and disaster planning
• Content policy
– Track user activities, such as ingest, migration, etc.
– Peer review before deposit into the repository
• Rights and agreements
– Because some preservation activities need to duplicate and modify digital content, the rights and agreements for the digital object need to be recorded.

Recommendation

Yuan Li (2011)

Page 53: A Logical Model for Digital Archives

53

OAIS

OCLC.org

Content Information

PDI (Preservation Description Information)

Archive Packaging Information

Descriptive Information about Package 1

Package 1

Information Model

Page 54: A Logical Model for Digital Archives

54

• Provenance
– Describes the history of creation, ownership, access, and change
• Authenticity
– Ensures trustworthiness (does the digital resource render as it did originally?)
• Preservation activities
– Records processes supporting preservation, such as migration
• Technical environment
– Provides the name and version of the hardware, platform, OS, and software required to render digital resources
• Rights management
– States the intellectual property rights and agreements that need to be observed when executing preservation processes, e.g. does a creator allow his/her work to be copied or not?

Preservation Metadata

OCLC.org, usenix.org

Basic features

Page 55: A Logical Model for Digital Archives

55

PREMIS

PREMIS from LOC.gov

Entities

Page 56: A Logical Model for Digital Archives

56

PREMIS

PREMIS from LOC.gov

• Information provided to support preservation management
– Technical information (characteristics)
  • E.g. creator, creation date-time, file format, software/hardware environment, …
– Information about actions on a digital object
  • E.g. ingest, migrate, verify, …
– Inhibitors
  • Password, encryption, … required in order to access digital objects
– Digital provenance
  • Records changes of object format, e.g. .DOC to .PDF
  • Contains application, version, environment, … required in order to render digital objects
– Significant properties (if important)
  • Object’s characteristics, e.g. font, formatting, color, …
  • Look and feel
– Rights
  • E.g. rights and agreement metadata associated with preservation

Overview
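The kinds of information listed above can be pictured as a record attached to each digital object, with preservation actions appended as events. The sketch below is only an illustration of that idea and does not follow the actual PREMIS schema; every class, field and sample value is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PreservationEvent:
    event_type: str   # e.g. "ingest", "migrate", "verify"
    date_time: str
    agent: str
    outcome: str


@dataclass
class ObjectRecord:
    identifier: str
    creator: str
    created: str
    file_format: str
    environment: str                                   # software/hardware needed to render
    inhibitors: list[str] = field(default_factory=list)
    significant_properties: dict[str, str] = field(default_factory=dict)
    rights: str = ""
    events: list[PreservationEvent] = field(default_factory=list)

    def migrate(self, new_format: str, date_time: str, agent: str) -> None:
        """Record a format migration as digital provenance, then update the format."""
        outcome = f"{self.file_format} -> {new_format}"
        self.events.append(PreservationEvent("migrate", date_time, agent, outcome))
        self.file_format = new_format


record = ObjectRecord(
    identifier="obj-001",
    creator="example author",
    created="2011-01-01T00:00:00",
    file_format=".DOC",
    environment="example word processor, version unknown",
    significant_properties={"font": "example font"},
    rights="copying permitted for preservation",
)
record.migrate(".PDF", "2012-01-01T00:00:00", "archivist-1")

print(record.file_format)         # .PDF
print(record.events[-1].outcome)  # .DOC -> .PDF
```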

Page 57: A Logical Model for Digital Archives

57

Information Preservation Structure

Flouris (2007)

[Figure: IPS with components UCK (L: Language, T: Knowledge) and Digital Object (Q: Property, ans: Value)]

Page 58: A Logical Model for Digital Archives

58

• Digital information we value today may not be accessible, or may not render as it did originally, in the next 100 years.
– Technological obsolescence
– Deterioration of digital storage media

• A reader in the next 100 years may not understand today's digital information as the author intended.
– Author and reader do not share the same contextual knowledge
– Contextual knowledge changes over time

• There could be a reference contextual knowledge that every local knowledge refers to.

Motivation

Yuan Li (2011), Flouris (2007)

Page 59: A Logical Model for Digital Archives

59

• Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval
– Is an Integrated Project co-financed by the European Union within the Sixth Framework Programme
– Adds context knowledge to a digital object following its characteristics and representations
• Similarity
– Integrates context knowledge of digital objects and estimates the gap between designated communities’ knowledge with semantic technology
• Advantages of my project
– Links archives across designated communities by referring to underlying common community knowledge
– Emphasizes change in common community knowledge over time

CASPAR

Casparpreserves.eu

Page 60: A Logical Model for Digital Archives

60

• Sustaining Heritage Access through Multivalent Archiving
– Is an Integrated Project co-financed by the European Union within the Seventh Framework Programme
– Represents context as relations between digital objects
– Integrates context information from processes, such as ingest, access, and reuse, with an ontological representation
• Similarity
– Represents context information by linking digital objects and other things semantically, based on document process
• Advantages of my project
– Links to other digital objects and other things semantically by referring to underlying common community knowledge, capturing knowledge from real-world concepts (rather than document processes)

SHAMAN

Shaman-ip.eu

Page 61: A Logical Model for Digital Archives

61