A Logical Model for Digital Archives
Rathachai Chawuthai
Information Management, CSIM / AIT
Issued document 1.0
2
Agenda
• 22nd Century
• Digital Preservation
• UCK
• Introduction
• Logical Model
• Prototype
• Related works
3
22nd Century
4
Example in 22nd Century
What is ?
Error: DVD unreadable
Error: No program can open file format .doc
!7rò??àÕ ??ߟ²ÂÚÕ??ߟ²ÂÚ
ðŽɳ!Z?g! Õr/ÕŸ/?rò?
File is read-protected. Please key in password.
5
Example in 22nd Century
Barack Obama
44th president of the USA
Born 08/04/1961
When was he born?
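The question on this slide can be made concrete with a small sketch. It assumes only two candidate reading conventions (month-first and day-first), which is an illustrative simplification; the variable names are hypothetical.

```python
from datetime import datetime

RAW = "08/04/1961"  # the date string exactly as written on the slide

# Two plausible reading conventions. Which one applies is contextual
# knowledge that the stored string does not carry by itself.
month_first = datetime.strptime(RAW, "%m/%d/%Y").date()
day_first = datetime.strptime(RAW, "%d/%m/%Y").date()

print(month_first.isoformat())  # 1961-08-04
print(day_first.isoformat())    # 1961-04-08
```

Both parses succeed, so the bits and the rendering are intact, yet the two readers disagree on the answer.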
6
Digital Preservation
7
• Digital preservation is the active management of digital information to ensure its accessibility over time.
• Digital preservation types
– Bit Preservation: ability to produce a particular sequence of bits from storage media at any time.
– Data Preservation: ability to render the produced bit stream and produce meaningful output from it at any time.
– Information Preservation: ability to understand the rendered digital object at any time.
Overview
Flouris (2007)
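Of the three types above, bit preservation is the one that can be checked mechanically. A minimal sketch, assuming a SHA-256 digest recorded at ingest (the choice of hash and the function names are assumptions, not part of the slides):

```python
import hashlib


def fixity(data: bytes) -> str:
    """Return a SHA-256 digest used here as a fixity value."""
    return hashlib.sha256(data).hexdigest()


def bits_preserved(data: bytes, recorded_digest: str) -> bool:
    """True if the bit sequence read back matches the digest recorded at ingest."""
    return fixity(data) == recorded_digest


original = b"Born 08/04/1961"
recorded = fixity(original)  # stored alongside the object at ingest

intact = bits_preserved(original, recorded)
corrupted = bits_preserved(b"Born 08/04/1916", recorded)  # two digits swapped
print(intact, corrupted)
```

A passing check says nothing about whether the bytes can still be rendered or understood, which is why data and information preservation need separate treatment.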
8
OAIS
OCLC.org
[Figure: OAIS workflow. Labels: Producer, Management, Ingest, Store, Manage, Query, Access, Disseminate, Consumer; information packages (pkg) pass between the stages.]
9
PREMIS
PREMIS from LOC.gov
• Information provided to support preservation management
– Creator, created date-time
– File format
– Software / hardware environment and version
– Preservation activities, persons involved, and results
– Historical changes from preservation activities
– Decryption code
– Font, formatting, color, look & feel
– Rights and agreements
Overview
10
Challenge
Flouris (2007)
Conceptual Level
• Information Preservation
Physical Level
• Data Preservation
• Bit Preservation
11
Underlying Community Knowledge
12
Steps towards a theory of information preservation
Giorgos Flouris
The Theory
Flouris (2007)
13
• Designated Community (DC)
– A group of people who share the same knowledge
• Underlying Community Knowledge (UCK)
– Language
– Contextual knowledge
– Background knowledge
– Common sense
Underlying Community Knowledge (UCK)
Flouris (2007)
14
Problem
Flouris (2007)
Producer (UCK 1) writes: First name = “Rathachai”, Family name = “Chawuthai”
Stored: Name : “Rathachai Chawuthai”
Consumer (UCK 2) reads: First name = “Chawuthai”, Family name = “Rathachai”
15
Approach
Flouris (2007)
Producer (UCK 1) writes: First name = “Rathachai”, Family name = “Chawuthai”
Stored: Name : “Rathachai Chawuthai”
Delta: applied between UCK 1 and UCK 2 on read
Consumer (UCK 2) reads: First name = “Rathachai”, Family name = “Chawuthai”
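The problem and the delta-based approach on these two slides can be sketched as follows. This is an illustration only: it reduces a UCK to a single name-order convention, and `NameConvention`, `delta`, and `apply_delta` are hypothetical names, not constructs from Flouris (2007).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameConvention:
    """Hypothetical stand-in for the part of a UCK that fixes name order."""
    order: tuple  # field names in the order they are written


UCK1 = NameConvention(order=("first_name", "family_name"))
UCK2 = NameConvention(order=("family_name", "first_name"))


def write(fields: dict, conv: NameConvention) -> str:
    return " ".join(fields[key] for key in conv.order)


def read(text: str, conv: NameConvention) -> dict:
    return dict(zip(conv.order, text.split(" ")))


def delta(src: NameConvention, dst: NameConvention) -> tuple:
    """For each position in dst.order, the position of that field in src.order."""
    return tuple(src.order.index(name) for name in dst.order)


def apply_delta(text: str, positions: tuple) -> str:
    parts = text.split(" ")
    return " ".join(parts[i] for i in positions)


producer = {"first_name": "Rathachai", "family_name": "Chawuthai"}
stored = write(producer, UCK1)

naive = read(stored, UCK2)  # the "Problem" slide: fields come out swapped
fixed = read(apply_delta(stored, delta(UCK1, UCK2)), UCK2)  # the "Approach" slide

print(naive)
print(fixed)
```

The stored string never changes; only the consumer's reading is corrected by the delta between the two conventions.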
16
Motivation
UCK A: Name = First name + Last name
UCK B: Name = Family name + First name
?
?
17
Introduction
18
Everyone is able to understand digital information over time
Motivation
19
Motivation
UCK A: Name = First name + Last name
UCK B: Name = Family name + First name
?
? Reference
20
• To develop a theory for digital archives.
• To design an information model representing contextual knowledge.
• To develop a prototype system in order to test the theory.
Objectives
21
• Develop a theory by extending Flouris's existing theory, “Steps towards a theory of information preservation” (Underlying Community Knowledge)
• Design a “Contextual Model” from the proposed theory, using linked metadata to model contextual knowledge
– Refers to the OAIS information model
– Integrates with PREMIS metadata
• Build an archival system
– Follows the OAIS guidelines
– Supports a case study of scientific research processes
Scopes
22
Logical Model
Proposed Theory
23
• The theory is to
– Have a reference contextual knowledge to identify differences between community knowledge
• The model is to
– Represent contextual knowledge
– Be a reference for all underlying community knowledge
– Identify differences between community knowledge
– Capture change or evolution of the reference knowledge itself
– Link concepts among designated communities through the reference contextual knowledge
Goal
24
Underlying Common Community Knowledge
A common contextual knowledge for all underlying community knowledge
UCCK
25
UCCK
• $C$: a set of concepts, $C_i \in C$
• $R$: a set of relations, $R_i \in R$ and $R_i \to C \times C$
• $HC$: a hierarchy of classes, $HC \subseteq C \times C$
– $HC(C_1, C_2)$ means that $C_1$ is a sub-concept of $C_2$
• $HR$: a hierarchy of relations, $HR \subseteq R \times R$
– $HR(R_1, R_2)$ means that $R_1$ is a sub-relation of $R_2$
• $IC$: a set of instances of $C$
• $IR$: a set of instances of $R$
• $AO$: a set of axioms (inference relations of logic)
Yildiz (2006)
[Figure: the ontology components C, R, HC, HR, IC, IR, and AO]
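The seven components listed on this slide map naturally onto a small data structure. The following is a sketch under stated assumptions: the class and method names are hypothetical, axioms are kept as opaque entries, and the sample concepts (including `CelestialBody` and `OurSolarSystem`) are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    concepts: set                                             # C
    relations: dict                                           # R: name -> (domain, range) in C x C
    concept_hierarchy: set = field(default_factory=set)       # HC: (sub, super) pairs
    relation_hierarchy: set = field(default_factory=set)      # HR: (sub, super) pairs
    concept_instances: dict = field(default_factory=dict)     # IC: instance -> concept
    relation_instances: set = field(default_factory=set)      # IR: (relation, subject, object)
    axioms: list = field(default_factory=list)                # AO: not interpreted here

    def is_well_formed(self) -> bool:
        """Check that every component only refers to declared concepts and relations."""
        c, r, inst = self.concepts, self.relations, self.concept_instances
        return (
            all(dom in c and rng in c for dom, rng in r.values())
            and all(a in c and b in c for a, b in self.concept_hierarchy)
            and all(a in r and b in r for a, b in self.relation_hierarchy)
            and all(concept in c for concept in inst.values())
            and all(rel in r and s in inst and o in inst
                    for rel, s, o in self.relation_instances)
        )

    def is_subconcept(self, sub: str, sup: str) -> bool:
        """Follow HC upwards from `sub`; a concept counts as a sub-concept of itself."""
        seen, stack = set(), [sub]
        while stack:
            current = stack.pop()
            if current == sup:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(parent for child, parent in self.concept_hierarchy
                         if child == current)
        return False


ucck = Ontology(
    concepts={"CelestialBody", "Planet", "SolarSystem"},
    relations={"partOf": ("Planet", "SolarSystem")},
    concept_hierarchy={("Planet", "CelestialBody")},
    concept_instances={"Pluto": "Planet", "OurSolarSystem": "SolarSystem"},
    relation_instances={("partOf", "Pluto", "OurSolarSystem")},
)
print(ucck.is_well_formed(), ucck.is_subconcept("Planet", "CelestialBody"))
```

Keeping HC and HR as plain pair sets mirrors the subset-of-product definitions directly, at the cost of a linear scan per lookup.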
26
UCCK
[Figure: UCCK, with components C, R, HC, HR, IC, IR, and AO; UCK1 and UCK2 are each derived from UCCK]
27
UCCK
UCK1 UCK2
Astrodroid#1234
Pluto
28
UCCK
UCK1 UCK2
Pluto Astrodroid#1234
29
UCCK
UCK1 UCK2
Astrodroid#1234 Planet
Solar System Pluto
30
UCCK
UCK1 UCK2
UCCK
Pluto Astrodroid#1234
31
UCCK
UCK1 UCK2
UCCK
Pluto Pluto Astrodroid#1234
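One reading of this sequence of figures is that "Pluto" in UCK1 and "Astrodroid#1234" in UCK2 denote the same thing, and that the UCCK holds a shared reference concept both can point to. A minimal sketch of that reading, where the mapping tables and the `ucck:Pluto` identifier are assumptions made for illustration:

```python
# Local term -> UCCK reference concept (identifiers are hypothetical).
UCK1_TO_UCCK = {"Pluto": "ucck:Pluto"}
UCK2_TO_UCCK = {"Astrodroid#1234": "ucck:Pluto"}


def translate(term, source, target):
    """Map a term from one community to another via the shared UCCK concept.

    Returns None when either side has no link to the reference concept.
    """
    reference = source.get(term)
    if reference is None:
        return None
    for local_term, ref in target.items():
        if ref == reference:
            return local_term
    return None


to_uck2 = translate("Pluto", UCK1_TO_UCCK, UCK2_TO_UCCK)
to_uck1 = translate("Astrodroid#1234", UCK2_TO_UCCK, UCK1_TO_UCCK)
print(to_uck2, to_uck1)
```

With a shared reference, each community needs one mapping to the UCCK rather than one mapping per other community.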
32
UCCK
Past Future
UCK A UCK B
UCCK v.1
UCK C UCK D
UCCK v.2
33
The Event Ontology
Raimond (2007)
34
The Event Ontology
Raimond (2007)
[Figure: event “Changing Pluto” linked to Pluto, Astrodroid#1234, Mike Brown, Year 2006, and Prague]
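The change captured in this figure could be recorded as triples using the Event Ontology's properties (`agent`, `factor`, `product`, `place`, `time`). In this sketch the assignment of each label to a role is an assumption about what the figure shows, the `http://example.org/ucck/` namespace is hypothetical, and the year is kept as a plain string rather than a temporal entity for brevity.

```python
EVENT = "http://purl.org/NET/c4dm/event.owl#"  # Event Ontology namespace
EX = "http://example.org/ucck/"                # hypothetical namespace for this sketch

CHANGE = EX + "ChangingPluto"

# Role assignments below are an assumed reading of the figure.
triples = {
    (CHANGE, "rdf:type", EVENT + "Event"),
    (CHANGE, EVENT + "agent", EX + "MikeBrown"),
    (CHANGE, EVENT + "factor", EX + "Pluto"),
    (CHANGE, EVENT + "product", EX + "Astrodroid1234"),
    (CHANGE, EVENT + "place", EX + "Prague"),
    (CHANGE, EVENT + "time", "2006"),
}


def objects(subject: str, predicate: str) -> list:
    """Return all objects of triples matching the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(objects(CHANGE, EVENT + "place"))
print(objects(CHANGE, EVENT + "time"))
```

Recording the change as an event keeps both the old and the new concept in the knowledge base together with the circumstances that connect them.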
35
Prototype
36
As a Consumer
[Figure: Consumers use an Archival Information System that is linked to other Archival Information Systems]
• Browse digital objects
• Search for relevant digital objects across repositories
• Link to other related digital objects under contextual knowledge across systems
• Customize own designated community
37
As an Archivist
[Figure: Archivist working with the Archival Information System]
• Ingest digital objects
• Define links to other objects
• Manage metadata according to the digital object’s type
• Manage contextual knowledge
• Manage relationships among documents from the document process
38
As an Administrator
[Figure: Administrator working with the Archival Information System]
• Define metadata for each type of digital object
• Define underlying common community knowledge
• Define underlying community knowledge
• Define designated communities
39
• The system should be able to:
– Manage various types of digital objects and metadata
– Establish semantic relationships among digital objects
– Provide semantic search
– Provide contextual knowledge through linked metadata of digital objects for each designated community
– Store knowledge as an ontology graph
Requirements
40
[Figure: layered system architecture]
Users: Consumer, Archivist, Administrator
Archival Application: Digital Archive User Interface, Search Interface, Administration
Archival Service Provider: Digital Archive Core Service, Semantic Search Service, Contextual Knowledge Mapping Service, UCCK Manager, UCK Manager
Archival Data: Digital Object, Metadata, Knowledge Base
System architectures
41
• Repository system
• Features
– Collects digital objects and their relations
– Collects metadata
– Collects ontologies
– Supports versioning
• The only repository system that
– Supports semantic search
– Provides web services
• Works as the back-end service
Fedora-Commons
Duraspace.org
42
• Popular CMS
• Features
– Rich user management
– Rich content management
– Flexible for customized modules
• The only CMS that
– Supports a SPARQL endpoint
• Works as the front-end service for end users
Drupal
Drupal.org
43
• A Drupal module
• Features
– Provides an administration panel
– Provides fast search over the Fedora database
– Supports many metadata formats
– Supports many types of digital objects
• The only Drupal module that:
– Integrates with Fedora-Commons
– Works with the GSearch service (semantic search of Fedora-Commons)
• Works as the front-end and administration service
Islandora
Islandora.ca
44
Related works
47
Related Works
[Figure: comparison of CASPAR, SHAMAN, and A Logical Model for Digital Archives. Labels: Semantic; Linked documents in process; More data for each DC; Linked knowledge among digital archives across DC]
48
?
49
References
• Weisz, T., 2007. The Kaifeng Stone Inscriptions Revisited.
• Rhys-Lewis, J., 2000. Conservation and Preservation Activities in Archives and Libraries in Developing Countries: An Advisory Guideline on Policy and Planning.
• Palfrey, J., Gasser, U. Born Digital: Understanding the First Generation of Digital Natives.
• Li, Y., Banach, M., 2011. Institutional Repositories and Digital Preservation: Assessing Current Practices at Research Libraries.
• Flouris, G., Meghini, C., 2007. Some preliminary ideas towards a theory of digital preservation.
• CASPAR, 2005. Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval. EU-funded project (fp6-2005-ist-033572). http://www.casparpreserves.eu
• SHAMAN, 2008. Sustaining Heritage Access through Multivalent Archiving. EU-funded project (fp7-ict-216736). http://shaman-ip.eu/
• Lagoze, C., Payette, S., Shin, E., Wilper, C., 2006. Fedora: an architecture for complex objects and their relationships.
50
References
• Albani, S., 2010. The ESA Approach to Long-Term Data Preservation using CASPAR.
• Borbinha, J., 2010. SHAMAN: Sustaining Heritage Access through Multivalent Archiving.
• CCSDS, 2003. 650.0-B-1 Reference Model for an Open Archival Information System (OAIS). (ISO 14721:2003) http://public.ccsds.org/publications/archive/650x0b1.pdf
• PREMIS Working Group, 2004. PREservation Metadata: Implementation Strategies. http://www.loc.gov/standards/premis/
• Berners-Lee, T., 2001. The Semantic Web.
• W3C, 2004. RDF/XML Syntax Specification. http://www.w3.org/TR/rdf-syntax-grammar/
• Yildiz, B., 2006. Ontology Evolution and Versioning: The state of the art.
• Gustman, S., Soergel, D., Oard, D., Byrne, W., Picheny, M., Ramabhadran, B., Greenberg, D., 2002. Supporting access to large digital oral history archives.
• Hayes, P., Eskridge, T. C., Saavedra, R., Reichherzer, T., Mehrotra, M., Bobrovnikoff, D., 2005. Collaborative knowledge capture in ontologies.
• Raimond, Y., Abdallah, S., 2007. The Event Ontology.
51
BACK UP
52
• Preservation policy
– Use well-known file formats, such as .pdf, .xml, .tiff, .jpg, .avi, etc.
• Preservation strategies
– Secure storage system, software migration, emulation, media refreshment, and disaster planning
• Content policy
– Track user activities, such as ingest, migration, etc.
– Peer review before deposit into the repository
• Rights and agreements
– Because some preservation activities need to duplicate and modify digital content, the rights and agreements attached to each digital object need to be recorded.
Recommendation
Yuan Li (2011)
53
OAIS
OCLC.org
[Figure: OAIS information package. Package 1 holds Content Information and Preservation Description Information (PDI), together with Archive Packaging Information and Descriptive Information about Package 1.]
Information Model
54
• Provenance
– Describes the history of creation, ownership, access, and change
• Authenticity
– Ensures trustworthiness (does the digital resource render as it originally did?)
• Preservation activities
– Records the processes supporting preservation, such as migration
• Technical environment
– Provides the name and version of the hardware, platform, OS, and software required to render digital resources
• Rights management
– States the intellectual property rights and agreements that need to be observed when executing a preservation process, e.g. does the creator allow his/her work to be copied or not?
Preservation Metadata
OCLC.org, usenix.org
Basic features
55
PREMIS
PREMIS from LOC.gov
Entities
56
PREMIS
PREMIS from LOC.gov
• Information provided to support preservation management
– Technical information (characteristics)
  • E.g. creator, created date-time, file format, software/hardware environment, …
– Information about actions on a digital object
  • E.g. ingest, migrate, verify, …
– Inhibitors
  • Password, encryption, … needed in order to access digital objects
– Digital provenance
  • Records changes of object format, e.g. .DOC → .PDF
  • Contains application, version, environment, … needed in order to render digital objects
– Significant properties (if important)
  • Object’s characteristics, e.g. font, formatting, color, etc.
  • Look and feel
– Rights
  • E.g. rights and agreement metadata associated with preservation
Overview
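To make the categories on this slide concrete, here is a minimal sketch of a record that carries them. The class and field names are hypothetical and are not PREMIS element names; the sample values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class PreservationEvent:
    action: str   # e.g. "ingest", "migrate", "verify"
    date: str
    outcome: str


@dataclass
class PreservationRecord:
    creator: str                                                  # technical information
    created: str
    file_format: str
    environment: str                                              # software needed to render
    inhibitors: list = field(default_factory=list)                # e.g. "password"
    significant_properties: dict = field(default_factory=dict)    # e.g. font, color
    rights: str = ""
    events: list = field(default_factory=list)                    # actions on the object
    format_history: list = field(default_factory=list)            # digital provenance

    def migrate(self, new_format: str, date: str, environment: str) -> None:
        """Record a format migration as both provenance and an event."""
        self.format_history.append(self.file_format)
        self.events.append(
            PreservationEvent("migrate", date, f"{self.file_format} -> {new_format}")
        )
        self.file_format = new_format
        self.environment = environment


record = PreservationRecord(
    creator="example author",
    created="2011-01-01",
    file_format=".doc",
    environment="example word processor 1.0",
    significant_properties={"font": "example font"},
    rights="copying allowed for preservation",
)
record.migrate(".pdf", date="2011-06-01", environment="example PDF reader 1.0")
print(record.file_format, record.format_history, record.events[-1].outcome)
```

Keeping the migration in both `events` and `format_history` reflects the slide's split between actions on an object and its digital provenance.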
57
Information Preservation Structure
Flouris (2007)
[Figure: Information Preservation Structure (IPS). UCK: L (Language), T (Knowledge). Digital Object: Q (Property), ans (Value).]
58
• Our valued digital information of today may not be accessible, or may not render as it originally did, 100 years from now.
– Technological obsolescence
– Deterioration of digital storage media
• A reader 100 years from now may not understand today's digital information as its author intended.
– Author and reader do not share the same contextual knowledge
– Contextual knowledge changes over time
• There could be a reference contextual knowledge somewhere that every local knowledge refers to.
Motivation
Yuan Li (2011), Flouris (2007)
59
• Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval
– Is an Integrated Project co-financed by the European Union within the Sixth Framework Programme
– Adds context knowledge to a digital object following its characteristics and representations
• Similarity
– Integrates context knowledge of digital objects and estimates the gap between designated communities’ knowledge with semantic technology
• Advantage of my project
– Links archives across designated communities by referring to underlying common community knowledge
– Emphasizes changes in common community knowledge over time
CASPAR
Casparpreserves.eu
60
• Sustaining Heritage Access through Multivalent Archiving
– Is an Integrated Project co-financed by the European Union within the Seventh Framework Programme
– Represents context as relations between digital objects
– Integrates context information from processes, such as ingest, access, and reuse, with an ontological representation
• Similarity
– Represents context information by semantically linking digital objects and other things, based on the document process
• Advantage of my project
– Links semantically to other digital objects and other things by referring to underlying common community knowledge, which captures knowledge from real-world concepts (rather than from document processes)
SHAMAN
shaman-ip.eu
61