
Page 1: Semantic Digital Preservation Rathachai Chawuthai rathachai.chawuthai@live.com Information Management CSIM / AIT Introduction Issued document 1.0.

Semantic Digital Preservation

Rathachai Chawuthai

rathachai.chawuthai@live.com

Information Management

CSIM / AIT

Introduction

Issued document 1.0

Page 2:

Agenda

• 22nd Century
• Digital Preservation
• Needs of Archive in IR
• Knowledge Preservation
• Technology Review

Page 3:

22nd Century

Page 4:

Scenario

Assume that the following scenario takes place in the 22nd century.

Page 5:

Present

Imagine how a person in the future will be able to read your digital documents of today.

Alice (Reader), Bob (Archivist)

Page 6:

22nd Century

Hi Bob, do you have any information about the US president “Barack Obama”?

Oh! It is hard to find, because the information is more than 100 years old.

Page 7:

22nd Century

Hi Alice. Luckily, I found a DVD containing his information.

What is a DVD?

Page 8:

Present

Do you believe that your current media will be useful in the future?

Page 9:

22nd Century

Don’t be silly, Alice. It was popular 100 years ago. It can be read by a DVD reader. See!

Error: DVD unreadable

No! That thing is unreadable.

Page 10:

Present

The lifespan of digital media is quite short. Do you have a plan to move your data to fresh media?

Page 11:

22nd Century

Fortunately, I can get that file. Can you open “obama2009.pdf”?

Error: No program can open file format PDF

Hey... how do I open a PDF file?

Page 12:

Present

Do you inform them about the software, hardware, and versions needed to open your file?

Page 13:

22nd Century

As far as I can see, it needs Adobe Reader 9.0 to open it.

File is read-protected. Please enter the password.

How would I know the password?

Page 14:

Present

Your file might be secured. Do you inform them how to access your file?

Page 15:

22nd Century

!7rò??àÕ??ߟ²ÂÚ

Õ??ߟ²ÂÚðŽɳ

!Z?g! Õr/ÕŸ/?rò?

Why did the author write in an alien language?

Page 16:

Present

There are also issues with encoding (such as ASCII, ANSI, ISO-8859, UTF-7, big-endian, little-endian) and with fonts (such as Tahoma, Verdana).

How do you tell them what is required to render the file?
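
The encoding problem on this slide can be illustrated with a minimal Python sketch. It is only an illustration: the `decode_with_metadata` helper and the metadata keys are hypothetical names, not part of any preservation standard. It shows that the same stored bytes yield different text depending on the encoding the reader assumes, and that byte order changes the stored bytes themselves.

```python
def decode_with_metadata(raw: bytes, metadata: dict) -> str:
    """Decode preserved bytes using the encoding recorded in the metadata.

    Hypothetical helper: assumes the archive stored an "encoding" entry.
    """
    return raw.decode(metadata["encoding"])


original = "café"
raw = original.encode("utf-8")  # the bitstream as it was stored

# Metadata recorded by the creator (illustrative keys and values).
metadata = {"encoding": "utf-8", "font": "Tahoma"}

# With metadata, the reader recovers the original text.
right = decode_with_metadata(raw, metadata)

# Without metadata, a reader who guesses ISO-8859-1 gets garbled text.
wrong = raw.decode("latin-1")

# Byte order also matters: the same character is stored differently.
big = "A".encode("utf-16-be")
little = "A".encode("utf-16-le")

print(right, wrong, big, little)
```

Recording the encoding next to the object is what lets a future reader skip the guessing step.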

Page 17:

22nd Century

Barack Obama

44th president of USA

Born 08/04/1961

Confused! When was he born? 4th August or 8th April?

No idea! You need to ask the author, who lived 100 years ago.

Page 18:

Present

The knowledge of today's creator and that of a future reader might differ. How can we ensure that the reader understands the content correctly?

Page 19:

22nd Century

What should I do if I need to find more information related to Barack Obama’s family?

You may have to browse every file from here. Good luck...

Page 20:

Present

Many files are related to other files. How do we let future readers know?

Page 21:

22nd Century

It would have been good if the older generation had made a good plan for digital preservation.

Page 22:

Digital Preservation

Page 23:

Age of Information

• Printed Age
– Paper is a durable format
– Stored under proper conditions

• Digital Age
– Information is fragile
  • Technological obsolescence
  • Deterioration of media

Page 24:

Preservation Object

Digitized Object

• A digital object that is copied from a printed document.
• Stored in a common format such as TIFF

Page 25:

Preservation Object

Born-Digital Object

• A digital object that is created with software
• Its version history needs to be kept, not only the finalized document

Page 26:

To be digital

Capacities vs. Age

1000 Years

15 Years

Digital media can hold much more information than printed paper in the same volume, but its lifespan is shorter than that of printed paper. Fortunately, content on digital media can easily be duplicated to another medium.

Page 27:

Digital Preservation

Objective

• An active management of digital information to ensure its
– Maintainability: the bitstream still exists as it originally was
– Accessibility: the bitstream forming a file can be opened
– Renderability: an opened file presents the digital object as it originally was
– Understandability: a reader understands the digital object as originally intended
over time

wikipedia.org

Page 28:

Maintainability

Issue

Do you have these?

How can a bitstream be preserved when the lifespan of digital media is short and the media itself becomes obsolete?

Page 29:

Maintainability

Proposed Solution

The current solution is migration: the bitstream is duplicated from one medium to another at regular intervals.

Challenge
• How to notify that it is time to migrate?
• Does anyone hold the right, granted by the intellectual property owner, to copy the work?
• How to guarantee that nothing is lost during the migration process?
• How to keep track of changes made by the migration process?
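
One common way to approach the "nothing is lost" challenge is to compare checksums before and after the copy. The sketch below is an illustration under assumptions, not the method of any particular archive system: the `migrate` function, the file names, and the shape of the returned record are all hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, target: Path) -> dict:
    """Copy the bitstream to new media and verify that it is unchanged.

    Returns a small record of the migration (hypothetical structure)
    that could be kept as preservation metadata.
    """
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"bitstream changed during migration: {source}")
    return {"source": source.name, "target": target.name, "sha256": before}


# Usage: simulate old and new media with two files in a temporary folder.
with tempfile.TemporaryDirectory() as workdir:
    old_media = Path(workdir) / "old_media.bin"
    new_media = Path(workdir) / "new_media.bin"
    old_media.write_bytes(b"obama2009")
    record = migrate(old_media, new_media)
    migrated_ok = new_media.read_bytes() == b"obama2009"

print(record, migrated_ok)
```

The returned record also addresses the last challenge in a small way: each migration leaves a trace that can be stored with the object.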

Page 30:

Accessibility

A bitstream needs to be represented as a file in order to be opened by software.

Issue
• To form an accessible file, the bitstream must be constructed into an object structure that software understands.
  - Datatype: number, string, array, ...
  - Format: text, image, video, audio, ...
• Opening a file requires an environment that includes hardware, software, and their versions.
• Furthermore, some files cannot be accessed because they are protected for security reasons.

Page 31:

Accessibility

Proposed Solution

• Use metadata to record the information that anyone needs to know in order to access the file, such as
– Byte encoding
– File format
– Hardware and software, and their versions
– Password to open the file

• Provide a way to access the file
– Use a virtual environment to access the file
– Migrate the file to suit newer software
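
As a rough illustration of such an access metadata record, the following sketch stores the listed items in a small Python dataclass and serializes it to JSON. The class name, field names, and values are assumptions chosen for this example; they do not follow any specific metadata standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AccessMetadata:
    """Hypothetical record of what a reader needs to open a file."""
    byte_encoding: str
    file_format: str
    hardware: str
    software: str
    software_version: str
    password: str


record = AccessMetadata(
    byte_encoding="UTF-8",
    file_format="PDF",
    hardware="personal computer",
    software="Adobe Reader",
    software_version="9.0",
    password="<placeholder: kept by the archive>",
)

# Store the record as plain text next to the preserved file.
serialized = json.dumps(asdict(record), indent=2)
print(serialized)
```

A plain-text serialization is used here on purpose, since the metadata itself must remain readable for as long as the object it describes.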

Page 32:

Accessibility

Challenge

• How to make a common metadata structure?
– Which information does every organization agree to include?
• How to notify that it is time to migrate to new software?
• Does anyone hold the right, granted by the intellectual property owner, to copy and modify the work in order to support newer software?
• How to guarantee that nothing is lost during the migration process?
• How to keep track of changes made by the migration process?

Page 33:

Renderability

Issue

Although a digital object can be opened, how can we guarantee that it is rendered as it originally was?

Page 34:

Renderability

Proposed Solution
• Use metadata to record information about the look and feel of the digital object, such as
– Character code
– Font
– Color template

Challenge
• Which information is necessary to include in the metadata?
• Is there a process to verify the correctness of the rendered object?

Page 35:

Understandability

Issue
• How to ensure that the characteristics of today's digital objects, including:
– Documentation style
  • Date format
  • Number format
  • Grammar, sentence, phrase, vocabulary, symbol
– Contemporary knowledge
  • Common sense
  • Contextual knowledge
  • Knowledge implicitly understood within a community
are understood by future readers who have different knowledge?

Page 36:

Understandability

Proposed Solution
• Preserve the underlying community knowledge along with the digital objects
• Link related digital objects and their contents to explore original knowledge and new knowledge
– Using semantic technology

Page 37:

Understandability

Challenge
• How to model and implement a theory of underlying community knowledge?
• How to collect contextual knowledge for each period?
• How to establish the correctness of knowledge?

Page 38:

Archive Information System

To meet the preservation requirements, an archive information system appears to be the answer. A good system should support:

– Flexible information model
– Long-term storage
– Well-formed metadata
– Preservation activities
– Browsing and searching
– Knowledge exploration
– Preservation policy
– Access policy
– Right and agreement policy

Page 39:

Stakeholder

To be fully featured, the system needs to support the following roles:

• Provider
– One who ingests digital objects into the archive
• Consumer
– One who retrieves preserved information
• Management
– One who provides preservation strategies and carries out preservation activities such as migration

A good system should support the use cases of each of these roles.

Page 40:

Summary

• The goal of preservation is to maintain knowledge over time.
• Preservation needs well-established metadata and a well-established system.
• A preservation system should serve the provider, the consumer, and management.

Page 41:

Needs of Archive in IR

Institutional Repositories and Digital Preservation: Assessing Current Practices at Research Libraries

Yuan Li, Syracuse University

Meghan Banach, University of Massachusetts Amherst

Page 42:

Digital Archive

• Archive
– Is a collection of historical records, or the physical place where they are located.
– Contains primary source documents that have accumulated over the course of an individual or organization's lifetime, and are kept to show the function of an organization.

• Digital Archive
– Is an archive in digital format, which requires digital preservation of
  • Digital media
  • The environment needed to render it

wikipedia.org

Page 43:

Institutional Repository

• An Institutional Repository is an online locus for collecting, preserving, and disseminating - in digital form - the intellectual output of an institution, particularly a research institution.

• For a university, this would include materials such as research journal articles, before (preprints) and after (postprints) undergoing peer review, and digital versions of theses and dissertations, but it might also include other digital assets generated by normal academic life, such as administrative documents, course notes, or learning objects.

• The four main objectives for having an institutional repository are:
– to provide open access to institutional research output by self-archiving it;
– to create global visibility for an institution's scholarly research;
– to collect content in a single location;
– to store and preserve other institutional digital assets, including unpublished or otherwise easily lost ("grey") literature (e.g., theses or technical reports).

wikipedia.org

Page 44:

Introduction

• Review
– Archiving within IRs
– Managing digital content
– Producing digital copies

Page 45:

Introduction

• Preservation system requires
– Natural and juridical persons
– Institutions
– Applications
– Infrastructure
– Procedure

Page 46:

Introduction

• Issues of Preservation
– Little control over ingestion process
– Less-optimal formats
– Poor metadata
– Insufficient intellectual property rights clearance
– Difficult or costly to preserve

Page 47:

Objective

• Analyze the needs of digital preservation (digital archive) in the domain of institutional repositories

Page 48:

Question?

• Is preservation part of the mission and goal of IRs?
• What preservation policies exist for IRs?
• What preservation strategies are IRs currently implementing?
• Are the necessary rights and agreements in place to preserve the content of IRs?
• Are all of the materials in IRs of sufficient quality and importance to warrant long-term preservation (Content policies)?
• Do IRs currently have the necessary sustainability in terms of funding and staffing to carry out long-term preservation of their contents?

Page 49:

Is preservation part of the mission and goal of IRs?

Question?

Page 50:

Answers

Is preservation part of IRs?

[Chart: YES and NO responses, with shares of 97.4% and 2.6%]

Page 51:

What preservation policies exist for IRs?

Question?

Page 52:

Answers

Preservation Policies

• Duration
– Short | Medium | Long

• Recommended file formats
– Text formats: pdf, txt, rtf, xml, odb, ods, odp
– Image file formats: tiff, jp2, jpg
– Audio formats: aif, aiff, wav
– Video formats: avi, mj2, mjp2
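
A policy like this could be checked at ingest time. The sketch below is only an illustration: the `RECOMMENDED_FORMATS` table copies the extensions from the slide, while the `classify` function and the example file names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the slide; the grouping keys are illustrative.
RECOMMENDED_FORMATS = {
    "text": {"pdf", "txt", "rtf", "xml", "odb", "ods", "odp"},
    "image": {"tiff", "jp2", "jpg"},
    "audio": {"aif", "aiff", "wav"},
    "video": {"avi", "mj2", "mjp2"},
}


def classify(filename: str) -> Optional[str]:
    """Return the policy category of a file, or None if not recommended."""
    extension = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in RECOMMENDED_FORMATS.items():
        if extension in extensions:
            return category
    return None


print(classify("obama2009.pdf"))  # a recommended text format
print(classify("speech.mp3"))     # not in the recommended list
```

A contributor could then be warned at submission time when a file falls outside the recommended formats.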

Page 53:

What preservation strategies are IRs currently implementing?

Question?

Page 54:

Answers

Preservation Strategies

• Backup system
• Secure storage system
• Checksum

Page 55:

Answers

Preservation Strategies

By IR system

By external system

Preservation metadata

Page 56:

Answers

Preservation Strategies

• Metadata varies based on the sophistication of the collection
• Working on standards and best practices to address all types of metadata

Page 57:

Are the necessary rights and agreements in place to preserve the content of IRs?

Question?

Page 58:

Answers

Rights and Agreements

• Digital content may have to be changed when technology changes
• Does this impact copyright?
• Players
– Content contributor
– Copyright holder

Page 59:

Answers

Rights and Agreements

• What is an agreement?
– Click through
– Written
– Policies
– MOUs
– Verbal

Most agreements: the contributor needs permission to submit work that is owned by another party.

Page 60:

Are all of the materials in IRs of sufficient quality and importance to warrant long-term preservation (Content policies)?

Question?

Page 61:

Answers

Content Policies

Collect

Manage

Disseminate

Page 62:

Answers

Content Policies

• Problem
– Format obsolescence
– Poor quality
– Unreadable
– Insufficient metadata
  • To manage
  • To preserve

Page 63:

Answers

Content Policies

• It should
– Track user activities, e.g. submitting work
– Require peer review before deposit in IRs (to ensure quality)
  • Journal article
  • Conference proceeding

Page 64:

Do IRs currently have the necessary sustainability in terms of funding and staffing to carry out long-term preservation of their contents?

Question?

Page 65:

Answers

Sustainability

[Figure labels: Period, Time, Technology Change, Infinity, Short-term, Medium-term, Long-term]

Page 66:

Summary

• To recognize the need to implement a digital archive in the institutional repository
• To make agreements and secure permissions for preserving IR contents
• To give content contributors guidance on digital formats for preservation
• To plan for long-term digital preservation
• To solve the lack of preservation funding

Page 67:

Knowledge Preservation

Terminology and Wish List for a Formal Theory of Preservation

Giorgos Flouris, FORTH-ICS

Carlo Meghini, CNR-ISTI

Page 68:

Introduction

Barack Obama

44th president of USA

Born 04/08/1961

Currently, the system can do:

Bit Preservation

The bit stream is preserved for the long term on modern media.

Object Preservation

The bit stream can be rendered and displayed to the user as it originally was.

Page 69:

Introduction

Barack Obama

44th president of USA

Born 08/04/1961

Information Preservation

Currently, the system may not focus on this.

A new challenge is for the system to preserve the ability to understand the rendered object over time.

To meet this challenge, the reader must be able to understand the rendered object’s content by understanding the terms, concepts, or other information that appear in it, and by placing it in its correct context. Currently, this feature does not exist in existing preservation approaches.

Page 70:

Objective

Barack Obama

44th president of USA

Born 04-Aug-1961

[Diagram labels: Producer, Ingest, Archive System, Render, Consumer]

The objective is that a reader (consumer) can perceive the information in context, according to his or her background knowledge, and understand it as originally intended.

Page 71:

Discussion

Terms

P: Producer. The creator of the digital object.

D: Digital Object. An object that presents knowledge in an understood language.

C: Consumer. A reader who reads the digital object.

DC: Designated Community. A group of readers who share common characteristics and knowledge.

Page 72:

Discussion

Understanding Process

1. The Producer produces a Digital Object and stores it on storage media.
2. The Consumer opens the Digital Object from the storage media by rendering the sequence of bit values that represents the document.
3. The Consumer perceives the Digital Object through light travelling from the output device to his or her eyes.
4. The Consumer understands the meaning of the Digital Object from D itself and from the contextual knowledge of his or her Designated Community.

Goal

The Consumer is able to understand the Digital Object as originally intended over time.

Page 73:

Underlying Community Knowledge

Barack Obama

44th president of USA

Born 08/04/1961

• The key is the “meaning” of digital knowledge.
– The meaning of a digital object can be viewed as a special kind of mapping that associates a symbol with a particular real-world concept.
– This association is not always clear from looking at the digital object alone.

• A date format is a good example of something that confuses people.
– In European notation, he was born on the 8th of April.
– In American notation, he was born on the 4th of August.

Flouris & Meghini

Page 74:

Underlying Community Knowledge

L: Language. An arrangement of symbols that are associated with real-world concepts.

• To capture the “meaning” of a Digital Object, the Digital Object needs to be described in a Language.
• The Language should be a formal language that can be interpreted by both Producer and Consumer.
• The purposes of the Language are
– Providing formulation rules that encode real-world concepts as symbols.
– Providing logical semantics that use contextual, background, or commonsense information to decode symbols into real-world concepts.

Page 75:

Underlying Community Knowledge

[Diagram: P, D, L; the symbol “08/04” stands for 4 August]

• The producer needs to represent “4th of August” in a common language. Thus, she needs to use contextual, background, or commonsense information that she shares with her community in order to write a symbol representing “4th of August”.
• She decides to use “08/04” because everyone in the same community understands it and can interpret it as “4th of August”.
• This means that she and the readers in the same community in that period understand the same meaning.

Page 76:

Underlying Community Knowledge

From the simple math function f(x) = y

Everyone uses an interpret function to understand the meaning of language.

producer.interpret( “08/04” ) = “4th of August”

reader01.interpret( “08/04” ) = “4th of August”

reader02.interpret( “08/04” ) = “4th of August”

In this case, everyone interprets the language “08/04” as “4th of August” because the interpret process contains a formula.

The formula comes from knowledge. If the knowledge is agreed on within a community, the formula is produced from community knowledge. This means that the Producer and all readers have the same formula, so they all understand the same thing.
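
The interpret function can be sketched in Python as follows. This is a minimal sketch under assumptions: the slides do not define a concrete formula, so a `datetime` format string stands in for it here, and the `Member` class, the `FORMULAS` table, and the community names are hypothetical. The year is added to the symbol so that it parses to a full date.

```python
from datetime import date, datetime

# Hypothetical "formulas": each community's agreed date notation.
FORMULAS = {
    "month_first_community": "%m/%d/%Y",
    "day_first_community": "%d/%m/%Y",
}


class Member:
    """A producer or reader whose formula comes from community knowledge."""

    def __init__(self, name: str, community: str) -> None:
        self.name = name
        self.formula = FORMULAS[community]

    def interpret(self, symbol: str) -> date:
        """Decode a symbol into a real-world concept (a calendar date)."""
        return datetime.strptime(symbol, self.formula).date()


producer = Member("producer", "month_first_community")
reader01 = Member("reader01", "month_first_community")
outsider = Member("outsider", "day_first_community")

same_community = producer.interpret("08/04/1961")
also_same = reader01.interpret("08/04/1961")
different = outsider.interpret("08/04/1961")

print(same_community, also_same, different)
```

Members who share a formula decode the symbol to the same date, while a member with a different formula decodes the same symbol to a different date.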

Page 77:

Underlying Community Knowledge

UCK: Underlying Community Knowledge

Knowledge from a designated community (DC) that helps its members understand the association between language and real-world concepts in a similar way. Therefore, the key feature of UCK is to produce formulas that are able to
- Encode a real-world concept as language
- Decode language into a real-world concept

Page 78:

Evaluation of DC

[Diagram: C, D, L; the symbol “08/04” is read as 8 April]

producer.interpret( “08/04” ) = “4th of August”

consumer.interpret( “08/04” ) = “8th of April”

Why does the consumer understand it incorrectly?

Page 79:

Evaluation of DC

• As time passes, the designated community may change, and its knowledge may change.
• Thus, “understanding” may change, too.
• The critical cause is a change of UCK.
– Because a different UCK produces a different formula, which produces a different understanding.
• The next challenge is “How to capture a change of UCK”

Page 80:

Evaluation of DC

UCKES: UCK Evolution Structure

A structure that represents the difference (delta) between UCKs. UCKES captures a change in a UCK’s language that results from a change in the UCK’s theory, such as ontology evolution.

UCKES represents the gap between UCKs.

[Diagram labels: C, P]

Page 81:

Evaluation of DC

UCKMS: UCK Mapping Structure

A complex mechanism that uses UCKES to produce the relationship between the Consumer’s formula and the Producer’s formula. Its main function is to change the language so that both reach the same understanding of the real-world concept.

[Diagram labels: C, P]

Page 82:

Evaluation of DC

Is it possible?

producer.interpret( “08/04” ) = “4th of August”

consumer.interpret( “04/08” ) = “4th of August”


Evaluation of DC

Right now, the Consumer gets an incorrect understanding of the language that the Producer intended to present.

[Figure: Producer and Consumer each read “08/04” in Digital Object D through a Formula derived from their own UCK]


Evaluation of DC

The system should understand the knowledge on the Consumer’s side and generate a mapping between the Producer’s formula and the Consumer’s formula using the UCKES and UCKMS mechanisms.

[Figure: Producer and Consumer, each with a UCK and a Formula, around Digital Object D containing “08/04”; UCKES and UCKMS added]


Evaluation of DC

Then the system transforms digital object D into D’. D’ contains language that lets the Consumer understand the same thing as the Producer.

[Figure: Producer reads “08/04” in Digital Object D; Consumer reads “04/08” in Digital Object D’; UCK, Formula, UCKES and UCKMS shown]


Summary

Barack Obama
44th president of USA
Born 08/04/1961

Barack Obama
44th president of USA
Born 04/08/1961

The Consumer understands D’ the same way the Producer understands D.

This means that D’ has a preservability relation with D.

D D’

D’ D
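The preservability relation can be stated as a simple check: decoding D’ with the Consumer's formula must give the same concept as decoding D with the Producer's formula. The sketch below assumes month/day for the Producer and day/month for the Consumer; `decode` and `is_preserved` are hypothetical names, not part of the original slides.

```python
def decode(text: str, convention: str) -> tuple:
    """Decode 'aa/bb/yyyy' into a (year, month, day) concept."""
    first, second, year = (int(part) for part in text.split("/"))
    if convention == "month/day":
        return (year, first, second)
    if convention == "day/month":
        return (year, second, first)
    raise ValueError(f"unknown convention: {convention}")


def is_preserved(d: str, d_prime: str) -> bool:
    """True when the Consumer reads D' as the Producer reads D."""
    return decode(d_prime, "day/month") == decode(d, "month/day")


print(is_preserved("08/04/1961", "04/08/1961"))  # True
print(is_preserved("08/04/1961", "08/04/1961"))  # False
```

The second call shows why handing the Consumer an unchanged copy of D is not enough: the bytes are preserved, but the meaning is not.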


Summary

Next step

How can underlying community knowledge be preserved along with the digital object?

• Preservation must keep the “Reader” in mind, providing enough information to ensure that readers can understand the digital object as originally intended, starting from their own knowledge.


Technology Review


• The PREMIS Data Dictionary defines preservation metadata as “the information a repository uses to support the digital preservation process”.
• The metadata includes:
  – Intellectual information
    • An intellectual unit such as a book, map, movie, song, …
  – Digital object information
    • A digital object that realizes the intellectual information.
    • E.g. PDF, image, video, audio, …
  – Agent information
    • A person or system involved with the digital object
  – Event information
    • A record of the activities performed on a digital object
  – Rights information
    • Agreements governing the digital object

PREMIS

wikipedia.org, LOC.gov
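To make the five kinds of information concrete, the sketch below models them as plain Python dataclasses. The class names, field names and sample values are illustrative assumptions, not the PREMIS schema itself; the Data Dictionary defines its own semantic units.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str            # person or system involved with the object


@dataclass
class Event:
    kind: str            # e.g. "ingest" (illustrative value)
    agent: Agent
    date: str


@dataclass
class Rights:
    statement: str       # agreement covering the object


@dataclass
class DigitalObject:
    file_format: str     # e.g. "pdf"
    events: list = field(default_factory=list)
    rights: list = field(default_factory=list)


@dataclass
class IntellectualUnit:
    title: str           # e.g. a book, map, movie or song
    objects: list = field(default_factory=list)


# Made-up sample data, for illustration only
archivist = Agent("Bob")
pdf = DigitalObject("pdf")
pdf.events.append(Event("ingest", archivist, "2012-01-01"))
pdf.rights.append(Rights("open access"))
book = IntellectualUnit("Example book", [pdf])
print(len(book.objects), book.objects[0].events[0].agent.name)  # 1 Bob
```

Keeping events and rights attached to the digital object, rather than to the intellectual unit, mirrors the slide's description of both as information about the digital object.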


• An Open Archival Information System (OAIS) is a reference model for an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.
• Features
  – Ingest, Archive, Preservation Plan, Administration, Dissemination, and Access
• End users
  – Producer, Consumer, and Management

OAIS

wikipedia.org, OCLC.org


?


References

• http://www.dlib.org/dlib/may11/yuanli/05yuanli.html
• http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.9681&rep=rep1&type=pdf
• http://www.loc.gov/standards/premis/
• http://en.wikipedia.org/wiki/Preservation_Metadata:_Implementation_Strategies_(PREMIS)
• http://www.oclc.org
• http://public.ccsds.org/publications/archive/650x0b1.pdf
• http://en.wikipedia.org/wiki/Open_Archival_Information_System
