Preserving the Web
Similarities and Dissimilarities Between the Conventional and the Web-based Publishing Environment

G. Bokos – V. Chrissikopoulos

Ionian University - Archives and Library Science Dept.

Definitions
Conventional Publishing Environment
• Based on the conventional media for the production of, mostly, printed documents
Web-based Publishing Environment
• The publishing space provided by the Internet and the web infrastructure
Preservation

Preservation
Objectives.
• To preserve library and archive materials (the cultural and intellectual works of today) and the information contained in them for the benefit of future generations.
The mechanism.
• A complicated set of interrelated financial, managerial and technical arrangements, measures, rules, standards, etc.
Problems and difficulties.
• Who? When? Where? What? Why? How?
Preservation in the digital environment.
• The old questions with a new meaning (or, new answers to old questions).
• Some new questions.

Conventional and Web-based Documents: A Comparison
Similarities
• Locating the documents
  - A problem in both the conventional and the web-based environment
• Collecting the documents
• Deciding what to preserve, why, etc.
  - Impossible to preserve everything
  - Defining criteria for choice
  - Deciding who is to preserve what
  - Developing preservation policies & mechanisms
  - Developing preservation tools & techniques

Conventional and Web-based Documents: A Comparison (2)
Dissimilarities
• The location of the documents
• The collection of the documents
• The nature of the documents and of the publishing space
• The form of the documents
  - What could be considered a document on the web?

Conventional and Web-based Documents: A Comparison (3)
• The preservation methods, tools & techniques
• The continuously increasing growth rate of publishing and of records production on the web
• The lack of mature mechanisms for locating, collecting, storing and preserving web documents

Conventional and Web-based Documents: A Comparison (4)
• The complete dependency of the documents’ preservation and access on technology, combined with the rapid and continuous advancement of the relevant technologies (the threat of technological obsolescence)
• The unresolved legal and organizational issues concerning the management of intellectual property rights of digital information

Location of Documents
In general:
• Web documents are easier to locate than conventional ones, but harder to preserve. Specifically:
Conventional environment
• Scarcity and fixed number of available copies (but geographically dispersed)
• Ineffective searching tools

Location of Documents (2)
Web environment
• Vast amount of available documents (although in many cases only one copy exists)
• Growth rate of web publications (more than a billion accessible web sites; millions of documents and records of several types published and exchanged each day, many of them without any human intervention)
• Volatile nature of documents
• Lack of standardization
  - Types of documents
  - Mechanisms and channels of production & distribution

Collection of Documents
Methods of collection
• Bulk collecting (an automated process)
  - Method used in the web environment only
• Selective collecting (a process managed by specialized staff)
  - Method used in both the conventional and the web environment
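
Bulk collecting is usually carried out by a crawler that starts from one or more seed URLs and follows links automatically. The following is a minimal, illustrative sketch, not a tool described in this presentation: it performs a breadth-first harvest restricted to the seeds' hosts. The names `bulk_collect` and `LinkExtractor`, the `max_pages` limit and the injected `fetch` function are assumptions made for the example; passing `fetch` in lets any HTTP client, or a plain dictionary as below, supply the pages.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> elements in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def bulk_collect(
    seeds: list[str],
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 100,
) -> dict[str, str]:
    """Breadth-first harvest of pages on the seeds' hosts (hypothetical helper).

    `fetch` returns the HTML of a URL, or None if it cannot be retrieved.
    Returns a mapping URL -> HTML for every page collected.
    """
    allowed_hosts = {urlparse(s).netloc for s in seeds}
    queue = deque(seeds)
    seen = set(seeds)
    collected: dict[str, str] = {}

    while queue and len(collected) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # lost, moved or otherwise unavailable
            continue
        collected[url] = html

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is fetched once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc in allowed_hosts and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return collected


# Usage with a stub "web" held in a dictionary (illustrative data only).
pages = {
    "http://example.org/": '<a href="/a">A</a> <a href="http://other.net/">x</a>',
    "http://example.org/a": '<a href="/">home</a> <a href="b#top">B</a>',
    "http://example.org/b": "<p>leaf</p>",
}
archive = bulk_collect(["http://example.org/"], pages.get)
print(sorted(archive))
# ['http://example.org/', 'http://example.org/a', 'http://example.org/b']
```

A production harvester would also need politeness delays, robots.txt handling and storage in an archival container format such as WARC.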

Collection of Documents (2)
The meaning of “collection” of web documents
• What to collect from the continuously increasing pool of web documents, or from complex digital documents (e.g. web sites)?
• How & what to collect from specific digital documents (some of the relevant questions):
  - All the files concerned? Part of the files? The files in their context?
  - Provisions for accessing and using the digital documents by interested users
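
One way to make “the files in their context” concrete for a single web page is to list the embedded resources the page needs in order to render (stylesheets, images, scripts, frames), so that they can be collected together with the HTML. The sketch below is hypothetical: `page_requisites` and the `REQUISITE_ATTRS` table are illustrative names, and the tag/attribute pairs it inspects are a deliberately incomplete assumption (it ignores CSS `url()` references, `srcset`, and anything loaded by scripts).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical, incomplete selection of (tag, attribute) pairs that embed a resource.
REQUISITE_ATTRS = {
    ("img", "src"),
    ("script", "src"),
    ("iframe", "src"),
    ("embed", "src"),
    ("source", "src"),
}


class RequisiteExtractor(HTMLParser):
    """Lists the resources a page embeds, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.requisites: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Stylesheets are referenced as <link rel="stylesheet" href="...">.
        if tag == "link":
            values = dict(attrs)
            if "stylesheet" in (values.get("rel") or "").lower().split():
                self._add(values.get("href"))
        for name, value in attrs:
            if (tag, name) in REQUISITE_ATTRS:
                self._add(value)
    # HTMLParser.handle_startendtag calls handle_starttag, so <img ... /> is covered.

    def _add(self, value):
        if value:
            url = urljoin(self.base_url, value)
            if url not in self.requisites:  # keep first-seen order, no duplicates
                self.requisites.append(url)


def page_requisites(base_url: str, html: str) -> list[str]:
    parser = RequisiteExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.requisites


html = ('<link rel="stylesheet" href="style.css"><img src="/img/logo.png"/>'
        '<script src="app.js"></script><a href="other.html">next</a>')
reqs = page_requisites("http://example.org/docs/page.html", html)
print(reqs)
# ['http://example.org/docs/style.css', 'http://example.org/img/logo.png', 'http://example.org/docs/app.js']
```

Ordinary hyperlinks such as `other.html` are deliberately not treated as requisites: they point to other documents rather than to parts of this one.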

Nature of Documents (and of the Publishing Space)
Conventional vs Digital documents, or Atoms vs Bits
• Differences regarding:
  - Storage conditions, needs and techniques
  - Handling requirements
  - Modes and tools of accessing and using
  - Availability

Storage Conditions, Needs and Techniques
• Storage of media does not secure long-term preservation
• Storage of digital information for long-term preservation requires a complex (although not yet complete or mature) set of methods and techniques
• Storage of both physical media and digital information

Handling Requirements
• Special equipment, in specific configurations of software & hardware, is always necessary for accessing and using the documents
• Specific guidelines, user guides and manuals are necessary for handling and using the various components of such a configuration

Modes and Tools of Accessing and Using
• From “Preservation or Access” to “Preservation and Access”
• Means of accessing may be controlled or defined by the user
• Access and use are software and hardware dependent

Availability
• In many cases only one copy is available
• Easy and hard-to-control changes of availability status (documents lost, moved to other locations, moved to areas with restricted access or to chargeable sites, etc.)

Form & Type of Documents
Digital documents
• Unstable & volatile (content, version & location control)
• Dynamically changed, reformatted, or combined with other similar or related documents
• Lack of standard types of documents (mainly as concerns content structure and presentation)
• Many new types, formats and structures of “publications” (e.g. newsgroups, chat rooms, etc.)

Form of Documents (2)
• Multiform and multimedia content, combined in many ways that are not fixed or prescribed but dynamically selected and changed, even, with suitable software, in ways and modes selected by the user
• Several forms for recording and presenting content
• Hardware and, mainly, software dependent
• Digital document or digital object?

Simplified Schematic Representation of a Web Document

Simplified Schematic Representation of a Digital Object

Preservation Methods, Tools & Techniques
Digital objects:
• Are, like conventional documents (and, in general, more than them), susceptible to loss (due to physical breakdown of the media)
• May be rendered inaccessible (due to advances in technology)
• May be left meaningless (due to the lack of contextual evidence)

Preservation Methods, Tools & Techniques (2)
Preservation in the digital environment thus consists of:
• Intellectual preservation
  - Securing the integrity and authenticity of the information as originally recorded
• Media preservation
  - Preservation of the physical media on which information has been stored
• Technology preservation
  - Coping with technology obsolescence
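
For the integrity side of intellectual preservation, a common technique is fixity checking: a cryptographic digest is recorded for each file when it enters the archive and recomputed later. The sketch below is an illustration, not a method taken from this presentation; `build_manifest` and `verify_manifest` are hypothetical names, and SHA-256 is an assumed choice of digest. Note that a matching digest only shows the bits are unchanged since the manifest was made; authenticity additionally depends on trusting the manifest itself (for example through signatures or independently held copies).

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path relative to `root` (POSIX style) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current contents of `root` with a previously built manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "altered": sorted(k for k in manifest if k in current and current[k] != manifest[k]),
        "added": sorted(set(current) - set(manifest)),
    }


# Usage: record fixity, alter one file, then verify.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "page.html").write_text("<p>original</p>")
    (root / "logo.png").write_bytes(b"\x89PNG placeholder bytes")
    manifest = build_manifest(root)
    (root / "page.html").write_text("<p>tampered</p>")
    report = verify_manifest(root, manifest)

print(report)
# {'missing': [], 'altered': ['page.html'], 'added': []}
```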

Preservation Methods, Tools & Techniques (3)
Preservation in the digital environment concerns both (though to a certain extent) physical media and digital information
In both cases preservation activities should cope with the threat of technological obsolescence

Preservation Methods, Tools & Techniques (4)
Technological obsolescence:
• Some of the methods proposed to cope with it in the digital environment:
  - Refreshing digital information by copying it from medium to medium
  - Migrating digital information from platform to platform
  - Developing and maintaining a complex set of suitable emulators
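
Of the three methods, refreshing is the easiest to illustrate. A minimal sketch, assuming the old and the new medium are both mounted as ordinary directories: each file is copied bit for bit and the copy's SHA-256 digest is compared with the original's before the old medium would be retired. The `refresh` function and its return value are hypothetical choices for this example. Migration and emulation cannot be reduced to a snippet like this, since they depend on the specific formats and platforms involved.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    # Reads the whole file at once: acceptable for a sketch, not for very large files.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def refresh(source_root: Path, target_root: Path) -> list[str]:
    """Copy every file under source_root to target_root and verify each copy.

    Returns the relative paths copied and verified; raises OSError if any
    copy's digest differs from the original's.
    """
    copied = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = target_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies content plus timestamps and permission bits
        if sha256_of(src) != sha256_of(dst):
            raise OSError(f"fixity check failed for {rel}")
        copied.append(rel.as_posix())
    return copied


# Usage: two temporary directories stand in for the old and the new medium.
with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
    (Path(old) / "site").mkdir()
    (Path(old) / "site" / "index.html").write_text("<h1>archived</h1>")
    result = refresh(Path(old), Path(new))
    refreshed_text = (Path(new) / "site" / "index.html").read_text()

print(result)
# ['site/index.html']
```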

Preservation Methods, Tools & Techniques (5)
Preservation of digital objects:
• Unlike in the conventional publishing environment, preservation in the digital world cannot happen as an afterthought (by then it will usually be too late).
• It must be planned in advance, preferably during the design stage of information systems or during the creation of documents or information.
• How?

Preservation Methods, Tools & Techniques (6)
Planning for preservation in the digital environment:
• Unlike in the conventional publishing space, preservation data may and should be incorporated in the relevant digital objects
• Preservation in the digital environment is thus the work and the responsibility of all people, organizations and institutions involved in the creation and management of digital objects

Preservation Metadata
• Metadata for preservation is the kind of preservation data that may be incorporated in digital objects throughout their life cycle by all people or organizations involved in their creation and subsequent management
• Metadata for preservation is thus a key element in the design of an effective preservation strategy for web content

Preservation Metadata (2)
Preservation metadata may and should include:
• Information about the source of the data
• Details of creation (how, why, when)
• Details on its intended function, purpose and target group
• Guidelines and terms for access and use
• Migration history
• Details on the software and hardware necessary for access and use
• Details on relations to other material, records and software
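
One way to carry these elements with an object is a structured record stored next to, or inside, the object. The sketch below simply mirrors the list above as Python dataclasses and serializes the record as JSON. The class names, field names and example values are all hypothetical and do not follow any particular standard; established schemes such as PREMIS define these elements far more precisely.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MigrationEvent:
    date: str         # ISO 8601 date of the migration
    from_format: str
    to_format: str
    agent: str        # person, organization or tool that performed it


@dataclass
class PreservationMetadata:
    source: str                  # information about the source of the data
    created_how: str             # details of creation
    created_why: str
    created_when: str
    intended_function: str
    purpose: str
    target_group: str
    access_terms: str            # guidelines and terms for access and use
    required_software: list[str] = field(default_factory=list)
    required_hardware: list[str] = field(default_factory=list)
    migration_history: list[MigrationEvent] = field(default_factory=list)
    relations: list[str] = field(default_factory=list)  # related material, records, software

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses such as MigrationEvent.
        return json.dumps(asdict(self), indent=2)


# Usage with made-up example values.
record = PreservationMetadata(
    source="http://example.org/report.html",
    created_how="exported from a content management system",
    created_why="annual activity report",
    created_when="2000-01-01",
    intended_function="public information",
    purpose="report to stakeholders",
    target_group="general public",
    access_terms="open access; cite the source",
    required_software=["HTML-capable web browser"],
    migration_history=[MigrationEvent("2001-01-01", "HTML 4.01", "PDF/A-1b", "archive staff")],
    relations=["http://example.org/report-previous.html"],
)
print(record.to_json())
```

Because the record is created alongside the object, each party in its life cycle can append to it (for example, a new `MigrationEvent` after each migration), which matches the idea of building preservation data in from the start.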

Conclusions
Preservation activities in both the conventional and the web-based publishing space have similar objectives, procedures, problems and difficulties
However, there are considerable dissimilarities between the two environments
These dissimilarities are due to the radical differences in the nature, the form and the typology of both the web document and the web publishing infrastructure

Conclusions (2)
They are also due to:
• The great complexity (in terms of organization, or lack of it, standards, mode of evolution, etc.) of the structure and the functions of the Internet and the web space
• The continuous and rapid change and evolution of the whole web environment and, thus, the immaturity of the types, forms, tools and procedures of the relevant publishing mechanism
• The continuous and rapid change and evolution of the supporting technology in terms of standards, software and hardware

Conclusions (3)
These dissimilarities thus require:
• A completely different approach, as compared to the conventional environment, to the planning and application of preservation activities and measures
• This approach should be based on a principle of detailed planning in advance
• Different methods, tools and techniques for preserving digital information

Conclusions (4)
In general:
• The web infrastructure and the respective publishing environment are still evolving and changing
• The publishing mechanism and products are not yet standardized, stable and mature
• We know that we need a different approach, methods and tools for web preservation, but we are still at the stage of studying the situation and experimenting with possible solutions and tools