A centre of expertise in digital information management Digital Preservation / UK Web Focus Brian Kelly UKOLN University of Bath Bath, BA2 7AY Email [email protected] URL http://www.ukoln.ac.uk/ UKOLN is supported by:

Page 1: Digital Preservation / UK Web Focus

A centre of expertise in digital information management

www.ukoln.ac.uk

Brian Kelly
UKOLN
University of Bath
Bath, BA2 7AY

Email: [email protected]
URL: http://www.ukoln.ac.uk/

UKOLN is supported by:

Page 2: About UKOLN / UK Web Focus

UKOLN:
• Centre of expertise in digital information management
• About 30 FTEs
• Based at University of Bath
• Funded by JISC and Resource, with additional funding from JISC, EU, etc. for project work

UK Web Focus:
• Adviser to UK HE and FE and (since 1 Aug 03) cultural heritage sector in England & Wales on Web issues
• Project manager for JISC-funded QA Focus work and of NOF-digi Technical Advisory Service

Page 3: Work In Area Of Preservation

UKOLN activities include:
• Membership of DPC (Digital Preservation Coalition) – see <http://www.dpconline.org/>
• Production of What's New newsletter – <http://www.dpconline.org/graphics/whatsnew/>
• Study on "Collecting and preserving the World Wide Web" – see <http://library.wellcome.ac.uk/projects/archiving_reports.shtml>
• Participation in forum on Web archiving in March 2003 – see <http://www.dpconline.org/graphics/events/web-archiving.html>

Michael Day is active in this area – see publications at <http://www.ukoln.ac.uk/preservation/publications/>

Page 4: What's Happening In UK?

In the UK:
• JISC very active in this area:
  • Involvement in DPC
  • Neil Beagrie is JISC's Assistant Director (Preservation and Resources)
  • Interest in preservation of scholarly reports, scientific data, …
• Cultural Heritage sector have an interest:
  • British Library
  • NOF-digitise projects, …
• Interest at government level:
  • British Library
  • Preservation of 1997 election Web sites
  • Post Sept 11 awareness of importance
  • DPC launch outside Westminster

Page 5: What's Happening Globally?

Much interest:
• Preservation workshops e.g. ECDL:
  • <http://www.cultivate-int.org/issue8/ecdlws2/>
  • <http://www.ariadne.ac.uk/issue37/ecdl-web-archiving-rpt/>
• National initiatives:
  • Australia: <http://pandora.nla.gov.au/>
  • Finland: <http://nwa.nb.no/>
  • France: See <http://www.dlib.org/dlib/december02/masanes/12masanes.html>
• Internet Archive: See <http://www.archive.org/>

Page 6: Internet Archive

Page 7: Whose Problem?

Should preservation of Web sites be carried out:

• By Web site owners themselves
• By owner, funder, …
• By a national body (e.g. British Library)
• By an international body (e.g. Internet Archive)
• By subject specialists
• …

Page 8: How To Preserve

Web site preservation can be carried out:
• By automated harvesting of Web sites (a client-side view)
• By depositing (a service-side view)
• A combined approach

An automated approach can be used to harvest all Web sites within a domain (e.g. all ox.ac.uk Web sites, all .ac.uk, …)

Depositing requires Web site owner to upload resource to a repository
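A domain-scoped harvest needs some rule for deciding which URLs fall inside the chosen domain. The following is a minimal sketch of such a scope test, not a description of any particular harvester. The function name `in_scope` and the candidate URLs are illustrative assumptions.

```python
from urllib.parse import urlsplit


def in_scope(url: str, domain: str) -> bool:
    """Return True if the URL's host is `domain` itself or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    # Require a dot boundary so that "fox.ac.uk" does not match "ox.ac.uk".
    return host == domain or host.endswith("." + domain)


# Illustrative candidate URLs (assumed for this example only).
candidates = [
    "http://www.ox.ac.uk/",
    "http://library.ox.ac.uk/guides/",
    "http://www.ukoln.ac.uk/",
    "http://www.archive.org/",
    "http://www.fox.ac.uk.example.com/",
]

ox_only = [u for u in candidates if in_scope(u, "ox.ac.uk")]
all_ac_uk = [u for u in candidates if in_scope(u, ".ac.uk")]
```

Matching on the host name with a dot boundary, rather than on a substring of the whole URL, avoids harvesting look-alike hosts that merely contain the domain text.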

Page 9: Challenges

Harvesting is getting more difficult:
• Personalised resources
• Dynamic Resources
• Client-side features
• Backend features (e.g. search facility)
• Diversity of resources, formats, etc.

Other issues:
• Copyright, IPR, accessibility, defamation, …

What do you want to do?
• Provide access to static resources
• Preserve look and feel
• Provide access to Web site functionality

This is not an area to jump into lightly

Page 10: But There Is A Need

WebWatching Telematics For Libraries Project Web Sites
• Of 88 EU-funded project Web sites, 23 had disappeared as of Jan 2000
• See <http://www.exploit-lib.org/issue7/webwatch/>

WebWatching eLib Project Web Sites
• Of 71 eLib project Web sites, 5 had disappeared as of Jan 2001
• See <http://www.ariadne.ac.uk/issue26/web-watch/>
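Expressed as proportions, the two survey counts above come to roughly a quarter of the Telematics For Libraries sites and about 7% of the eLib sites. The short sketch below works through that arithmetic; the function name `disappearance_rate` is an assumption made for illustration.

```python
def disappearance_rate(total_sites: int, missing_sites: int) -> float:
    """Percentage of surveyed sites that had disappeared."""
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    return 100.0 * missing_sites / total_sites


telematics = disappearance_rate(88, 23)  # Telematics For Libraries survey
elib = disappearance_rate(71, 5)         # eLib survey

print(f"Telematics For Libraries: {telematics:.1f}% of project sites gone")
print(f"eLib: {elib:.1f}% of project sites gone")
```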

Page 11: It Gets Worse!

Webtechs.com

• Software company which hosted early HTML validation service

• In 1998/99 confusion over payment of domain name

• March 1999 company receives many messages saying validation service is now a porn site

• Over 30,000 links to Web site!

• Sept 1999 porn company agrees to sell domain name back to Webtechs

Page 12: The Embarrassment Still Exists

The hijacked Web site can still be accessed using the Internet Archive's Wayback Machine.

Note that the archived Web site contains JavaScript (and ActiveX controls?) which could delete data on the viewer's PC

See <http://www.exploit-lib.org/issue1/webtechs/>

Page 13: What I'm Involved In

I am involved in:
• Raising awareness of importance of Web site preservation
• Seeking to ensure that project-funded Web sites are aware of the potential porn site time bomb
• Seeking to ensure that projects consider 'mothballing' issues

See:
<http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-04/>
<http://www.ukoln.ac.uk/web-focus/events/workshops/nof-preservation-2003/>
<http://www.ukoln.ac.uk/web-focus/events/conferences/online-information-2002/>

Page 14: Where To From Here?

Possible options:
• Harvest appropriate Web sites. NB httrack is a highly regarded, easy-to-use open source tool for capturing Web sites – <http://www.httrack.com/>, but remember the need to provide capture metadata
• It's too big a problem
• It's a big problem, but we can provide advice to minimise problems
• It's a big problem, but someone needs to tackle it, and we're willing to take a lead from a subject perspective
• We need to read the reports, contribute to the <http://www.jiscmail.ac.uk/lists/DIGITAL-PRESERVATION.html> list, …
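On the capture metadata point in the first option, one possibility is to store a small record alongside each harvested copy. This is only a sketch under assumptions: the field names, the example URL and the capture date are hypothetical, and the record is not a format produced by httrack or required by any standard.

```python
import json
from datetime import datetime, timezone


def capture_record(url: str, tool: str, captured_at: datetime, notes: str = "") -> dict:
    """Build a small metadata record describing one harvested Web site."""
    return {
        "source_url": url,          # address the copy was taken from
        "capture_tool": tool,       # name of the harvesting tool used
        "captured_at": captured_at.astimezone(timezone.utc).isoformat(),
        "notes": notes,             # known gaps, e.g. features not captured
    }


# Hypothetical example values, for illustration only.
record = capture_record(
    url="http://www.example.org/project/",
    tool="httrack",
    captured_at=datetime(2003, 10, 1, 12, 0, tzinfo=timezone.utc),
    notes="Static pages only; search facility not captured.",
)
print(json.dumps(record, indent=2))
```

Recording what was not captured, as well as when and how the capture was made, helps later users judge how faithful the archived copy is.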