From Seed to Harvest: Web Archiving Program Considerations for SUL

Nicholas Taylor (@nullhandle), Stanford University Libraries, April 17, 2013. “Digital” by Flickr user clickclaker under CC BY-NC-ND 2.0

description

Presentation given at Stanford University Libraries as part of candidacy for the Web Archiving Service Manager position on web archiving program considerations and elements.

Transcript of From Seed to Harvest: Web Archiving Program Considerations for SUL

Page 1: From Seed to Harvest: Web Archiving Program Considerations for SUL

From Seed to Harvest: Web Archiving Program Considerations for SUL

Nicholas Taylor
@nullhandle

Stanford University Libraries
April 17, 2013

“Digital” by Flickr user clickclaker under CC BY-NC-ND 2.0

Page 2: From Seed to Harvest: Web Archiving Program Considerations for SUL

hello, my name is Nicholas…

Page 3: From Seed to Harvest: Web Archiving Program Considerations for SUL

Library of Congress Web Archiving

Library of Congress: “MINERVA”

Page 4: From Seed to Harvest: Web Archiving Program Considerations for SUL

Web Archiving Life Cycle Model

“Web Archiving Life Cycle Model” by M. Bragg, K. Hanna, et al. (2013). Reproduced with permission.

Page 5: From Seed to Harvest: Web Archiving Program Considerations for SUL

Web Archiving Life Cycle Model

Program Elements
• Vision and Objectives
• Resources and Workflow
• Access / Use / Reuse
• Preservation
• Risk Management

Workflow Elements
• Appraisal and Selection
• Scoping
• Data Capture
• Storage and Organization
• Quality Assurance and Analysis

Page 6: From Seed to Harvest: Web Archiving Program Considerations for SUL

PROGRAM ELEMENTS

Web Archiving

“Element Blocks” by Flickr user Asian Art Museum under CC BY-NC-ND 2.0

Page 7: From Seed to Harvest: Web Archiving Program Considerations for SUL

Vision and Objectives

Page 8: From Seed to Harvest: Web Archiving Program Considerations for SUL

web archiving program vision

ePADD Discovery Module

PASIG

Page 9: From Seed to Harvest: Web Archiving Program Considerations for SUL

SUL mission

“The Stanford University Libraries (SUL) is more than a cluster of libraries; it connects people with information by providing diverse resources and services to the academic community.”

“Stanford University Libraries…develops and implements resources and services…that support research and instruction.”

SUL: “Stanford University Libraries on Vimeo”

SUL: “About The Stanford University Libraries”

SUL: “SULAIR Brief Guide”

Page 10: From Seed to Harvest: Web Archiving Program Considerations for SUL

DLSS mission

“DLSS is the information technology production arm of the Stanford Libraries; it serves as the digitization, digital preservation and access systems provider for SUL; and it is the research and development unit for new technologies, standards and methodologies related to library systems.”

SUL: “New Images of Rare Books and Digitization Devices”

SUL: “SULAIR Digital Library Systems and Services (DLSS)”

Page 11: From Seed to Harvest: Web Archiving Program Considerations for SUL

proposed program mission

“The web archiving program will provide capabilities for the acquisition, preservation, and dissemination of resources that are increasingly and, often, exclusively accessible via the web that are necessary to support University research, instruction, and other purposes.”

Page 12: From Seed to Harvest: Web Archiving Program Considerations for SUL

objectives

• build infrastructure
• develop expertise
• create research collections
• archive records and deprecated content
• mirror government documents

“Objective” by Flickr user Pedro J. Ferreira under CC BY-NC-ND 2.0

Page 13: From Seed to Harvest: Web Archiving Program Considerations for SUL

Resources and Workflow

Page 15: From Seed to Harvest: Web Archiving Program Considerations for SUL

staffing

• service manager
• crawl engineer
• curators
• system administrators
• software engineers
• technical services
• legal counsel

“Digitizing Mark Adams cartoons” by Flickr user suldpg under CC BY-NC-SA 2.0

Page 17: From Seed to Harvest: Web Archiving Program Considerations for SUL

readily workflow-able

• collection management
• site nomination
• permissions tracking
• crawl scheduling
• data capture
• quality assurance

“Web Curator Tool User Manual Version 1.5.2”

Page 18: From Seed to Harvest: Web Archiving Program Considerations for SUL

workflow challenges

• test crawling
• automated QA
• AIP/DIP generation
• SDR ingest
• indexing
• enabling access
• tools testing

“Salmon Ladder at Bonneville Dam” by Flickr user Serolynne under CC BY-NC-ND 2.0

Page 19: From Seed to Harvest: Web Archiving Program Considerations for SUL

Access / Use / Reuse

Page 20: From Seed to Harvest: Web Archiving Program Considerations for SUL

access policy

• dark archive
• data redistribution
• embargo
• onsite/offsite replay
• takedown requests

“DO NOT DUPLICATE” by Flickr user Sam UL under CC BY-NC-SA 2.0

Page 22: From Seed to Harvest: Web Archiving Program Considerations for SUL

many Wayback Machines

Wikipedia: “List of Web archiving initiatives”

Page 23: From Seed to Harvest: Web Archiving Program Considerations for SUL

discovery: Memento

“Memento”

Page 25: From Seed to Harvest: Web Archiving Program Considerations for SUL

full-text search: Solr

Archive-It: “Explore All Archives”

Page 26: From Seed to Harvest: Web Archiving Program Considerations for SUL

Preservation

Page 28: From Seed to Harvest: Web Archiving Program Considerations for SUL

preservation engineering

“Máquina de Rube Goldberg en la base del Alinghi” by Flickr user freshwater2006 under CC BY-NC 2.0

Page 29: From Seed to Harvest: Web Archiving Program Considerations for SUL

Risk Management

Page 30: From Seed to Harvest: Web Archiving Program Considerations for SUL

Risk Management

• “appified” web
• copyright
• ephemeral web
• financial sustainability
• fostering use

“Zombie Awareness - Extinguisher” by Flickr user Spiffy0777 under CC BY-NC-SA 2.0

Page 31: From Seed to Harvest: Web Archiving Program Considerations for SUL

Policy

Page 32: From Seed to Harvest: Web Archiving Program Considerations for SUL

copyright

• § 108 (library exceptions)
• fair use
• notification vs. permission
• opt-out / takedown
• robots.txt
• third-party sites
• exceptions?

“Noria con Copyrights” by Flickr user Alex Novoa under CC BY-NC-ND 2.0

Page 33: From Seed to Harvest: Web Archiving Program Considerations for SUL

collection development

“leaf-cutter ants” by Flickr user Vilseskogen under CC BY-NC-SA 2.0

Page 34: From Seed to Harvest: Web Archiving Program Considerations for SUL

WORKFLOW ELEMENTS

Web Archiving

“Workflow” by Flickr user luismi_cavalle under CC BY 2.0

Page 35: From Seed to Harvest: Web Archiving Program Considerations for SUL

Appraisal and Selection

Page 36: From Seed to Harvest: Web Archiving Program Considerations for SUL

informing selection

• value
• risk
• size
• extent to which archived

“Fruit market-Barcelona” by Flickr user Marcel Theisen under CC BY-NC-SA 2.0

Page 37: From Seed to Harvest: Web Archiving Program Considerations for SUL

TwitterVane

UK Web Archive: “TwitterVane”

Page 38: From Seed to Harvest: Web Archiving Program Considerations for SUL

Wikipedia Live Monitor

Thomas Steiner: “Wikipedia Live Monitor”

Page 39: From Seed to Harvest: Web Archiving Program Considerations for SUL

Wikipedia articles

Wikipedia: “List of think tanks in the United States”

Page 40: From Seed to Harvest: Web Archiving Program Considerations for SUL

UNT Nomination Tool

University of North Texas Libraries: “Nomination Tool”

Page 41: From Seed to Harvest: Web Archiving Program Considerations for SUL

Scoping

Page 42: From Seed to Harvest: Web Archiving Program Considerations for SUL

the purpose of scoping

“More god?” by Flickr user one two one three under CC BY-NC-SA 2.0

Page 43: From Seed to Harvest: Web Archiving Program Considerations for SUL

Data Capture

Page 44: From Seed to Harvest: Web Archiving Program Considerations for SUL

Heritrix

Internet Archive: “A Quick Guide to Running Your First Crawl Job”

Page 46: From Seed to Harvest: Web Archiving Program Considerations for SUL

the elusive web

“Light Writing - Spider Web” by Flickr user forcefeed:swede under CC BY-ND 2.0

Page 48: From Seed to Harvest: Web Archiving Program Considerations for SUL

Storage and Organization

Page 49: From Seed to Harvest: Web Archiving Program Considerations for SUL

packages and their contents

“lots and lots and lots of boxes” by Flickr user Toastwife under CC BY-NC-SA 2.0

Page 50: From Seed to Harvest: Web Archiving Program Considerations for SUL

Quality Assurance and Analysis

Page 52: From Seed to Harvest: Web Archiving Program Considerations for SUL

Metadata / Description

Page 53: From Seed to Harvest: Web Archiving Program Considerations for SUL

Metadata / Description

“Hello! My URL Is...” by Flickr user vasta under CC BY-NC-ND 2.0

Page 54: From Seed to Harvest: Web Archiving Program Considerations for SUL

BEYOND THE MODEL

Considerations

“My donut” by Flickr user Molemaster under CC BY-NC-SA 2.0

Page 55: From Seed to Harvest: Web Archiving Program Considerations for SUL

other program requirements

• marketing/outreach
• performance metrics
• service level definitions
• service roadmap
• training
• user documentation

“Sticky notes” by Flickr user Kris Krug under CC BY-SA 2.0

Page 56: From Seed to Harvest: Web Archiving Program Considerations for SUL

incorporating existing projects

• plan capacity
• normalize data
• ingest into SDR
• seek permissions
• process
• catalog
• enable access

“Geckos” by Flickr user smashz under CC BY-NC-ND 2.0

Page 57: From Seed to Harvest: Web Archiving Program Considerations for SUL

community engagement