Farl web archiving


Transcript of Farl web archiving

Page 1: Farl web archiving

A survey of web-based art resources with findings applicable to FARL electronic records collection development

Alison Rhonemus, LIS 698, Seminar and Practicum, Dr. Tula Giannini

Frick Art Reference Library

Deborah Kempe, Chief, Collections Management & Access

Web Survey and Collection Development

Coffee on the terrace

Page 2: Farl web archiving

M-LEAD-TWO

Intern enterprises: "collection assessments, digital resource surveys, web archiving, provide support for important consortial programs such as shared resources"

● Brooklyn Museum: Mark Daly, Ronnette Hope; Project Manager: Emily Atwater

● NYARC Latin American Resources (MOMA): Ralph Baylor

● FARL: Gretchen Nadasky, Alison Rhonemus

Page 3: Farl web archiving

Frick Art Reference Library

In early 2011, the Frick Art Reference Library and the Thomas J. Watson Library at The Metropolitan Museum of Art completed a pilot project to address coordinated collecting of born-digital auction catalogs using ContentDM and Archive-It.

Page 4: Farl web archiving

The FARL web archiving program is situated in Collection Development. Current plans for website capture include online auction catalogs and art web resources cataloged by NYARC.

Fellow M-LEAD-TWO intern Gretchen Nadasky has just described online auction catalogs. My project focused on NYARC-cataloged websites.

Page 5: Farl web archiving

Web Archiving

"The Internet Archive is already doing it."

Actually, the IA is providing the tools for other institutions to use in archiving.

Page 6: Farl web archiving

Archive-It uses open source tools developed by the Internet Archive:

● Heritrix web crawler

● Wayback interface

● WARC format, an ISO standard

Page 7: Farl web archiving
Page 9: Farl web archiving

Evaluating seeds

• Password-protected sites: cannot be archived.

• JavaScript: more complicated implementations can be difficult to capture and display. This is an ongoing area of development.

• Videos: difficulty with some proprietary formats.

• Form- and database-driven content: may be archived using a sitemap or other direct links to the content.

Page 10: Farl web archiving

Robots.txt Blocks

The crawler respects all robots.txt files by default. Check post-crawl reports for blocked seeds or documents.

If your site is blocked:

a) Contact the site owner and ask if they will unblock it

b) Ask your Partner Specialist to turn on the “ignore robots” feature in your account
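A seed can also be checked for robots.txt blocks before a crawl is run. The following is a minimal sketch using Python's standard library, not a part of Archive-It itself. The sample robots.txt, the example.org URLs, and the helper name is_blocked are hypothetical, and the user agent string archive.org_bot is an assumption that should be confirmed against current Archive-It documentation.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (hypothetical). A real check would first
# download the file from the seed's host, e.g. https://example.org/robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: archive.org_bot
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Assumed crawler user agent; verify before relying on it.
CRAWLER = "archive.org_bot"

seed = "https://example.org/catalog/"
print(seed, "blocked for crawler:", is_blocked(SAMPLE_ROBOTS_TXT, CRAWLER, seed))
```

In this sample the archiving crawler is blocked from the whole site while other agents are only kept out of /private/, which is the situation where options a) and b) above would apply.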

Notes:

/ denotes single directory seed

subdomains.archive.org (add individually or expand seed)

Page 11: Farl web archiving

Site Survey Criteria

● HTML/Flash/PDF

● images

● embedded material

● links

● directories and subdomains

● terms, rights statements and permissions
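Some of these criteria can be tallied automatically as a first pass. The sketch below is one possible approach using Python's standard HTML parser; it is not the method used in the survey, and the class name, the count labels, and the sample page are all hypothetical. Terms, rights statements and permissions still need to be read by a person.

```python
from collections import Counter
from html.parser import HTMLParser


class SurveyTally(HTMLParser):
    """Count page features that map loosely onto the survey criteria."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "img":
            self.counts["images"] += 1
        elif tag == "a" and attr.get("href"):
            self.counts["links"] += 1
            if attr["href"].lower().endswith(".pdf"):
                self.counts["pdf_links"] += 1
        elif tag in ("embed", "object", "iframe", "video"):
            self.counts["embedded"] += 1
        elif tag == "script":
            self.counts["scripts"] += 1
        elif tag == "input" and (attr.get("type") or "").lower() == "password":
            # A password field suggests content behind a login.
            self.counts["password_fields"] += 1


def tally(html: str) -> Counter:
    """Parse one page of HTML and return the feature counts."""
    parser = SurveyTally()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical page used only to exercise the tally.
SAMPLE_PAGE = """
<html><body>
  <img src="a.jpg"><img src="b.jpg">
  <a href="catalog.pdf">Catalog</a>
  <a href="/about/">About</a>
  <object data="intro.swf"></object>
  <script src="app.js"></script>
</body></html>
"""

print(dict(tally(SAMPLE_PAGE)))
```

A tally like this only flags pages for closer review; it cannot tell whether embedded material or scripts will actually capture and display correctly.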

Page 12: Farl web archiving

Obvious ruse

Page 13: Farl web archiving

More of the obvious

Sites created without the intention of being archived are the sites in need of archiving.

Page 14: Farl web archiving

Survey Says

● 257 cataloged entries

● 168 resources are possible to capture

● 82 resources would require more research or display definite red flags for web archiving

● PDFs are available for at least some of the content in 75 resources

● Flash was an element in 23 resources

● 16 sites used HTML5

● 54 used a CMS like Drupal or WordPress
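The counts above can be read as shares of the 257 cataloged entries. The short sketch below does that arithmetic; the numbers come from the slide, while the dictionary labels and the helper name share are hypothetical.

```python
TOTAL = 257  # cataloged entries surveyed

# Counts as reported on the slide; the labels are illustrative.
counts = {
    "capturable": 168,
    "needs_research_or_red_flags": 82,
    "some_content_as_pdf": 75,
    "uses_flash": 23,
    "uses_html5": 16,
    "uses_cms": 54,
}


def share(count: int, total: int = TOTAL) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}

# Entries in neither of the two capture categories; the slide does not
# say how these were classified.
unaccounted = TOTAL - counts["capturable"] - counts["needs_research_or_red_flags"]

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print("not in either capture category:", unaccounted)
```

Roughly two thirds of the cataloged resources (65.4%) look capturable, and about a third (31.9%) need more research or show red flags. The format and platform counts overlap with those categories, so they are not expected to sum to the total.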

Page 15: Farl web archiving

Three cataloged resources were no longer available on the live web but were viewable through the Internet Archive. Another two defunct resources were not available through the Internet Archive. The main page for one of these lost resources was available as a snapshot in Wayback, but the actual cataloged resource was not.
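Checks like these can be scripted against the Internet Archive's Wayback availability endpoint, which returns JSON describing the closest snapshot for a URL. The sketch below only builds the query and reads a response; it makes no network call, and the sample responses, the example.org URL, and the function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

API = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability query URL for one cataloged resource."""
    return f"{API}?{urlencode({'url': url})}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest snapshot URL from a response, or None if absent."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Illustrative responses (hypothetical values) for a found and a missing page.
FOUND = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20110101000000/http://example.org/",
            "timestamp": "20110101000000",
            "status": "200",
        }
    }
})
MISSING = json.dumps({"archived_snapshots": {}})

print(availability_query("example.org"))
print(closest_snapshot(FOUND))
print(closest_snapshot(MISSING))
```

As the lost resource above shows, a snapshot of a site's main page does not guarantee that the cataloged resource itself was captured, so each cataloged URL would need to be queried directly.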

Page 16: Farl web archiving
Page 17: Farl web archiving
Page 18: Farl web archiving
Page 19: Farl web archiving

Change is Constant

Archive-It Updates:

● Heritrix 1 series to Heritrix 3 series (February)

● Archive-It 4.8 (May)

Page 20: Farl web archiving

Archive-It 4.8

Page 21: Farl web archiving

Plans

● Upcoming grants

● Capture of NYARC institution websites

● Include Wayback interface links in Arcade catalog records

● Continue to identify websites for capture and implement capture

Page 22: Farl web archiving

Conclusions

○ Digital resources not prevalent enough to reassign current staff

○ Website capture most costly in terms of staff time

○ Copyright continues to be an issue

○ Long term digital preservation needs yet to be assessed

○ Capture of Frick Collection and NYARC sites will pose a challenging test case