Web Archiving: An Overview
Transcript of Web Archiving: An Overview
![Page 1: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/1.jpg)
Web Archiving: An Overview
Karl-Rainer Blumenthal, Internet Archive
Sumitra Duncan, Frick Art Reference Library
Metropolitan New York Library Council
January 7, 2015
![Page 2: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/2.jpg)
What is web archiving?
Web archiving is the process of collecting, preserving, and enabling access to web-native materials.
![Page 3: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/3.jpg)
Why archive the web?
> Collect web-native resources in your traditional collecting scope.
> Fulfill a records retention requirement.
> Document spontaneous/online events.
> Combat link rot and content drift (no more 404s!).
![Page 4: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/4.jpg)
How does it work?
> Web crawlers navigate live websites and download their source code to Web ARChive (WARC) files.
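To make the WARC step concrete, here is a minimal sketch of how one captured HTTP response might be wrapped in a WARC `response` record. It uses only the Python standard library and builds the record by hand; the function name `build_warc_response_record` and the sample payload are assumptions made for illustration, not part of the original slides, and crawlers such as Heritrix write many more record types and fields than this.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes,
                               captured_at: datetime) -> bytes:
    """Wrap one raw HTTP response (status line, headers, body) in a WARC record.

    Hypothetical helper for illustration. `captured_at` is expected to be a
    timezone-aware datetime; it is converted to UTC for the WARC-Date field.
    """
    warc_date = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",                                   # version line
        "WARC-Type: response",                        # a captured server response
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",  # unique ID for this record
        f"WARC-Date: {warc_date}",                    # when the capture happened
        f"WARC-Target-URI: {target_uri}",             # the URL that was crawled
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",      # size of the payload block
    ]
    header_block = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # A record is: header block, blank line, payload, then two CRLFs.
    return header_block + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    # Made-up payload standing in for what a crawler would download.
    sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
    record = build_warc_response_record(
        "http://example.org/", sample,
        datetime(2015, 1, 7, 12, 0, 0, tzinfo=timezone.utc),
    )
    print(record.decode("utf-8"))
```

Keeping the original HTTP headers inside the record, rather than saving only the HTML, is what later lets replay software serve the page back the way the server originally delivered it.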
![Page 5: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/5.jpg)
How does it work?
> Replay technologies render the archived websites as they appeared at the time they were crawled.
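As one illustration of replay addressing, the Internet Archive's Wayback Machine identifies a replayed page by a 14-digit capture timestamp followed by the original URL. The sketch below only builds such an address; the helper name `wayback_replay_url` and the example URL are assumptions for illustration, and other replay tools may use different URL layouts.

```python
from datetime import datetime

# Public replay prefix used by the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_replay_url(original_url: str, captured_at: datetime) -> str:
    """Build a Wayback-style replay address for a URL at a given capture time.

    Hypothetical helper for illustration. The timestamp is formatted as
    YYYYMMDDhhmmss; it does not check that a capture actually exists.
    """
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    print(wayback_replay_url("http://example.org/", datetime(2015, 1, 7, 12, 30, 0)))
```

If no capture exists at exactly the requested timestamp, the Wayback Machine redirects to the nearest capture it holds, so the timestamp acts as a target date rather than an exact key.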
![Page 6: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/6.jpg)
Web archiving tools and services
The Wayback Machine
https://archive.org/web/
The largest publicly available web archive in existence.
> 450+ billion URLs
> 100+ million websites
> 40+ languages
> ~1 billion URLs added per week
![Page 9: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/9.jpg)
Web archiving tools and services
Heritrix, HTTrack, Umbra, warcprox, Wget
ARC, WARC
Wayback Machine, OpenWayback, pywb (Python Wayback), Webenact, oldweb.today
Archive-It, NetarchiveSuite (DK/FR), PANDAS (AUS), Web Curator (UK/NZ), Webrecorder
![Page 10: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/10.jpg)
Who archives the web?
Society of American Archivists Web Archiving Roundtable
> 900+ member participants
Archive-It
> 400+ partner organizations (software service subscribers)
National Digital Stewardship Alliance (NDSA)
> Surveyed web archivists in 2011, 2013, 2015...
![Page 11: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/11.jpg)
Who archives the web?
Organizations with web archiving programs by type
NDSA, Web Archiving in the United States: A 2013 Survey
[Pie chart; segment labels not preserved. Values: 52%, 15%, 13%, 8%, 5%, 4%, 1%, 2%]
![Page 12: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/12.jpg)
Who archives the web?
Use of external service vs. in-house archiving
NDSA, Web Archiving in the United States: A 2013 Survey
[Pie chart; segment labels not preserved. Values: 63%, 16%, 20%]
![Page 13: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/13.jpg)
Who archives the web?
Staff dedicated to web archiving program
NDSA, Web Archiving in the United States: A 2013 Survey
[Pie chart; segment labels not preserved. Values: 36%, 19%, 25%, 6%, 7%, 7%]
![Page 14: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/14.jpg)
Who archives the web?
Participation in a collaborative web archive
NDSA, Web Archiving in the United States: A 2013 Survey
[Pie chart; segment labels not preserved. Values: 48%, 33%, 17%, 2%]
![Page 15: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/15.jpg)
Web archiving issues and trends
> Access and discovery
> Big data analysis
> Appraisal, provenance, and metadata
> Spontaneous events and social media
> Permissions and privacy policies
![Page 20: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/20.jpg)
NYARC
![Page 21: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/21.jpg)
Why web archiving at NYARC?
> Drift from print to born-digital
> Alignment with traditional collecting strengths & unique holdings
> Ephemeral nature of websites & risk of impermanence
> Not addressed elsewhere = risk of gap in art historical record
> Leverage consortial collaboration = better able to be nimble
Willem de Ridder. European Mail-Order Warehouse/Fluxshop inventory with Dorothea Meijer, seated, in the home of the artist, Amsterdam. 1964-65. Gelatin silver print. The Museum of Modern Art, New York.
![Page 22: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/22.jpg)
How NYARC got started
> 2010 Auction House Pilot Study with Archive-It
> 2012 Planning Study
> 2013-2015 Mellon Grant for Web Archive Implementation
![Page 23: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/23.jpg)
Web archiving life cycle at NYARC
![Page 24: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/24.jpg)
Collection development / curation
![Page 25: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/25.jpg)
Collection scope
> Art Resources
> Artists’ Websites
> Auction Catalogs
> Catalogues Raisonnés
> Institutional Web Presence
> NYC Galleries
> Restitution of Lost or Looted Art
![Page 34: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/34.jpg)
Curation & Quality assurance
![Page 35: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/35.jpg)
Challenges & Lessons learned
> Scale
> Rapidly evolving and new technologies
> Cost
> Infrastructure/tools
> Permissions/intellectual property considerations
![Page 36: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/36.jpg)
Goals & Lessons learned
> Rich and substantial collections
> Permanence and long-term preservation
> Scalability and sustainability
> Networked collections
> Greater collaboration = crucial to work together
![Page 38: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/38.jpg)
Where can/should I get started?
NDSA Web Archiving in the United States Surveys
http://1.usa.gov/1z1H3jo
SAA Web Archiving Roundtable
www2.archivists.org/groups/web-archiving-roundtable
METRO Web Archiving Special Interest Group
libguides.metro.org/webarchiving
International Internet Preservation Consortium
netpreserve.org
Jill Lepore, “The Cobweb: Can the Internet be Archived?” The New Yorker, 1/26/2015
http://www.newyorker.com/magazine/2015/01/26/cobweb
![Page 39: Web Archiving: An Overview](https://reader031.fdocuments.in/reader031/viewer/2022022412/58f14f101a28abe41a8b465b/html5/thumbnails/39.jpg)
Thanks!
...and keep in touch!
Karl-Rainer Blumenthal
Web Archivist, Internet Archive
[email protected]
@LandLibrarian
Sumitra Duncan
NYARC Web Archiving Coordinator, Frick Art Reference Library
[email protected]
@artlibrariannyc
Image credits:
Condé Nast
International Internet Preservation Consortium
Susan Kare, Museum of Modern Art
National Digital Stewardship Alliance
Archive-It
Society of American Archivists
Brian Ejar
Simple Icons
Creative Stall
Iconathon
Museum of Modern Art
The Frick Collection
Brooklyn Museum
New York Art Resources Consortium