Herbert Van de Sompel
404/File Not Found, Washington, DC, October 24 2014
Herbert Van de Sompel @hvdsomp
http://public.lanl.gov/herbertv/
Los Alamos National Laboratory
Acknowledgements: Michael L. Nelson @phonedude_mln
Old Dominion University
Creating Pockets of Persistence
Addressing the Link/Reference Rot Challenge
• Pockets of Persistence
• Capture – Archive Pro-Actively, Selectively
• Reference – Annotate Links
• Access – Travel in Time
Pockets of Persistence
How to achieve the ability to:
• Persistently
• Precisely
• Seamlessly
revisit the Web of the Past and the Web of the Now at some point in the Future
The link/reference rot challenge has two components:
• Link rot: links stop working, a.k.a. 404 Not Found
• Content drift: referenced content changes over time
Illustration
Current version of http://en.wikipedia.org/wiki/Coil_(band) on October 22 2014
Illustration – Link Rot
Current version of http://en.wikipedia.org/wiki/Coil_(band) on October 22 2014
Illustration – Link Rot
Current version of http://liarsociety.tripod.com/blog/index.blog?from=20041130 on October 22 2014
Illustration – Content Drift
Version of http://en.wikipedia.org/wiki/Coil_(band) dated October 2 2010
http://en.wikipedia.org/w/index.php?title=Coil_(band)&oldid=388321480
Illustration – Content Drift
Current version of http://en.wikipedia.org/wiki/Peter_Christopherson on October 22 2014
Illustration – Content Drift
Version of http://en.wikipedia.org/wiki/Peter_Christopherson that was current on October 2 2010
http://en.wikipedia.org/w/index.php?title=Peter_Christopherson&oldid=387987414
Pockets of Persistence
How to achieve the ability to:
• Persistently
• Precisely
• Seamlessly
revisit the Web of the Past and the Web of the Now at some point in the Future
This challenge exists for the entire web, but some communities particularly care about addressing it:
• scholarly communication
• legal publications
• journalism
• Wikipedia
• …
Mobilize the communities that care about this problem to work towards joint, interoperable solutions and approaches
Addressing the Link/Reference Rot Challenge
• Pockets of Persistence
• Capture – Archive Pro-Actively, Selectively
• Reference – Annotate Links
• Access – Travel in Time
Pro-Active Capture for a Seed Collection
• Seed Collection – The starting point for capture is a seed collection of interest to the communities that care, e.g.:
o Scholarly literature
o Legal documents
o Online journalism
o Wikipedia articles
• Lifecycle Events – Intervene at critical moments in the lifecycle of items in these collections to pro-actively capture:
o Collection items – some solutions already in place
o Web resources referenced in collection items
Pro-Active Capture for a Seed Collection
• What those crucial lifecycle events are may depend on the collection type
Wikipedia
• Creation of new article
• Creation of new version of article
• Creation of substantially new version of article
• Addition of external reference to article
• References to article exceed a certain threshold
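The "Addition of external reference to article" event can be detected by comparing two versions of an article's wikitext and capturing whatever URLs are new. A minimal sketch, assuming the wikitext of both versions is already at hand; the regular expression is a rough heuristic (not how Wikipedia or any existing archiving bot parses links) and `added_external_links` is a hypothetical helper name.

```python
import re
from typing import List

# Rough heuristic (assumption): a bare http(s) URL ends at whitespace or at a
# wikitext/HTML delimiter. It also catches url= parameters inside citation
# templates, but may keep trailing punctuation such as "." or ")".
URL_RE = re.compile(r'https?://[^\s\[\]|<>{}"]+')


def added_external_links(old_wikitext: str, new_wikitext: str) -> List[str]:
    """Return URLs present in the new version but not in the old one, in order of appearance."""
    old_urls = set(URL_RE.findall(old_wikitext))
    added: List[str] = []
    for url in URL_RE.findall(new_wikitext):
        if url not in old_urls and url not in added:
            added.append(url)  # candidate for pro-active capture
    return added


old = "Coil were formed in 1982.<ref>[http://example.com/a Source A]</ref>"
new = old + "<ref>{{cite web |url=http://example.org/b |title=B}}</ref>"
print(added_external_links(old, new))  # → ['http://example.org/b']
```

Each returned URL would then be handed to an on-demand capture service at the moment the edit is saved.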
Scholarly Literature
Authoring Legal Documents – perma.cc
http://perma.cc
Authoring Scholarly Literature: Experimental Zotero Extension
Richard Wincewicz (2014) Prototype Hiberlink plugin for Zotero for pro-active archiving and temporal references
https://www.youtube.com/v/ZYmi_Ydr65M%26vq
Submitting Scholarly Literature: Experimental HiberActive Service
Martin Klein et al. (2014) HiberActive: Pro-Active Archiving of web references from scholarly articles. Open Repositories 2014
http://www.slideshare.net/martinklein0815/hiberactive
Pro-Active Capture for a Seed Collection
• Interoperability for on-demand capture:
o Need basic interoperability for machine-driven on-demand capture:
- Discovery of capture interface
- Interface IN: [ Original URI ]
- Interface OUT: [ URI of Capture ; Capture Datetime ]
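The IN/OUT contract of on-demand capture is small enough to write down as a typed interface. A minimal sketch, assuming a hypothetical in-memory `ToyArchive` with a made-up base URI and a Wayback-style URI template; actually fetching and storing the page, and discovery of the capture interface, are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class CaptureResult:
    """Interface OUT: [ URI of Capture ; Capture Datetime ]."""
    capture_uri: str
    capture_datetime: datetime


class ToyArchive:
    """Hypothetical archive used only to illustrate the on-demand capture contract.

    It mints Wayback-style URIs <base>/<YYYYMMDDhhmmss>/<Original URI>;
    a real archive may mint opaque URIs instead. Fetching/storing is omitted.
    """

    def __init__(self, base_uri: str) -> None:
        self.base_uri = base_uri.rstrip("/")
        self.captures: List[CaptureResult] = []

    def capture(self, original_uri: str, now: Optional[datetime] = None) -> CaptureResult:
        """Interface IN: [ Original URI ]."""
        when = now if now is not None else datetime.now(timezone.utc)
        stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
        result = CaptureResult(f"{self.base_uri}/{stamp}/{original_uri}", when)
        self.captures.append(result)
        return result


archive = ToyArchive("https://archive.example.org/web/")
result = archive.capture(
    "http://liarsociety.tripod.com/blog/index.blog?from=20041130",
    now=datetime(2014, 10, 22, 12, 0, 0, tzinfo=timezone.utc),
)
print(result.capture_uri)
# → https://archive.example.org/web/20141022120000/http://liarsociety.tripod.com/blog/index.blog?from=20041130
```

Returning the Capture Datetime explicitly, rather than expecting the caller to parse it out of the URI, keeps the contract valid for archives whose URIs of Capture are opaque.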
Addressing the Link/Reference Rot Challenge
• Pockets of Persistence
• Capture – Archive Pro-Actively, Selectively
• Reference – Annotate Links
• Access – Travel in Time
Reference Captures and Annotate Links
• Existing practice for linking to captures:
o Link to URI of Capture
o Lose Original URI
o Lose Capture Datetime
• Problems with existing practice:
o Impossible to visit the Original URI, if desired
o Requires the permanent existence/uptime of the archive that holds the capture
- One link rot problem replaced by another
Van de Sompel, H. et al. (2013) Thoughts on referencing, linking, reference rot
http://mementoweb.org/missing-link/
Permanent Existence/Uptime of Archives?
Capture of http://webcitation.org dated July 17 2013
https://archive.today/eAETp
Permanent Existence/Uptime of Archives?
http://webcitation.org/ on August 6 2014
Permanent Existence/Uptime of Archives?
Remnant of discontinued web archive http://mummify.it captured on February 14 2014
https://web.archive.org/web/20140214233752/https://www.mummify.it/
Permanent Existence/Uptime of Archives?
http://www.themoscowtimes.com/news/article/russia-bans-wayback-machine-internet-archive-over-islamic-state-video/510074.html
Hacking Original URI, Capture Datetime from Capture URI?
URI of Capture | Original URI derivable? | Datetime T derivable?
https://web.archive.org/web/20140214233752/https://www.mummify.it | yes | yes
https://archive.today/eAETp | no | no
http://perma.cc/4RH7-999Q?type=source | no | no
http://en.wikipedia.org/w/index.php?title=Coil_(band)&oldid=388321480 | no | no
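Only the Wayback-style URI of Capture in the table embeds both the Original URI and the Capture Datetime. A minimal sketch of that "hack", assuming the common `<host>/[<collection>/]<14-digit timestamp>[xx_]/<Original URI>` layout with timestamps in UTC; `hack_capture_uri` is a hypothetical name. Opaque URIs such as the archive.today, perma.cc and Wikipedia examples yield nothing, which is exactly the problem the table illustrates.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

# Assumed Wayback-style layout: <host>/[<collection>/]<YYYYMMDDhhmmss>[xx_]/<Original URI>
WAYBACK_STYLE = re.compile(r"^https?://[^/]+/(?:[^/]+/)?(\d{14})(?:[a-z]{2}_)?/(.+)$")


def hack_capture_uri(capture_uri: str) -> Optional[Tuple[str, datetime]]:
    """Return (Original URI, Capture Datetime) if the URI of Capture embeds them, else None."""
    match = WAYBACK_STYLE.match(capture_uri)
    if match is None:
        return None  # opaque URI of Capture: nothing to recover
    stamp, original_uri = match.groups()
    try:
        # Assumption: the 14-digit timestamp is expressed in UTC.
        capture_datetime = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    except ValueError:
        return None  # 14 digits that are not a valid timestamp
    return original_uri, capture_datetime


for uri in [
    "https://web.archive.org/web/20140214233752/https://www.mummify.it",
    "https://archive.today/eAETp",
    "http://perma.cc/4RH7-999Q?type=source",
    "http://en.wikipedia.org/w/index.php?title=Coil_(band)&oldid=388321480",
]:
    print(uri, "->", hack_capture_uri(uri))
```

Even when the hack works it depends on one archive's URI conventions, so it is no substitute for conveying the Original URI and Capture Datetime explicitly.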
Using Capture URI to find Captures in Other Web Archives?
Reference Captures and Annotate Links
• Desired practice for linking to captures is to annotate the link so it conveys:
- URI of Capture
- Original URI
- Capture Datetime
• Link annotation supports fallback to other archives:
o Original URI allows finding captures in all web archives
o Capture Datetime allows finding an appropriate capture in all web archives
o Original URI and Capture Datetime together allow automatic access to an appropriate capture in all web archives (see Access)
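Once the Original URI has turned up captures in several archives, the Capture Datetime selects the appropriate one. A minimal sketch, assuming the candidates have already been gathered as (URI of Capture, Capture Datetime) pairs and that "appropriate" means temporally nearest; the archive hosts and dates below are placeholders, not real holdings.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple

Capture = Tuple[str, datetime]  # (URI of Capture, Capture Datetime)


def nearest_capture(captures: List[Capture], target: datetime) -> Optional[Capture]:
    """Pick the capture closest in time to target; on a tie, prefer the earlier capture."""
    if not captures:
        return None
    return min(captures, key=lambda c: (abs(c[1] - target), c[1]))


utc = timezone.utc
# Hypothetical captures of one Original URI held by two placeholder archives.
candidates = [
    ("https://archive-a.example/20080101000000/http://example.com/", datetime(2008, 1, 1, tzinfo=utc)),
    ("https://archive-b.example/snapshot/123", datetime(2008, 2, 10, tzinfo=utc)),
    ("https://archive-a.example/20090601000000/http://example.com/", datetime(2009, 6, 1, tzinfo=utc)),
]
best = nearest_capture(candidates, datetime(2008, 2, 6, tzinfo=utc))
print(best[0])  # → https://archive-b.example/snapshot/123
```

Nearest-in-time is only one possible policy; a citation context might instead prefer the latest capture at or before the Capture Datetime.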
Van de Sompel, H. et al. (2013) Thoughts on referencing, linking, reference rot
http://mementoweb.org/missing-link/
Reference Captures and Annotate Links
• Desired practice for linking to captures is to annotate the link so it conveys:
[Figure: a link annotated with its URI of Capture, Original URI and Capture Datetime]
Reference Captures and Annotate Links
• Interoperability for link annotation:
o Need an approach to convey, in a uniform, machine-actionable way:
- URI of Capture
- Original URI
- Capture Datetime
• Ongoing efforts:
o Missing Link Proposal
- http://mementoweb.org/missing-link/
o W3C Robustness and Archiving Community Group
- http://www.w3.org/community/irobar/
Missing Link Proposal
<a href="http://liarsociety.tripod.com/blog/index.blog?from=20041130" data-versionurl="https://archive.today/ElCHn" data-versiondate="2008-02-06T00:00:00Z">
- href: Original URI
- data-versionurl: URI of Capture
- data-versiondate: Capture Datetime
Van de Sompel, H. et al. (2013) Thoughts on referencing, linking, reference rot
http://mementoweb.org/missing-link/
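Because the annotation is plain HTML attributes, producing and consuming it needs nothing beyond an HTML parser. A minimal sketch using Python's standard `html.parser`, assuming only the three attributes shown in the proposal example; `annotate`, `MissingLinkParser` and the anchor text are hypothetical names chosen for illustration.

```python
from html import escape
from html.parser import HTMLParser
from typing import Dict, List, Optional


def annotate(original_uri: str, capture_uri: str, capture_datetime: str, text: str) -> str:
    """Emit a link carrying Original URI, URI of Capture and Capture Datetime."""
    return (
        f'<a href="{escape(original_uri)}" data-versionurl="{escape(capture_uri)}" '
        f'data-versiondate="{escape(capture_datetime)}">{escape(text)}</a>'
    )


class MissingLinkParser(HTMLParser):
    """Collect [ Original URI ; URI of Capture ; Capture Datetime ] from annotated <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # attribute values arrive already unescaped
        if "data-versionurl" in attributes or "data-versiondate" in attributes:
            self.links.append({
                "original_uri": attributes.get("href"),
                "capture_uri": attributes.get("data-versionurl"),
                "capture_datetime": attributes.get("data-versiondate"),
            })


html_snippet = annotate(
    "http://liarsociety.tripod.com/blog/index.blog?from=20041130",
    "https://archive.today/ElCHn",
    "2008-02-06T00:00:00Z",
    "link text",  # hypothetical anchor text
)
parser = MissingLinkParser()
parser.feed(html_snippet)
print(parser.links[0]["capture_uri"])  # → https://archive.today/ElCHn
```

Escaping on output and unescaping on input matter for URIs containing "&", such as the Wikipedia oldid links earlier in this deck.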
Addressing the Link/Reference Rot Challenge
• Pockets of Persistence
• Capture – Archive Pro-Actively, Selectively
• Reference – Annotate Links
• Access – Travel in Time
Memento Web Time Travel
Use the Original URI
Current version of http://law.georgetown.edu/library/404/ on October 22 2014
Memento Web Time Travel
And a Datetime
Memento Web Time Travel
To automatically retrieve the temporally nearest available capture
Capture of http://law.georgetown.edu/library/404/ dated May 3 2014
http://wayback.archive-it.org/all/20140503094327/http://www.law.georgetown.edu/library/404/
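In the Memento protocol (RFC 7089), "Original URI plus a Datetime" becomes an HTTP request to a TimeGate carrying an `Accept-Datetime` header. A minimal sketch that builds, but does not send, such a request; the Wayback Machine base URI is an assumption about one TimeGate, and `timegate_request` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumed TimeGate: the Wayback Machine is assumed to accept the Original URI
# appended to this base; other Memento TimeGates work the same way in principle.
TIMEGATE_BASE = "https://web.archive.org/web/"


def timegate_request(original_uri: str, when: datetime, base: str = TIMEGATE_BASE) -> Request:
    """Build (without sending) a Memento datetime-negotiation request.

    `when` must be timezone-aware; Accept-Datetime uses the HTTP date format in GMT.
    """
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(base + original_uri, headers={"Accept-Datetime": accept_datetime}, method="HEAD")


req = timegate_request(
    "http://law.georgetown.edu/library/404/",
    datetime(2014, 5, 3, 9, 43, 27, tzinfo=timezone.utc),
)
print(req.full_url)
print(req.get_header("Accept-datetime"))  # → Sat, 03 May 2014 09:43:27 GMT
```

Sending the request requires network access; the TimeGate is expected to answer with a redirect to the capture it selected, whose response carries a `Memento-Datetime` header.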
Memento Web Time Travel
http://mementoweb.org
http://bit.ly/memento-for-chrome
Travel in Time - Persistently, Precisely, Seamlessly
On-Demand Capture | URI of Capture | Original URI | Datetime T
Available, Accessible | + | - | -
• Time Travel is:
• Persistent – See next slide
• Precise – Following link to URI of Capture retrieves exact capture
• Seamless – Requires clicking a link as usual
Travel in Time - Persistently, Precisely, Seamlessly
On-Demand Capture | URI of Capture | Original URI | Datetime T
Available, Not Accessible | + | - | -
• Time Travel is:
• Persistent – Following link to URI of Capture leads nowhere
• Precise – Following link to URI of Capture leads nowhere
• Seamless – Following link to URI of Capture leads nowhere
Travel in Time - Persistently, Precisely, Seamlessly
On-Demand Capture | URI of Capture | Original URI | Datetime T
Available, Not Accessible | + | + | +
• Time Travel is:
• Persistent – Using Memento with [ Original URI ; Datetime ] works across web archives, versioning systems
• Precise – Using Memento with [ Original URI ; Datetime ] retrieves nearest capture from other archive
• Seamless – Requires browser plugin
Travel in Time - Persistently, Precisely, Seamlessly
On-Demand Capture | URI of Capture | Original URI | Datetime T
Available, Accessible | - | + | +
• Time Travel is:
• Persistent – Using Memento with [ Original URI ; Datetime ] works across web archives, versioning systems
• Precise – Using Memento with [ Original URI ; Datetime ] retrieves the exact capture
• Seamless – Requires browser plugin
Travel in Time - Persistently, Precisely, Seamlessly
On-Demand Capture | URI of Capture | Original URI | Datetime T
Not Available | - | + | +
• Time Travel is:
• Persistent – Using Memento with [ Original URI ; Datetime ] works across web archives, versioning systems
• Precise – Using Memento with [ Original URI ; Datetime ] retrieves nearest capture from other archive
• Seamless – Requires browser plugin
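Across these scenarios, a link resolver (a browser plugin or otherwise) follows one fallback rule: use the URI of Capture while it still resolves, otherwise fall back to Memento with [ Original URI ; Datetime ]. A minimal sketch of that decision only, assuming a hypothetical caller-supplied `is_accessible` check and a hypothetical `AnnotatedLink` record; nothing is fetched.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class AnnotatedLink:
    capture_uri: Optional[str]            # URI of Capture
    original_uri: Optional[str]           # Original URI
    capture_datetime: Optional[datetime]  # Datetime T


def resolve(link: AnnotatedLink, is_accessible: Callable[[str], bool]) -> Tuple[str, Optional[str]]:
    """Return (strategy, target) following the fallback order of the scenarios above."""
    # Precise: the referenced capture is still reachable, so follow it.
    if link.capture_uri and is_accessible(link.capture_uri):
        return ("follow-capture", link.capture_uri)
    # Persistent: Memento with [ Original URI ; Datetime ] works across archives.
    if link.original_uri and link.capture_datetime:
        return ("memento", link.original_uri)
    # Only an unreachable URI of Capture (or nothing at all): the link leads nowhere.
    return ("dead-end", None)


def archive_down(uri: str) -> bool:
    return False  # simulate the archive holding the capture being unreachable


t = datetime(2008, 2, 6, tzinfo=timezone.utc)
original = "http://liarsociety.tripod.com/blog/index.blog?from=20041130"
print(resolve(AnnotatedLink("https://archive.today/ElCHn", None, None), archive_down))
# → ('dead-end', None)
print(resolve(AnnotatedLink("https://archive.today/ElCHn", original, t), archive_down))
# → ('memento', 'http://liarsociety.tripod.com/blog/index.blog?from=20041130')
```

The "memento" branch would then issue a datetime-negotiated request for the Original URI at `capture_datetime`.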
Reference Captures and Annotate Links
• Interoperability for time travel:
o Memento protocol specifies interoperability across web archives and version management systems
o Memento protocol is supported by major web archives
o Need to work towards Memento support by version management systems
o Need to work towards making the Memento experience seamless through native browser support
o Need to work towards robustness and sustainability of Memento infrastructure
Conclusion
• Significant technical solutions, infrastructure, ideas exist to address the link rot/reference rot challenge
• Mobilize the communities that care about this challenge to work towards joint, interoperable approaches
Creating Pockets of Persistence
http://mementoweb.org
http://hiberlink.org