Development of the CyberCemetery (2011)
Development & Practice in the CyberCemetery
Starr Hoffman
Head, Government Documents Dept.
University of North Texas Libraries
25 September 2011
• Intro: What is the CyberCemetery?
• Purpose: Why create a CyberCemetery?
• Development
• Archiving Process
• Technical Details
• User Demographics: Who uses the CyberCemetery?
• Conclusion
http://digital.library.unt.edu/explore/collections/GDCC/
• online archive of websites from U.S. government agencies or commissions that are no longer operating
• “snapshot” of each website as it existed before “pulling the plug”
• maintained by the University of North Texas Libraries
• freely accessible world-wide
• affiliated NARA archive (National Archives and Records Administration)
1997 - present 2008 - present
• Protect At-Risk Information:
  • 1990’s: U.S. government information = online
  • born-digital
  • edited or removed without warning
• Federal Depository Library Program (FDLP)
  • administered by U.S. Government Printing Office (GPO)
  • mission: to provide free, permanent public access to government information
  • online information complicates this mission
  • University of North Texas is a federal depository library
• 1995: e-docs at risk
  • Government Printing Office (GPO) publishes report stating need to preserve electronic government publications
• 1997: GPO + UNT
  • University of North Texas (UNT) talks to GPO about forming a partnership
• 1997: ACIR archived
  • UNT archives website of the Advisory Commission on Intergovernmental Relations (ACIR)
• 1999: GPO + UNT = expanded
  • permanent public access, expanded to multiple websites, & any agency or commission no longer operating
• 1999: CyberCemetery
  • archive is named “CyberCemetery” because websites are from “dead” agencies & commissions
• 2006: GPO + UNT + NARA
  • partnership now includes the U.S. National Archives and Records Administration (NARA)
• 2011: 73+ websites archived
1. Identify at-risk government agencies and commissions
• contacted directly by agency/commission
• contacted by GPO
• read/listen to news
• read government-related websites & blogs
• targeted search-engine queries
  • (“final report” + .gov)
• referrals from other librarians, patrons
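The targeted search-engine queries above could be generated along these lines. This is a minimal sketch, not the department's actual tooling: the `site:` operator is an assumption about the search engine's syntax, and `build_queries` and `to_query_string` are hypothetical helper names.

```python
from urllib.parse import urlencode


def build_queries(phrases, domain=".gov"):
    """Pair each quoted phrase with a domain restriction.

    Assumption: the search engine understands a `site:` operator.
    """
    return [f'"{phrase}" site:{domain}' for phrase in phrases]


def to_query_string(query):
    """URL-encode one query as a `q=` parameter (parameter name is an assumption)."""
    return urlencode({"q": query})


if __name__ == "__main__":
    for q in build_queries(["final report"]):
        print(q, "->", to_query_string(q))
```

Other phrases can be passed in the same list; the slide itself only names “final report”.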
2. Evaluate the website
• must be an official government website
• the agency or commission must:
  • be closing
  • have issued a final report
  • or show another indication that the website is at risk
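The evaluation criteria can be read as a simple check. The sketch below assumes the three agency conditions are alternatives (any one is enough), which is an interpretation of the slide; the `Candidate` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A website being considered for archiving (hypothetical structure)."""
    is_official_gov_site: bool
    agency_closing: bool = False
    final_report_issued: bool = False
    other_risk_indicator: bool = False


def qualifies(site: Candidate) -> bool:
    """Official government site AND at least one at-risk signal.

    Assumption: the three at-risk signals are alternatives, not all required.
    """
    at_risk = (
        site.agency_closing
        or site.final_report_issued
        or site.other_risk_indicator
    )
    return site.is_official_gov_site and at_risk
```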
2. Evaluate the website (continued)
Questions for website administrator:
• What operating system was used to host this website?
• What webserver software was used to host this website?
• Are server-side includes (SSI) used in this website?
• Was this website static HTML or a dynamic site?
  • If dynamic, what scripting languages were used for this website (PHP, Perl, Python)?
• Was a database used for this website?
  • If so, what database was used for this website?
  • What methods were used to connect to the database?
• Is there streaming media associated with this website?
• Are there proprietary content types used in this website?
• Are there any comments you would like to add?
3. Harvest the website
• software: Heritrix (from Internet Archive)
  • http://crawler.archive.org/
• downloads content
• bundles all content into WARC file
• WARC = website in a single file
• no manipulation of code or content
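To illustrate “website in a single file”, the sketch below concatenates simplified WARC-style records into one byte stream. It is an illustration of the record layout only, not how Heritrix works internally: a real crawl would typically store full HTTP responses and additional metadata records, and the function names and example URLs here are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(uri: str, payload: bytes, content_type: str) -> bytes:
    """Build one simplified WARC 1.0 'resource' record for a captured file."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; a blank line separates them from the payload.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # The payload is stored byte-for-byte; each record ends with two CRLFs.
    return head + payload + b"\r\n\r\n"


def bundle(pages: dict) -> bytes:
    """Concatenate one record per page into a single WARC-style byte stream."""
    return b"".join(
        warc_resource_record(uri, body, "text/html") for uri, body in pages.items()
    )


if __name__ == "__main__":
    site = {
        "http://example.gov/": b"<html>home</html>",
        "http://example.gov/report.html": b"<html>final report</html>",
    }
    print(len(bundle(site)), "bytes in one file")
```

The payload bytes are never rewritten, which matches the “no manipulation of code or content” point above.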
4. Access archived website
• software: Wayback (from Internet Archive)
  • http://archive-access.sourceforge.net/projects/wayback/
• retrieves content from WARC
• adds banner notifying users of archived status
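The banner step can be pictured as inserting a notice into the page at display time while the stored copy stays untouched. This is a rough sketch of the idea, not Wayback's actual mechanism; `BANNER`, its wording, and `add_banner` are hypothetical.

```python
import re

# Hypothetical banner markup; the real wording and styling are not given in the slides.
BANNER = '<div class="archive-banner">This is an archived copy of a website that is no longer maintained.</div>'


def add_banner(html: str, banner: str = BANNER) -> str:
    """Return the page with the banner inserted right after the opening <body> tag.

    If no <body> tag is found, the banner is simply prepended.
    """
    match = re.search(r"<body[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        return banner + html
    return html[: match.end()] + banner + html[match.end():]
```

Because the banner is applied to the retrieved copy only, the content inside the WARC remains an exact capture.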
5. Harvesting alternative: Donated content
• directly receive files from agency or commission
• Why not donated content?
  • Content could be altered
  • Harvesting = exact copy of online published content
• Why donated content?
  • If content cannot be accessed by harvesting
  • flash video, large amounts of media
  • rarely necessary now
6. Link Checking
• Manual:
  • manually navigate original & archived sites
• Automated:
  • Xenu Link Checker
  • http://home.snafu.de/tilman/xenulink.html
  • compare reports of original & archived sites
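Comparing the two link-check reports could look like the sketch below. It assumes each report has been reduced to tab-separated lines of URL and status, which is a simplification and not Xenu's actual export format; the function names and example URLs are hypothetical.

```python
import csv
import io


def load_report(text: str, base_url: str) -> dict:
    """Map each site-relative path to its status string.

    Assumption: one "URL<TAB>status" pair per line; URLs outside base_url are skipped.
    """
    report = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2 or not row[0].startswith(base_url):
            continue
        report[row[0][len(base_url):]] = row[1].strip()
    return report


def compare(original: dict, archived: dict) -> dict:
    """List paths missing from the archive and paths whose status differs."""
    shared = original.keys() & archived.keys()
    return {
        "missing": sorted(set(original) - set(archived)),
        "status_changed": sorted(p for p in shared if original[p] != archived[p]),
    }


if __name__ == "__main__":
    orig = "http://example.gov/index.html\tok\nhttp://example.gov/report.pdf\tok\n"
    arch = (
        "http://archive.example.edu/site/index.html\tok\n"
        "http://archive.example.edu/site/report.pdf\tnot found\n"
    )
    print(compare(
        load_report(orig, "http://example.gov/"),
        load_report(arch, "http://archive.example.edu/site/"),
    ))
```

Stripping each site's base URL first lets the same page be matched across the original and archived hosts.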
7. Load to UNT Server
• Upload archived website
• Add navigation
• Notify GPO (or agency/commission) that archived version is live
• Backup
  • full backups to magnetic tape
  • performed each weekend
  • shipped to offsite storage company
    • Iron Mountain
    • http://www.ironmountain.com
• web files (HTML, XML)
• text documents (.txt, .pdf, .doc)
• spreadsheets & statistics (.xls)
• presentations (.ppt)
• media files:
  • images & photographs (.jpg, .gif, .png, .tiff)
  • audio (.mp3)
  • video (.wm, .mov, .rp)
• researchers
• historians
• students
• government employees
• general public
• more than 1,000,000 hits per month on average
• peak visits in one day:
  • 9,996 on 11.03.2011
• most popular site: 9/11 Commission
• provides permanent public access
• archive of “dead” government information
• freely, globally available
• 73 websites and growing
• partnership between:
  • University of North Texas Libraries
  • U.S. Government Printing Office
  • National Archives and Records Administration
FOR FURTHER INFORMATION:
http://www.library.unt.edu/govinfo/
http://digital.library.unt.edu/explore/collections/GDCC/
Starr Hoffman
Head, Government Documents Dept.
University of North Texas
[email protected]
[email protected]
http://geekyartistlibrarian.com