Micah Altman, Associate Director, Harvard-MIT Data Center
Institute for Quantitative Social Science, Harvard University
Bryan Beecher, Director of Computing and Network Services
Inter-university Consortium for Political and Social Research, University of Michigan
Marc Maynard, Director of Technical Services
The Roper Center for Public Opinion Research, University of Connecticut
Jonathan Crabtree, Assistant Director for Archives and Information Technology
H.W. Odum Institute for Research in Social Science, University of North Carolina
CNI 2008 Fall Task Force Meeting 1
Our Story
• Who are you guys?
• What problem are you trying to solve?
• What have you done?
• Why do we care?
Data-PASS
• Partnership devoted to identifying, acquiring, and preserving data at risk of being lost to the social science research community
• Partners
– ICPSR
– Odum Institute
– Harvard-MIT Data Center
– Roper Center
– National Archives
http://flickr.com/photos/phauly/35555985/
Data-PASS
Data-PASS
• Lots of little files (social science data)
– ASCII data files
– PDF technical documentation (codebooks)
– Millions of ’em
• Archival storage
– Was tape
– Now disk
Before
After
Archival storage?
http://failblog.org/2008/02/08/floppy-fail/
Archival storage?
• Remote disks
• Grids
• Clouds
• With partners?
Why roll your own?
• Policy-driven
• Auditable
• Asymmetric
• Independence of each location
Syndicated Storage Platform (SSP)
• Start with LOCKSS
– Lots of Copies Keep Stuff Safe
– But used in a closed network
• Private LOCKSS Network (PLN)
– A few of them out there
– MetaArchive is perhaps the best known
• Biggest selling point: independence of each node in the PLN
PLNs
• LOCKSS is really easy to set up
• PLNs are more difficult
• Other differences between a traditional PLN and our needs
– Our content isn’t harvestable via HTTP
– Our PLN nodes are different sizes
– Our trust model requires that no centralized authority control the network
SSP = Stone Soup Platform?
• ICPSR and Odum set up a small PLN
• HMDC provided a harvester and designed the schema
• Odum built the Comparator
• Roper is building the Invitor
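The Summary slide calls HMDC's contribution an OAI data harvester, but the slides say nothing about its internals. As a minimal sketch, assuming the harvester walks a repository with the standard OAI-PMH `ListIdentifiers` verb and follows `resumptionToken` paging, the core loop might look like this. The `fetch` callable, the `oai_dc` default, the base URL, and the sample pages are hypothetical stand-ins for real HTTP traffic, not the team's actual code.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers(base_url: str, fetch: Callable[[str], str],
                     metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Yield record identifiers, following OAI-PMH resumption tokens."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iterfind(".//oai:header", OAI_NS):
            if header.get("status") == "deleted":
                continue  # skip records the repository has withdrawn
            ident = header.find("oai:identifier", OAI_NS)
            if ident is not None and ident.text:
                yield ident.text
        token = root.find(".//oai:resumptionToken", OAI_NS)
        # An absent or empty resumptionToken marks the last page.
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}


# Hypothetical two-page response standing in for a live repository.
PAGE_1 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>
<header><identifier>oai:example.org:study-1</identifier></header>
<header><identifier>oai:example.org:study-2</identifier></header>
<resumptionToken>page2</resumptionToken>
</ListIdentifiers></OAI-PMH>"""
PAGE_2 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>
<header><identifier>oai:example.org:study-3</identifier></header>
<resumptionToken/>
</ListIdentifiers></OAI-PMH>"""


def fake_fetch(url: str) -> str:
    # Stand-in for an HTTP GET; a real harvester would use urllib.request.
    return PAGE_2 if "resumptionToken=page2" in url else PAGE_1


ids = list(list_identifiers("https://repo.example.org/oai", fake_fetch))
print(ids)
```

Injecting `fetch` keeps the paging logic testable without a network and leaves retry and rate-limit policy to the caller.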
PLN
Schema
• Nodes
– IP address
– Storage commitment
• AUs (archival units)
– Max size
– # in the PLN
• Lots more
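To make the schema concrete, here is a minimal sketch of the node and AU records listed on the slide, plus a feasibility check. The class and field names are hypothetical, and two readings are assumptions rather than anything the slides state: that "# in the PLN" means the number of copies of an AU the network should hold, and that a node holds at most one copy of any AU.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str                   # hypothetical label for a partner node
    ip_address: str
    storage_commitment_gb: int  # storage the partner has pledged


@dataclass(frozen=True)
class ArchivalUnit:
    au_id: str
    max_size_gb: int
    copies: int  # assumed reading of "# in the PLN"


@dataclass
class Schema:
    nodes: list[Node] = field(default_factory=list)
    aus: list[ArchivalUnit] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return reasons the schema cannot be satisfied (empty if none)."""
        issues = []
        for au in self.aus:
            # Assumption: each copy of an AU must live on a different node.
            if au.copies > len(self.nodes):
                issues.append(f"{au.au_id}: wants {au.copies} copies "
                              f"but only {len(self.nodes)} nodes exist")
        demand = sum(au.max_size_gb * au.copies for au in self.aus)
        supply = sum(n.storage_commitment_gb for n in self.nodes)
        if demand > supply:
            issues.append(f"needs {demand} GB but nodes commit only {supply} GB")
        return issues


schema = Schema(
    nodes=[Node("icpsr", "192.0.2.10", 4000), Node("odum", "192.0.2.20", 1000)],
    aus=[ArchivalUnit("study-0001", 500, 2), ArchivalUnit("study-0002", 800, 3)],
)
print(schema.problems())
```

Here the total demand (3400 GB) fits within the 5000 GB committed, so the only problem reported is that `study-0002` asks for more copies than there are nodes.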
Comparator
• diff for our SSP
• Compares
– Contents of the LOCKSS Cache Manager [sic]
– Schema
• Produces
– List of differences between “what is” and “what should be”
– Feeds into another tool for “fixing the PLN”
• Machine-actionable output (XML)
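The Comparator's job amounts to a set difference per node. A minimal sketch, assuming the cache contents ("what is") and the schema ("what should be") have each already been reduced to a mapping from node name to the set of AU identifiers it holds or should hold; the `report` and `difference` element names and the `action`/`node`/`au` attributes are hypothetical, not the Comparator's real XML format.

```python
import xml.etree.ElementTree as ET


def compare(actual: dict[str, set[str]], desired: dict[str, set[str]]) -> str:
    """Diff 'what is' against 'what should be' and emit an XML report."""
    report = ET.Element("report")
    for node in sorted(set(actual) | set(desired)):
        have = actual.get(node, set())
        want = desired.get(node, set())
        # AUs the schema assigns to this node but the cache lacks.
        for au in sorted(want - have):
            ET.SubElement(report, "difference", action="ADD", node=node, au=au)
        # AUs the cache holds but the schema no longer assigns here.
        for au in sorted(have - want):
            ET.SubElement(report, "difference", action="DROP", node=node, au=au)
    return ET.tostring(report, encoding="unicode")


actual = {"icpsr": {"study-0001", "study-0002"}, "odum": {"study-0001", "study-0003"}}
desired = {"icpsr": {"study-0001", "study-0002"}, "odum": {"study-0001", "study-0002"}}
xml_report = compare(actual, desired)
print(xml_report)
```

Sorting nodes and AUs keeps the report deterministic, which makes successive runs easy to diff and audit.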
Invitor
• Reads the report from the Comparator
• Issues requests to PLN nodes to ADD or DROP an AU
– Expectation is that PLN nodes always accept an ADD if they can
– An offer they cannot refuse
• Requests may be reviewed/approved by a human administrator (or not)
• USENET news technology?
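The Invitor's first step, turning a difference report into requests with optional human review, might be sketched as follows. It assumes a hypothetical report format (`difference` elements with `action`, `node`, and `au` attributes) and models the administrator as an optional `approve` callback; how requests are actually delivered to nodes is left out, since the slides only float USENET-style technology as an idea.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Request:
    action: str  # "ADD" or "DROP"
    node: str
    au: str


def plan_requests(report_xml: str,
                  approve: Optional[Callable[[Request], bool]] = None) -> list[Request]:
    """Turn a difference report into requests, optionally gated by a reviewer."""
    requests = []
    for diff in ET.fromstring(report_xml).iterfind("difference"):
        req = Request(diff.get("action"), diff.get("node"), diff.get("au"))
        if req.action not in ("ADD", "DROP"):
            continue  # ignore actions this sketch does not understand
        # With no reviewer configured, every request goes out automatically.
        if approve is None or approve(req):
            requests.append(req)
    return requests


REPORT = ('<report>'
          '<difference action="ADD" node="odum" au="study-0002" />'
          '<difference action="DROP" node="odum" au="study-0003" />'
          '</report>')

auto = plan_requests(REPORT)
# Stand-in for a human administrator who signs off on ADDs only.
reviewed = plan_requests(REPORT, approve=lambda r: r.action == "ADD")
print(len(auto), len(reviewed))  # 2 1
```

Passing `approve=None` corresponds to the "(or not)" case on the slide: requests flow straight through without review.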
Summary
• Data-PASS is a group of archives committed to preserving social science data
• Exploring various technology options
• One avenue is a custom LOCKSS deployment
– Network schema
– OAI data harvester
– Comparison tool
– Network update tool