#watitis2014
watitis.uwaterloo.ca @watitisconf
ONTARIO LIBRARY RESEARCH CLOUD: BUILDING A PROVINCE-WIDE RESEARCH CLOUD FOR ONTARIO’S ACADEMIC LIBRARIES
Pascal Calarco, University of Waterloo Library
Andrew McAlorum, Information Systems & Technology
AGENDA
• Problem we’re trying to solve - Pascal
• Funding and project plan - Pascal
• Technology overview - Andrew
• Some likely use cases - Andrew
• Next steps - Pascal
• Q&A
OLRC: Pascal Calarco & Andrew McAlorum
LIBRARIES’ GROWING STORAGE NEEDS
• Digitized physical materials: books, journals, film, audio
Reformatting to conserve originals, e.g. acidic paper such as newspapers
Reformatting to increase access, e.g. rare materials
Format migration to preserve content, e.g. 16mm film
LIBRARIES’ GROWING STORAGE NEEDS
• Born digital scholarly content for long term stewardship:
E-Theses and supplemental material
Scholarship: Working papers, Pre-prints, Open Access
Research data: numeric, geospatial, image, audio
Websites and digital ephemera of academic interest
Donated electronic materials for Special Collections
• John English’s hard drives of personal email correspondence, drafts and other materials
OCUL STORAGE SURVEY (2013)
• 10 of 21 institutions responded: six with more than 10k FTE, four with fewer than 10k
• Preservation & Access Needs:
80%: digitized print content
80%: faculty publications
60%: donated digital content
50%: research data
50%: GIS data
40%: purchased digital resources
20%: corporate records
20%: E-Theses
OCUL SURVEY: STORAGE NEEDS
• Current storage requirements: 100GB-30TB; total across respondents: 58.5TB
• Expected storage needs, next 2-3 years:
20% 100TB+
40% 10TB-100TB
20% <10TB
250TB total for all 10 institutions
OCUL SURVEY: STORAGE PROVISIONING
• 80% partner with campus IT often/mostly
• 60% provision in-house often/mostly
• 40% provision with other partner libraries often/mostly
• 30% provision with commercial services often/mostly
OCUL STORAGE SURVEY: TOP FEATURES (2013)
• Large storage on demand
• Low cost
• Canadian-based hosting
• Transparent pricing
• Archival quality storage
CLOUD OPTIONS
• Amazon S3/Glacier: $500k/year for the current 250TB of Scholars Portal (SP) content
$2000/TB per year, recurring
• DuraCloud: Amazon reseller, adding preservation & mgmt. tools
$1000-$1500/TB per year, recurring
• Private Cloud: OpenStack $280-$350/TB per year, amortized over three years
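As a rough check on the figures above, the sketch below multiplies each quoted per-TB price by 250TB. Only the prices and the 250TB figure come from the slide; the dictionary layout and the function name annual_cost_range are illustrative assumptions.

```python
# Per-TB annual prices quoted on the slide, as (low, high) in dollars.
PRICES_PER_TB = {
    "Amazon S3/Glacier": (2000, 2000),
    "DuraCloud": (1000, 1500),
    "Private cloud (OpenStack)": (280, 350),
}
STORED_TB = 250  # storage volume quoted on the slide


def annual_cost_range(tb, price_range):
    """Return the (low, high) yearly cost for `tb` terabytes."""
    low, high = price_range
    return tb * low, tb * high


for name, prices in PRICES_PER_TB.items():
    low, high = annual_cost_range(STORED_TB, prices)
    print(f"{name}: ${low:,} to ${high:,} per year")
```

At these prices the private cloud option comes to $70,000 to $87,500 per year for 250TB, against $500,000 per year on S3/Glacier.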
MTCU PROPOSAL AND PIF FUNDING
• 2013/2014: OCUL was awarded $1.2 million Productivity and Innovation Fund (PIF) funding for OLRC startup
• 50TB per founding partner institution
• Triplestore preservation: content copies at three different co-located nodes for redundancy, error correction
• Text mining portal for stored ScholarsPortal content
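To illustrate why three copies help with error correction, here is a minimal sketch that compares checksums of the copies and flags the one that disagrees with the majority. It is an assumption-laden toy, not a description of how the OLRC nodes or OpenStack actually verify content; the node names and the function find_damaged_copies are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def find_damaged_copies(copies: Dict[str, bytes]) -> List[str]:
    """Return the nodes whose copy disagrees with the majority checksum."""
    digests = {node: hashlib.sha256(data).hexdigest()
               for node, data in copies.items()}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        # Without a strict majority there is no trustworthy reference copy.
        raise ValueError("no majority among copies")
    return sorted(node for node, digest in digests.items()
                  if digest != majority)


# Hypothetical example: one of three copies has been corrupted.
copies = {
    "node-a": b"thesis contents",
    "node-b": b"thesis contents",
    "node-c": b"thesis c0ntents",
}
damaged = find_damaged_copies(copies)
```

With two copies a mismatch can be detected but not resolved; a third copy lets the two that agree identify the damaged one, which can then be rewritten from a good copy.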
OPENSTACK
• An open source cloud computing platform, primarily deployed as an Infrastructure-as-a-Service (IaaS) platform
• Swift – OpenStack object store, store and retrieve data via API
• Integrate OpenStack/Swift to Digital Repository architectures
• Develop Dropbox-like cloud storage web interface
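Storing and retrieving data through the Swift API comes down to HTTP requests against an object URL of the form storage URL / container / object, authenticated with an X-Auth-Token header. The sketch below only builds those requests with the Python standard library; the storage URL, token, container, and object names are hypothetical placeholders, and a real deployment would obtain the URL and token from its authentication service.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical values, for illustration only.
STORAGE_URL = "https://swift.example.org/v1/AUTH_demo"
TOKEN = "example-token"


def object_url(container: str, name: str) -> str:
    """Build <storage URL>/<container>/<object>, percent-encoding names."""
    return f"{STORAGE_URL}/{quote(container, safe='')}/{quote(name)}"


def put_request(container: str, name: str, data: bytes) -> Request:
    """Request that stores `data` as an object."""
    return Request(object_url(container, name), data=data, method="PUT",
                   headers={"X-Auth-Token": TOKEN})


def get_request(container: str, name: str) -> Request:
    """Request that retrieves an object."""
    return Request(object_url(container, name), method="GET",
                   headers={"X-Auth-Token": TOKEN})


upload = put_request("theses", "2014/thesis.pdf", b"%PDF-1.4 ...")
download = get_request("theses", "2014/thesis.pdf")
```

Passing either request to urllib.request.urlopen would send it to a live Swift endpoint. A repository integration or a Dropbox-like web interface would sit on top of calls like these.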
USE CASES
• Audience: Librarians, Faculty
• Digital Preservation
• Institutional and Personal Storage
• Repositories
• Research Data Management
• Text mining large volumes of digital textual content for research purposes
FEDORA COMMONS
An open source digital object repository that is the underlying architecture behind Islandora, Hydra, and other digital asset management systems.
DSPACE
Open source turnkey institutional repository software for building open access repositories for scholarly and published digital content.
ARCHIVEMATICA
An open source digital preservation system designed to maintain standards-based, long term access to collections of digital objects.
DATAVERSE
• An open source web application for publishing, citing, analyzing and preserving research data.
• Research data management focus
TEXT MINING
Potential uses by researchers in Digital Humanities:
• Entity recognition
• Part-of-speech analysis
• Topic modeling
• Network analysis
• Visualization
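As a toy illustration of the kind of input a network analysis might start from, the sketch below counts how often two words appear in the same document, giving a weighted edge list. It uses only the Python standard library; the function name cooccurrence_edges and the sample texts are hypothetical and have nothing to do with the planned text mining portal.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count how many documents each pair of distinct words shares."""
    edges = Counter()
    for text in documents:
        # Sorting gives every pair one canonical (alphabetical) order.
        words = sorted(set(re.findall(r"[a-z]+", text.lower())))
        edges.update(combinations(words, 2))
    return edges


# Hypothetical sample texts.
docs = ["Library cloud storage", "Cloud storage costs"]
edges = cooccurrence_edges(docs)
```

Each key is a word pair and each value is the number of documents containing both words. Real research use would add tokenization, stop-word handling, and a graph library on top of counts like these.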
CURRENT STATUS & MILESTONES
• October 2014: integration with Archivematica
• December 2014: integration with Dataverse
• Q1 2015: Storage Nodes finalized; installation of Waterloo/Guelph/Laurier node
• March 2015: integration with Fedora Commons
• May 2015: Third Hackfest, Text Mining Portal
• June 2015: integration with DSpace
THANKS! QUESTIONS?
• Pascal Calarco, uWaterloo [email protected] x38215
• Andrew McAlorum, [email protected] x31135