A Durable Space from April 23 NISO Virtual Conference: Dealing with the Data Deluge

A Durable Space: Technologies for Accessing our Collective Digital Heritage. David Wilcox, Fedora Product Manager, DuraSpace

Page 1

A Durable Space: Technologies for Accessing our Collective Digital Heritage

David Wilcox
Fedora Product Manager
DuraSpace

Page 2

A place for my stuff

• What is a repository?
• A repository provides long-term storage and preservation of digital content
• It also provides long-term access in the form of persistent URLs

Page 3

The repository landscape

• Many repository software packages are designed to support institutional repositories
• Examples include DSpace, EPrints, and Digital Commons
• These solutions tend to be easy to set up and manage but offer limited customization

Page 4

What is Fedora?

• Flexible Extensible Durable Object Repository Architecture
• Fedora is OPEN SOURCE digital repository software
• It is developed, adopted, and supported internationally
• We’re building Fedora 4
  • The entire codebase has been rewritten to support the need for robust, full-featured repository services for the next decade

Page 5

Fedora integrates with other applications

• Fedora typically sits between a back-end file system and a front-end user interface
• A wide variety of back-end file systems are supported
• Popular front-end user interfaces include Hydra and Islandora

Page 6

How big is the community?

• There are over 320 Fedora installations (that we know of!)
• Our community is growing:
  • 41 Fedora sponsors
  • 19 active developers
  • 17 leadership group members
  • 10 steering group members

Page 7

• We are an independent 501(c)(3) non-profit
• We provide leadership and support for:
  • Fedora Commons
  • DSpace
  • VIVO
• We also provide software services!
  • DuraCloud
  • DSpaceDirect

Page 8

Sources of funding

[Pie charts comparing funding sources in 2011 and 2014. Legend: Moore Grant, Other Grants, Sponsorship, Services. 2011 slices: 6%, 21%, 42%, 31%. 2014 slices: 33%, 32%, 35%.]

Page 9

Sponsorship is on the rise!

[Chart: sponsorship by year, 2011 to 2014; vertical axis from $0 to $500,000.]

Page 10

Building toward sustainability

• We’re broadening the funding base
  • More sponsors at lower funding amounts
  • Raising the overall level of funding
• We’re hiring!
  • Specifically, I was hired as the Product Manager
  • Andrew Woods is the Tech Lead
• We’ve established a governance model
  • Fedora is governed by a Leadership group and a Steering group

Page 11

How does Fedora support research data?

Page 12

Model your data however you want

• Research data can take many forms
  • Fedora is flexible enough to store and preserve any file type
• Your data may also be inter-related in complex ways
  • No problem! Fedora provides native RDF support so you can relate things however you want
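
As a rough illustration of the RDF idea, relationships between a dataset and its publications can be written as subject-predicate-object triples. The sketch below uses plain Python and no Fedora API. The `example.org` URIs and the helper names are hypothetical, and using the Dublin Core terms `isReferencedBy` and `hasPart` as predicates is an assumption, not something the slides specify.

```python
# Hypothetical resource URIs; a real repository would mint its own.
BASE = "http://example.org/repo/"
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

dataset = BASE + "research-data-1"
publication_1 = BASE + "publication-1"
publication_2 = BASE + "publication-2"
spreadsheet = BASE + "research-data-1/spreadsheet"

# Each relationship is one (subject, predicate, object) triple.
triples = [
    (dataset, DCTERMS + "isReferencedBy", publication_1),
    (dataset, DCTERMS + "isReferencedBy", publication_2),
    (dataset, DCTERMS + "hasPart", spreadsheet),
]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def related(subject, predicate, triples):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(to_ntriples(triples))
```

Because every link is just another triple, a dataset can point at any number of publications (or the reverse) without changing a fixed schema.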

Page 13

[Diagram labels: Research Data (Metadata, Spreadsheet, Audio, Video); Publication 1 (Metadata, PDF); Publication 2 (Metadata, PDF).]

Page 14

[Diagram labels: Research Data 1 (Metadata, Spreadsheet); Research Data 2 (Metadata, Video); Research Data 3 (Metadata, Audio); Publication 1 (Metadata, PDF); Publication 2 (Metadata, PDF).]

Page 15

Store and manage large files

• Research data comes in all shapes and sizes
  • Huge datasets in spreadsheets
  • High-resolution images
  • High-quality audio/video recordings
• Fortunately, Fedora supports files of virtually any size

Page 16

Manage external files

• Your research data may be stored in an external file system
• Fedora can project over these files and treat them as if they were in the repository
• Fedora’s management and preservation features will be available to these files
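
One way to picture this: the repository keeps a catalogue entry for each external file while the bytes stay where they are. The sketch below is a simplified stand-in, not Fedora's projection mechanism; `index_external_files` and the entry fields are hypothetical names chosen for illustration.

```python
import os


def index_external_files(root):
    """Catalogue every file under `root` without copying or moving it.

    Returns a dict mapping a repository-style path (relative, with
    forward slashes) to the file's real location and size in bytes.
    """
    catalogue = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            real_path = os.path.join(dirpath, name)
            repo_path = os.path.relpath(real_path, root).replace(os.sep, "/")
            catalogue[repo_path] = {
                "location": real_path,
                "size": os.path.getsize(real_path),
            }
    return catalogue
```

Calling `index_external_files("/data/lab-share")` (a hypothetical mount point) would return one entry per file, which other features such as checksumming could then operate on.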

Page 17

Preserve your data

• Fedora provides a variety of preservation features
  • Automated fixity checking (checksums)
  • Backup and restore of the entire repository
• You can also create new versions every time you make a change:
  • Across the repository
  • Only for certain actions
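
In general terms, a fixity check recomputes a file's checksum and compares it with the value recorded when the file was stored. The sketch below shows that idea with Python's standard `hashlib`; it is not Fedora's implementation, the function names are hypothetical, and the choice of SHA-1 is an assumption.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-1 digest, reading in chunks so large files
    never have to be loaded into memory all at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, recorded_checksum):
    """Return True if the file still matches the checksum recorded at ingest."""
    return sha1_of_file(path) == recorded_checksum
```

A scheduled job could run `check_fixity` over every stored file and flag any mismatch as possible corruption.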

Page 18

Research data use cases

Page 19

Institutional repositories

• Repository managers need to upload publications and associated research datasets
• Publications and research data files can be uploaded with associated metadata
• Each file can be associated with any number of other files in the repository

Page 20

Managing publications

Page 21

Associating research data

Page 22

Smithsonian Institution

• SIdora is designed to support the research process from beginning to end
• Researchers upload files to the repository and use them directly in their analysis
• The entire research process is documented and preserved alongside the finished publication

Page 23

Managing concepts

Page 24

Adding resources

Page 25

Editing metadata

Page 26

Uploading datasets

Page 27

Useful links

• Fedora 4 wiki
  • https://wiki.duraspace.org/display/FF
• Fedora community mailing list
  • https://groups.google.com/forum/#!forum/fedora-community
• Fedora developers mailing list
  • https://groups.google.com/forum/#!forum/fedora-tech

Page 28

Questions?