Phenocams hanigan-20140309
Transcript of Phenocams hanigan-20140309
Addressing the Key Challenges of Storage,
Discoverability, Accessibility and Analysis.
Ivan Hanigan and Marco Fahmi
Australian SuperSite Network (ASN) and Long Term Ecological Research Network (LTERN)
ACEAS Phenocam Workshop
2014-03-11
Ivan Hanigan and Marco Fahmi (ASN-LTERN) Addressing the Key Challenges of Storage, Discoverability, Accessibility and Analysis. 2014-03-11 1 / 21
Topic
1 Introduction
2 What I want out of this workshop
3 Storage hosting of the data
4 Discoverability
5 Accessibility
6 Analysis
7 Conclusion
Introduction
Four key challenges to working with large data collections:
Storage (Big Data, resilience to disasters, future proofing)
Discoverability (exposing metadata, indexing, standard schemas)
Accessibility (who is accessing what? Is it collaborative?)
Analysis (workflow and provenance)
Phenocams
Managing phenocam data is an exemplar of these issues
What I want out of this workshop
My work as a Data Manager / Data Analyst at ASN and LTERN
Toward a better set of descriptions of the business requirements for each of these goals
Building systems that address these challenges.
3 Storage hosting of the data
The Data Deluge
“The next five years will produce more research data than has been produced in all of previous human history.”
The great data explosion, April 29, 2009
http://www.theaustralian.news.com.au/story/0,25197,25400306-12332,00.html
Australian Research Cloud
The IT infrastructure available is unprecedented
Often cheap or free
Storage hosting of the data
There are technical challenges in storing (as well as uploading/downloading) data.
Sustainability and future-proofing of the storage is a logistical challenge.
Questions arise, such as: should your store be the only location of the data, or one of several mirrors?
Is storage “indefinitely” really possible?
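One way to make the mirror question concrete is to check that a mirror still holds the same files as the primary store. The following is a minimal sketch, assuming both stores are reachable as local directories; the function names are illustrative and are not part of any ASN or LTERN system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_mirrors(primary: dict, mirror: dict) -> dict:
    """Report files missing from the mirror and files whose contents differ."""
    missing = sorted(set(primary) - set(mirror))
    changed = sorted(
        name for name in set(primary) & set(mirror)
        if primary[name] != mirror[name]
    )
    return {"missing": missing, "changed": changed}
```

Running build_manifest on each store and passing the two results to compare_mirrors gives a list of files to re-copy. A stored manifest can also be re-checked years later, which is one small part of the "indefinitely" question.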
4 Discoverability
Data and metadata standards
With the gathered expertise, it will be useful to advocate:
Convention over Configuration
appropriate syntax and semantics for Phenocam data, with
well-considered conceptual frameworks for grouping datasets
appropriate compatibility/compliance with other standards.
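As an illustration of convention over configuration, a file-naming convention can carry basic metadata that an indexer reads directly from the name. The sketch below assumes a purely hypothetical convention of <site>_<camera>_<YYYYMMDD>_<HHMMSS>.jpg; it is not a standard proposed in these slides, and a real convention would be agreed by the group.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Hypothetical naming convention (an assumption for illustration only):
#   <site>_<camera>_<YYYYMMDD>_<HHMMSS>.jpg
PATTERN = re.compile(
    r"^(?P<site>[a-z0-9-]+)_(?P<camera>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_(?P<time>\d{6})\.jpg$"
)


class ImageRecord(NamedTuple):
    site: str
    camera: str
    taken_at: datetime


def parse_filename(name: str) -> Optional[ImageRecord]:
    """Return the metadata encoded in a file name, or None if it breaks the convention."""
    match = PATTERN.match(name)
    if match is None:
        return None
    try:
        taken_at = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    except ValueError:
        # The digits are present but do not form a real date or time.
        return None
    return ImageRecord(match["site"], match["camera"], taken_at)
```

Files that return None can be flagged for manual description, so the convention doubles as a simple compliance check.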
5 Accessibility
Ownership, Sharing and Anonymous Re-use
There will be some contractual obligations about sharing and publishing data (or not!) as well as a general inclination of the group about what and when to share.
There is also the question of an appropriate licensing scheme to govern this, along with embargoes and controlled access.
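Embargoes and controlled access can be expressed as a small, explicit policy check. The sketch below is one possible policy, assumed here for illustration: approved users always have access, and everyone else gains access only once any embargo has passed and only if the dataset is not restricted. The Dataset fields and the can_access function are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class Dataset:
    name: str
    licence: str                            # label only; the value is an assumption
    embargo_until: Optional[date] = None    # None means no embargo
    restricted: bool = False                # True means controlled access
    approved_users: Set[str] = field(default_factory=set)


def can_access(dataset: Dataset, user: str, today: date) -> bool:
    """Apply the assumed policy: approved users first, then embargo, then restriction."""
    if user in dataset.approved_users:
        return True
    under_embargo = (
        dataset.embargo_until is not None and today < dataset.embargo_until
    )
    return not under_embargo and not dataset.restricted
```

Writing the rule down in this form makes it easier for data providers to confirm that the system does what their agreements require.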
Ethics and Trust
Trust is therefore needed by the data provider to allay concerns over the re-use of data
Collaborative and respectful use should be expected of data users.
Analysis
Workflow management is needed because users will need tools or technical savvy to do something interesting and useful with the data.
The paradigm of “Bringing the Code to the Data” rather than “Taking the Data to the Code”
This uses remote supercomputers with very large storage and compute capacity
However, it often feels as though one needs to be as skillful as a “Super Scientist” to access and use a supercomputer
How to support ordinary users wanting “Super” analyses?
Provenance tracking of analysis outputs to ensure reproducibility
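Provenance tracking can start very simply: store, next to each output, a record of which inputs, which script and which parameters produced it. The following is a minimal sketch under that assumption; the record layout and the ".prov.json" sidecar name are illustrative choices, not an established standard.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Checksum a file so a later reader can confirm it is unchanged."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def provenance_record(inputs: list, script: Path, parameters: dict) -> dict:
    """Describe what went into an analysis run."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "script": {"path": str(script), "sha256": file_sha256(script)},
        "inputs": [{"path": str(p), "sha256": file_sha256(p)} for p in inputs],
        "parameters": parameters,
    }


def write_provenance(record: dict, output: Path) -> Path:
    """Save the record as a JSON sidecar file beside the analysis output."""
    sidecar = output.with_name(output.name + ".prov.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

A record like this does not by itself guarantee reproducibility, but it lets a later user confirm they are rerunning the same code on the same data.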
Appropriate analysis
There is an implicit belief among ‘big data’ advocates that answers to difficult environmental questions can be found through sharing data
But Ecology is inherently about understanding local patterns and processes, and often hard-won, field-based understanding is essential to help interpret the results of data analyses
There might be a need for support in study designs from those familiar with the ecosystem
Security against malicious mis-use
A data analysis server is geared to executing software code
Analyses may require custom code to be written, or installation of third-party software from unknown developers
There is a risk that such a Virtual Lab could be the victim of a malicious attack.
Conclusion
These challenges are not trivial
We suspect the answers to many of these challenges will rely on outsourcing as much of the hardware and software as possible
to shift the responsibility for upkeep and sustainability onto someone else’s shoulders
and let the scientists focus on their science.