Phenocams hanigan-20140309


Page 1: Phenocams hanigan-20140309

Addressing the Key Challenges of Storage, Discoverability, Accessibility and Analysis.

Ivan Hanigan and Marco Fahmi

Australian SuperSite Network (ASN) and Long Term Ecological Research Network (LTERN)

ACEAS Phenocam Workshop

2014-03-11

Ivan Hanigan and Marco Fahmi (ASN-LTERN) Addressing the Key Challenges of Storage, Discoverability, Accessibility and Analysis. 2014-03-11 1 / 21

Page 2

Topic

1 Introduction

2 What I want out of this workshop

3 Storage and hosting of the data

4 Discoverability

5 Accessibility

6 Analysis

7 Conclusion

Page 3

Introduction

Four key challenges to working with large data collections:

Storage (Big Data, resilience to disasters, future-proofing)

Discoverability (exposing metadata, indexing, standard schemas)

Accessibility (who is accessing what? Is it collaborative?)

Analysis (workflow and provenance)

Page 4

Phenocams

Managing phenocam data is an exemplar of these issues

Page 5

Topic

2 What I want out of this workshop

Page 6

What I want out of this workshop

My work as a Data Manager / Data Analyst at ASN and LTERN

Toward a better set of descriptions of the business requirements for each of these goals

Building systems that address these challenges.

Page 7

Topic

3 Storage and hosting of the data

Page 8

The Data Deluge

“The next five years will produce more research data than has been produced in all of previous human history.”

The great data explosion, April 29, 2009, http://www.theaustralian.news.com.au/story/0,25197,25400306-12332,00.html

Page 9

Australian Research Cloud

The IT infrastructure available is unprecedented

Often cheap or free

Page 10

Storage and hosting of the data

There are technical challenges in storing (as well as uploading and downloading) data.

Sustainability and future-proofing of the storage is a logistical challenge.

Questions arise, such as: should your store be the only location of the data, or one of several mirrors?

Is storage “indefinitely” really possible?
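If a store is one of several mirrors, one basic resilience check is fixity checking: comparing a checksum of every file on the primary store with the copy on the mirror. The following is a minimal Python sketch under our own assumptions (both copies are visible as local directory trees, and SHA-256 is the checksum); the function names are hypothetical and not part of any ASN or LTERN system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path, relative to root, to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_mirrors(primary: Path, mirror: Path) -> dict:
    """List files missing from the mirror, extra on the mirror, or with differing content."""
    a, b = build_manifest(primary), build_manifest(mirror)
    return {
        "missing_on_mirror": sorted(a.keys() - b.keys()),
        "extra_on_mirror": sorted(b.keys() - a.keys()),
        "content_differs": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

In practice the manifest for the primary store would more likely be computed once and kept alongside the data, so that each mirror can be checked without re-reading both copies.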

Page 11

Topic

4 Discoverability

Page 12

Data and metadata standards

With the expertise gathered here, it would be useful to advocate:

Conventions over Configuration

appropriate syntax and semantics for Phenocam data, with

well-considered conceptual frameworks for grouping datasets, and

appropriate compatibility/compliance with other standards.
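One way to read “Conventions over Configuration” for phenocam images is to let an agreed file-naming convention carry the core metadata, so that site and capture time can be indexed without any per-file configuration. The sketch below assumes a hypothetical convention `<site>_<YYYY>_<MM>_<DD>_<HHMMSS>.jpg`; it is an illustration rather than a proposed standard, and the names `parse_filename`, `ImageRecord` and `group_by_site_and_year` are our own.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical naming convention (an assumption for illustration):
#   <site>_<YYYY>_<MM>_<DD>_<HHMMSS>.jpg, e.g. "examplesite_2014_03_11_120000.jpg"
FILENAME_PATTERN = re.compile(
    r"^(?P<site>[a-z0-9]+)_(?P<date>\d{4}_\d{2}_\d{2})_(?P<time>\d{6})\.jpg$"
)


@dataclass(frozen=True)
class ImageRecord:
    site: str
    captured_at: datetime  # naive: the convention above does not say which time zone applies
    filename: str


def parse_filename(filename: str) -> ImageRecord:
    """Derive a metadata record from a file name that follows the convention."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {filename!r}")
    captured_at = datetime.strptime(f"{match['date']}_{match['time']}", "%Y_%m_%d_%H%M%S")
    return ImageRecord(site=match["site"], captured_at=captured_at, filename=filename)


def group_by_site_and_year(filenames):
    """Group parsed records into (site, year) collections, one possible dataset grouping."""
    groups = defaultdict(list)
    for name in filenames:
        record = parse_filename(name)
        groups[(record.site, record.captured_at.year)].append(record)
    return dict(groups)
```

Because the metadata lives in the name, an image can be indexed for discovery without opening it; the trade-off is that the convention has to be agreed and enforced up front.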

Page 13

Topic

5 Accessibility

Page 14

Ownership, Sharing and Anonymous Re-use

There will be some contractual obligations about sharing and publishing data (or not!), as well as the group’s general inclination about what to share and when.

There is also the question of the appropriate licensing scheme governing this, and of embargoes and controlled access.
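Embargoes and controlled access can be written down as an explicit, machine-checkable policy attached to each dataset. The following is a minimal sketch assuming a hypothetical policy with an owner group, an optional embargo date and an optional set of approved groups; the licence is only recorded, not enforced, and none of these field names come from an existing repository schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical per-dataset access policy (not an existing repository schema)."""
    licence: str                                   # e.g. "CC-BY-4.0"; recorded here, not enforced
    owner_group: str                               # the data provider's group
    embargo_until: Optional[date] = None           # no access for others before this date
    approved_groups: FrozenSet[str] = frozenset()  # empty means open access after any embargo


def can_access(policy: AccessPolicy, user_groups: set, today: date) -> bool:
    """Decide whether a user belonging to user_groups may access the dataset today."""
    if policy.owner_group in user_groups:
        return True  # providers always retain access to their own data
    if policy.embargo_until is not None and today < policy.embargo_until:
        return False  # still under embargo
    if not policy.approved_groups:
        return True  # open access once any embargo has lapsed
    return bool(policy.approved_groups & user_groups)  # controlled access
```

Keeping the decision in one small function makes the sharing rules easy to audit, which is one way to make the group’s obligations and inclinations explicit.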

Page 15

Ethics and Trust

Data providers therefore need to be able to trust data users, to allay concerns over the re-use of their data

Collaborative and respectful use should be expected of data users.

Page 16

Topic

6 Analysis

Page 17

Analysis

Workflow management is needed because users will need the tools or the technical savvy to do something interesting and useful with the data.

The paradigm of “Bringing the Code to the Data” rather than “Taking the Data to the Code”

This uses remote supercomputers with very large storage and compute capacity

However, it often feels as if one needs to be as skillful as a “Super Scientist” to access and use a supercomputer

How to support ordinary users wanting “Super” analyses?

Provenance tracking of analysis outputs to ensure reproducibility
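Provenance tracking can start as simply as recording checksums of the inputs, the parameters and the code alongside each output, so that a later re-run can be checked against the original. The sketch below uses a simplified, hypothetical record format of our own (the `roi` parameter in the example is likewise invented); a production system might instead adopt a standard such as W3C PROV.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def provenance_record(inputs: dict, params: dict, code: str, output: bytes) -> dict:
    """Build a simplified, hypothetical provenance record for one analysis run.

    inputs: mapping of input name -> raw bytes of that input
    params: analysis parameters (must be JSON-serialisable)
    code:   source text of the analysis that produced the output
    output: raw bytes of the result
    """
    return {
        "created": datetime.now(timezone.utc).isoformat(),
        "inputs": {name: _sha256(data) for name, data in sorted(inputs.items())},
        "parameters": params,
        "code_sha256": _sha256(code.encode("utf-8")),
        "output_sha256": _sha256(output),
        "environment": {"python": platform.python_version(), "platform": platform.platform()},
    }


def reproduces(original: dict, rerun: dict) -> bool:
    """True if a re-run used the same inputs, parameters and code and gave the same output."""
    keys = ("inputs", "parameters", "code_sha256", "output_sha256")
    return all(original[k] == rerun[k] for k in keys)


def to_json(record: dict) -> str:
    """Serialise the record so it can be stored next to the output it describes."""
    return json.dumps(record, indent=2, sort_keys=True)
```

The timestamp and environment are deliberately left out of the comparison, since they are expected to differ between runs even when the analysis itself is reproduced.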

Page 18

Appropriate analysis

There is an implicit belief among ‘big data’ advocates that answers to difficult environmental questions can be found through sharing data

But ecology is inherently about understanding local patterns and processes, and often hard-won, field-based understanding is essential to help interpret the results of data analyses

There may be a need for support with study design from those familiar with the ecosystem

Page 19

Security against malicious mis-use

A data analysis server is geared to executing software code

Analyses may require custom code to be written, or installation of third-party software from unknown developers

There is a risk that such a Virtual Lab could be the victim of a malicious attack.
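A first line of defence when a Virtual Lab runs user-supplied code is to execute it in a separate process with a wall-clock timeout and a CPU-time limit. The sketch below assumes Python on a POSIX system (the `resource` module is unavailable on Windows, in which case only the timeout applies); `run_untrusted` is a hypothetical helper. It is not a sandbox: the child process can still read files and reach the network, so a real deployment would need containers or virtual machines on top.

```python
import subprocess
import sys
import tempfile

try:
    import resource  # POSIX only; used to cap CPU time in the child process
except ImportError:  # e.g. on Windows, where only the wall-clock timeout applies
    resource = None


def run_untrusted(source: str, timeout_s: float = 10.0, cpu_seconds: int = 5) -> subprocess.CompletedProcess:
    """Run user-supplied Python source in a separate interpreter with basic limits.

    This is NOT a sandbox: the child can still read files and use the network.
    Raises subprocess.TimeoutExpired if the wall-clock timeout is exceeded.
    """
    def cap_cpu() -> None:
        # Runs in the child after fork and before exec, so only the child is limited.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    with tempfile.TemporaryDirectory() as workdir:  # scratch directory, deleted afterwards
        return subprocess.run(
            # -I: isolated mode, ignoring PYTHON* env vars and the user site-packages directory
            [sys.executable, "-I", "-c", source],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if this is exceeded
            preexec_fn=cap_cpu if resource is not None else None,
        )
```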

Page 20

Topic

7 Conclusion

Page 21

Conclusion

These challenges are not trivial

We suspect the answers to many of these challenges will rely on outsourcing as much of the hardware and software as possible,

to shift the responsibility for upkeep and sustainability onto someone else’s shoulders

and let the scientists focus on their science.
