Collaborative Genomic Data Analyses in the Cloud

24
Collaborative Genomic Data Analyses in the Cloud Steven B. Roberts Associate Professor School of Aquatic and Fishery Sciences University of Washington robertslab.info

description

Talk from eScience in the Cloud 2014

Transcript of Collaborative Genomic Data Analyses in the Cloud

Page 1: Collaborative Genomic Data Analyses in the Cloud

Collaborative Genomic DataAnalyses in the Cloud

Steven B. RobertsAssociate Professor

School of Aquatic and Fishery SciencesUniversity of Washington

robertslab.info

Page 2: Collaborative Genomic Data Analyses in the Cloud

Open Science

• You are free to Share!

• Our lab practices open notebookscience

robertslab.info [email protected]

Page 3: Collaborative Genomic Data Analyses in the Cloud
Page 4: Collaborative Genomic Data Analyses in the Cloud
Page 5: Collaborative Genomic Data Analyses in the Cloud
Page 6: Collaborative Genomic Data Analyses in the Cloud

Stochastic Variation

10.1093/bfgp/elt05410.6084/m9.figshare.880763

Page 7: Collaborative Genomic Data Analyses in the Cloud
Page 8: Collaborative Genomic Data Analyses in the Cloud

raw - 70G mapping - 60G tables - 40G ........

big,big,

computeintensive,education

Why cloud?

Page 9: Collaborative Genomic Data Analyses in the Cloud

}

Page 10: Collaborative Genomic Data Analyses in the Cloud
Page 11: Collaborative Genomic Data Analyses in the Cloud
Page 12: Collaborative Genomic Data Analyses in the Cloud

Use Cases• Joining on Annotations• File Conversion• Querying Gene Tables

Page 13: Collaborative Genomic Data Analyses in the Cloud
Page 14: Collaborative Genomic Data Analyses in the Cloud

github.com/sr320/qdod/wiki

Page 15: Collaborative Genomic Data Analyses in the Cloud

Sharing Collaboration*

Page 16: Collaborative Genomic Data Analyses in the Cloud

ReproducibleOpenCollaboration

Open Notebook Science

Page 17: Collaborative Genomic Data Analyses in the Cloud

Open Notebook Science

... there is a URL to a laboratory notebook that is freely available and indexed on common search engines. It does not necessarily have to look like a paper notebook but it is essential that all of the information available to the researchers to make their conclusions is equally available to the rest of the world.

—Jean-Claude Bradley

Page 18: Collaborative Genomic Data Analyses in the Cloud

Open Notebook Science

genefish.wikispaces.com

Page 19: Collaborative Genomic Data Analyses in the Cloud

Open Notebook Science

Set some variables

blast

convert file format

upload to SQLShare (python client)

join in SQLShare - download

read in pandas

matplotlib generates graph of GOsllim

Page 20: Collaborative Genomic Data Analyses in the Cloud

Open Notebook Science

Wiki - collaboration, versioning, search, publishing

Evernote - simple, multi-platform

IPython - executable, versioning*

Comparison

no perfection solution

Page 21: Collaborative Genomic Data Analyses in the Cloud

Challenges:versioning, provenance, collaboration, simple sharing, discoverability

Page 22: Collaborative Genomic Data Analyses in the Cloud

ReproducibleScienceOpen

Page 23: Collaborative Genomic Data Analyses in the Cloud

Acknowledgements

Mackenzie GaveryClaire Olson

Sam WhiteBrent VadopalasJake Heare

Bill HoweDan Halperin

EPASTAR

Aquaculture Program

DNA methylation

Page 24: Collaborative Genomic Data Analyses in the Cloud