Page 1: Collaborative Genomic Data Analyses in the Cloud

Collaborative Genomic Data Analyses in the Cloud

Steven B. Roberts
Associate Professor

School of Aquatic and Fishery Sciences
University of Washington

robertslab.info

Page 2: Collaborative Genomic Data Analyses in the Cloud

Open Science

• You are free to Share!

• Our lab practices open notebook science

robertslab.info

Page 6: Collaborative Genomic Data Analyses in the Cloud

Stochastic Variation

10.1093/bfgp/elt054
10.6084/m9.figshare.880763

Page 8: Collaborative Genomic Data Analyses in the Cloud

raw - 70G
mapping - 60G
tables - 40G
...

big, big, compute intensive, education

Why cloud?

Page 12: Collaborative Genomic Data Analyses in the Cloud

Use Cases

• Joining on Annotations

• File Conversion

• Querying Gene Tables
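
The "joining on annotations" and "querying gene tables" use cases can be sketched locally. The talk uses SQLShare for this; the example below swaps in Python's built-in sqlite3 module as a stand-in so it runs without an account. The table names, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical example rows; real tables would come from BLAST output
# and an annotation source such as GO Slim assignments.
blast_hits = [
    ("contig_1", "P12345", 1e-50),
    ("contig_2", "Q67890", 1e-20),
    ("contig_3", "P99999", 1e-5),
]
annotations = [
    ("P12345", "stress response"),
    ("Q67890", "cell adhesion"),
]


def join_on_annotations(hits, annots):
    """Load both tables into an in-memory database and join them by accession."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blast (contig TEXT, accession TEXT, evalue REAL)")
    conn.execute("CREATE TABLE goslim (accession TEXT, term TEXT)")
    conn.executemany("INSERT INTO blast VALUES (?, ?, ?)", hits)
    conn.executemany("INSERT INTO goslim VALUES (?, ?)", annots)
    # LEFT JOIN keeps contigs that have no annotation (term is NULL/None).
    rows = conn.execute(
        "SELECT b.contig, b.accession, g.term "
        "FROM blast b LEFT JOIN goslim g ON b.accession = g.accession "
        "ORDER BY b.contig"
    ).fetchall()
    conn.close()
    return rows


joined = join_on_annotations(blast_hits, annotations)
for row in joined:
    print(row)
```

A LEFT JOIN is used here so that unannotated contigs remain visible in the result; an INNER JOIN would silently drop them.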

Page 14: Collaborative Genomic Data Analyses in the Cloud

github.com/sr320/qdod/wiki

Page 15: Collaborative Genomic Data Analyses in the Cloud

Sharing Collaboration*

Page 16: Collaborative Genomic Data Analyses in the Cloud

Reproducible
Open
Collaboration

Open Notebook Science

Page 17: Collaborative Genomic Data Analyses in the Cloud

Open Notebook Science

... there is a URL to a laboratory notebook that is freely available and indexed on common search engines. It does not necessarily have to look like a paper notebook but it is essential that all of the information available to the researchers to make their conclusions is equally available to the rest of the world.

—Jean-Claude Bradley

Page 18: Collaborative Genomic Data Analyses in the Cloud

Open Notebook Science

genefish.wikispaces.com

Page 19: Collaborative Genomic Data Analyses in the Cloud

Open Notebook Science

Set some variables

blast

convert file format

upload to SQLShare (python client)

join in SQLShare - download

read in pandas

matplotlib generates graph of GOslim
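
Two of the notebook steps above, "convert file format" and the tally that feeds the GOslim graph, can be sketched with the standard library alone. This is a minimal illustration and not the notebook's actual code: it assumes the BLAST output is in the standard 12-column tabular format (`-outfmt 6`), and the helper names and sample rows are hypothetical. The SQLShare upload and join are omitted because they need the service itself.

```python
import csv
import io
from collections import Counter

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blast_tab_to_csv(tab_text):
    """Convert tab-delimited BLAST output to CSV text with a header row."""
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(BLAST_COLUMNS)
    for fields in reader:
        # Skip blank lines and comment lines.
        if fields and not fields[0].startswith("#"):
            writer.writerow(fields)
    return out.getvalue()


def count_terms(joined_rows):
    """Count how many contigs fall under each annotation term."""
    return Counter(term for _contig, term in joined_rows if term)


# Hypothetical sample data for illustration only.
sample_tab = (
    "contig_1\tP12345\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-50\t380\n"
    "contig_2\tQ67890\t75.0\t120\t30\t0\t5\t124\t40\t159\t1e-20\t150\n"
)
csv_text = blast_tab_to_csv(sample_tab)

sample_joined = [
    ("contig_1", "stress response"),
    ("contig_2", "cell adhesion"),
    ("contig_3", "stress response"),
    ("contig_4", None),
]
term_counts = count_terms(sample_joined)
print(csv_text)
print(term_counts.most_common())
```

The resulting counts are the kind of per-term totals that a bar chart would plot, with terms on one axis and contig counts on the other.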

Page 20: Collaborative Genomic Data Analyses in the Cloud

Open Notebook Science

Wiki - collaboration, versioning, search, publishing

Evernote - simple, multi-platform

IPython - executable, versioning*

Comparison

no perfect solution

Page 21: Collaborative Genomic Data Analyses in the Cloud

Challenges: versioning, provenance, collaboration, simple sharing, discoverability

Page 22: Collaborative Genomic Data Analyses in the Cloud

Reproducible
Science
Open

Page 23: Collaborative Genomic Data Analyses in the Cloud

Acknowledgements

Mackenzie Gavery, Claire Olson

Sam White, Brent Vadopalas, Jake Heare

Bill Howe, Dan Halperin

EPA STAR

Aquaculture Program

DNA methylation
