Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus: A Cloud-Based Infrastructure for Managing,
Analyzing and Sharing Genomics Data
Robert Grossman
Institute for Genomics & Systems Biology (IGSB)
Computation Institute, University of Chicago
and Open Cloud Consortium
April 21, 2011
Background
Growth of Genomic Data
• 1977: Sanger sequencing
• 1995: Microarray technology
• 2001: HGP
• 2003: ENCODE
• 2005: 454, Solexa sequencing
• Stages: sequence species, sequence everything, sequence environment
• GenBank: 10^5, 10^8, 10^10
• Computing: 2003 GFS, 2006 AWS, 2008 Hadoop
Source: Lincoln Stein
The Challenge is to Support Cubes of High Throughput Sequence Data
Cube axes: perturb the environment; different developmental stages; different pathologies.
Each cell in the data cube can be a ChIP-chip, ChIP-seq, RNA-seq, movie, etc. data set.
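As a minimal sketch, assuming each cell is addressed by the three axes named on the slide, the cube can be modeled as a mapping from (environment, stage, pathology) coordinates to the datasets stored in that cell. The class names, axis values and dataset IDs below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CubeKey:
    """Coordinates of one cell (hypothetical axis names)."""
    environment: str  # environmental perturbation
    stage: str        # developmental stage
    pathology: str    # pathology, or "normal"


@dataclass
class DataCube:
    # Maps CubeKey -> list of (assay type, dataset ID) pairs.
    cells: dict = field(default_factory=dict)

    def add(self, key: CubeKey, assay: str, dataset_id: str) -> None:
        # A cell may hold several datasets, e.g. ChIP-seq and RNA-seq.
        self.cells.setdefault(key, []).append((assay, dataset_id))

    def select(self, **fixed: str) -> dict:
        # Return every cell whose coordinates match the fixed axis values.
        return {
            key: datasets
            for key, datasets in self.cells.items()
            if all(getattr(key, axis) == value for axis, value in fixed.items())
        }


# Usage with made-up coordinates and IDs.
cube = DataCube()
cube.add(CubeKey("heat shock", "embryo 0-4h", "normal"), "ChIP-seq", "ds1")
cube.add(CubeKey("control", "embryo 0-4h", "normal"), "RNA-seq", "ds2")
cube.add(CubeKey("control", "larva L3", "normal"), "ChIP-chip", "ds3")
embryo_cells = cube.select(stage="embryo 0-4h")  # a slice along one axis
```

Selecting on one axis returns a slice of the cube, which is the kind of query an integrative analysis across conditions would need.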
We Have a Problem
• More and more of your colleagues produce so much data that they cannot easily manage, move, analyze and share it.
• Centers and large projects build their own infrastructure.
• Everyone else is on their own.
vs…
Part 1. Using Bionimbus
www.bionimbus.org
Bionimbus is a community cloud for storing, analyzing and sharing genomics and related data.
Enabling a broad community to utilize genome research
[Diagram: User, Bionimbus Cloud, and Sequencing Partner or Center, with steps 1, 2, 3]
Step 1. Prepare a Sample
Step 2. Log in to Bionimbus and get a Bionimbus Key.
Step 3. FedEx your sample to CGI.
Step 4. Log in to Bionimbus and view your data.
Step 5. Use Bionimbus to perform standard and custom pipelines.
Launching multiple virtual machines with Bionimbus reduced this analysis from 25 days to 1 day.
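The slide does not say how the work was divided. Under the simplifying assumption that the analysis consisted of independent per-sample jobs with no coordination or startup overhead, a 25-day to 1-day reduction corresponds to spreading the work over roughly 25 VMs. The helper names below are hypothetical.

```python
import math


def partition(samples, n_vms):
    """Split independent samples into n_vms round-robin batches, one per VM."""
    return [samples[i::n_vms] for i in range(n_vms)]


def ideal_wall_clock(serial_days, n_vms):
    """Wall-clock time if the work divides perfectly across n_vms VMs."""
    return serial_days / n_vms


def vms_needed(serial_days, target_days):
    """Smallest VM count that meets the target under the same ideal assumption."""
    return math.ceil(serial_days / target_days)


batches = partition(list(range(10)), 3)  # 10 hypothetical samples, 3 VMs
```

Real speedups are usually lower than this ideal because of data transfer, VM startup and uneven job sizes.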
[Diagram: Bionimbus Private Cloud, UC, Bionimbus Community Cloud, Bionimbus Private Cloud XY, Amazon, dbGaP, CGI, Internal Sequencers, BID Generator]
Step 1. Get a Bionimbus ID (BID); assign project, private/community, public cloud, etc.
Step 2. Send sample to be sequenced.
Step 3a. Return raw reads.
Step 3b. Return variant calls, CNV, annotation…
Step 4. Secure data routing to the appropriate cloud based upon the BID.
Step 5. Cloud-based analysis using IGSB and third-party tools and applications.
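Steps 1 and 4 suggest a registry keyed by BID: the destination is recorded when the BID is issued and looked up when data comes back. The sketch below assumes each BID maps to exactly one project and one destination; the BID format, class names and destination categories are hypothetical, and the security side of Step 4 (authentication, encrypted transport) is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Destination(Enum):
    PRIVATE = "private cloud"
    COMMUNITY = "community cloud"
    PUBLIC = "public cloud"


@dataclass(frozen=True)
class BidRecord:
    bid: str
    project: str
    destination: Destination


class Router:
    def __init__(self):
        self._registry = {}

    def register(self, record: BidRecord) -> None:
        # Step 1: record the project and destination when the BID is issued.
        if record.bid in self._registry:
            raise ValueError(f"duplicate BID {record.bid}")
        self._registry[record.bid] = record

    def route(self, bid: str) -> Destination:
        # Step 4: look up where returned data for this BID should go.
        # Unknown BIDs are rejected rather than sent to a default location.
        try:
            return self._registry[bid].destination
        except KeyError:
            raise LookupError(f"unknown BID {bid}") from None


router = Router()
router.register(BidRecord("BID-0001", "lab-a", Destination.PRIVATE))
```

Rejecting unknown BIDs, instead of falling back to a default cloud, keeps data from reaching a location nobody assigned it to.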
Part 2. Introduction to Clouds
Clouds provide on-demand computing and storage resources at the scale and with the reliability of a data center.
Computer scientists were caught by surprise.
What is a Cloud?
Software as a Service (SaaS)
What Else Is a Cloud?
Infrastructure as a Service (IaaS)
Users get one or more virtual machines “on demand”
Are There Other Types of Clouds?
Hadoop was developed for processing Internet-scale data for ad targeting and related applications, but it is now used for processing genomics data and many other applications.
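The MapReduce model that Hadoop implements can be illustrated with a common genomics task: counting k-mers in sequencing reads. This is a single-process sketch of the map, shuffle and reduce phases in plain Python, not Hadoop's API; on a Hadoop cluster the map and reduce functions would run in parallel on many nodes and the framework would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map phase: emit (k-mer, 1) for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the counts emitted for each k-mer."""
    return {kmer: sum(values) for kmer, values in groups.items()}


def count_kmers(reads, k):
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


counts = count_kmers(["ACGT", "CGTA"], 2)
```

Because each read is mapped independently and each k-mer is reduced independently, both phases can be split across machines, which is what makes the model fit large sequencing datasets.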
What is new about clouds?
Scale is New
Elastic, On-Demand Computing with Usage Based Pricing Is New
1 computer in a rack for 120 hours costs the same as 120 computers in three racks for 1 hour.
Data center scale computing often leverages virtualization technologies.
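Under usage-based pricing the bill depends on machine-hours consumed, so the two configurations above cost the same: 1 × 120 = 120 × 1 = 120 machine-hours. The sketch assumes a flat per-machine-hour rate with no minimum charge or startup overhead; the rate is a made-up figure, kept in integer cents to avoid floating-point rounding.

```python
def usage_cost_cents(machines: int, hours: int, cents_per_machine_hour: int) -> int:
    """Usage-based price: pay per machine-hour, however the hours are spread."""
    return machines * hours * cents_per_machine_hour


RATE_CENTS = 10  # hypothetical price per machine-hour

serial = usage_cost_cents(machines=1, hours=120, cents_per_machine_hour=RATE_CENTS)
parallel = usage_cost_cents(machines=120, hours=1, cents_per_machine_hour=RATE_CENTS)
```

The cost is the same, but the parallel configuration returns results 120 times sooner, which is the point of elastic, on-demand computing.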
Part 3. Some Bionimbus Case Studies
Case Study: Public Datasets in Bionimbus
Case Study: modENCODE
• Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
• Bionimbus VMs were used for some of the integrative analysis.
• Bionimbus is used as a backup for the modENCODE DCC.
>300 ChIP datasets: chromatin/RNA timecourse, CBP, PolII, Pho/silencers, HDACs, insulators, TFs
Predictions: 537 silencers, 2,307 new promoters, 12,285 enhancers, 14,145 insulators
www.modencode.org, www.cistrack.org; Negre et al., Nature 2011
Case Study: IGSB
• All samples processed by the Institute for Genomics & Systems Biology High-Throughput Genome Analysis Core (HGAC) at the University of Chicago use Bionimbus.
Bionimbus Virtual Machine Releases
• Peak calling: MAT, MA2C, PeakSeq, MACS, SPP
• Quality control: various
• Alignment & genotyping: Bowtie, TopHat, Samtools, Picard
Part 4. Data Centers for Science
Eras: experimental science, simulation science, data science
• 1609: 30x
• 1670: 250x
• 1976: 10x-100x
• 2004: 10x-100x
Open Science Data Cloud
• Astronomical data
• Biological data (Bionimbus)
• Earth science data (& disaster relief)
• NSF-PIRE OSDC Data Challenge
The goal is to build a data center in Chicago for biological, scientific,
medical and health care data in 4 to 5 years.
Part 5. More About Bionimbus
• GWT-based Front End
• Analysis Pipelines & Re-analysis Services
• Database Services (PostgreSQL)
• Large Data Cloud Services (Hadoop, Sector/Sphere)
• Elastic Cloud Services (Eucalyptus, OpenStack)
• Data Ingestion Services
• Intercloud Services
• Other labels: (IDs, etc.), (UDT, replication)
Bionimbus Deployment Options
Bionimbus Community Cloud (www.bionimbus.org)
Bionimbus AMIs & Amazon hosted applications
Bionimbus Private Clouds
A successful cloud will…
1. Provide long-term persistent storage services at the scale of a data center.
2. Provide compute services at the scale of a data center.
3. Provide high-performance ingestion and transport of data.
A successful cloud will…
4. Support the liberation of data.
5. Peer with public clouds.
6. Peer with private genomics clouds.
Bionimbus satisfies each of these six requirements.
Bionimbus Road Map
Over the next 3 to 4 months, we will:
• Launch Bionimbus (we are in pre-launch)
• Add Galaxy-based workflows to Bionimbus
• Add secure routing of genomes
• Add more public datasets
• Add more pipelines
For More Information: www.bionimbus.org