Easygenomics ISCB Cloud section 2012
![Page 1: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/1.jpg)
Contact [email protected]
http://www.easygenomics.com
Next Generation Bioinformatics on the Cloud
Xing Xu, Ph.D.
Director of Cloud Computing Product
![Page 2: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/2.jpg)
Topics for Today
Behind the cloud product
- BGI
- The team
The product: EasyGenomics
- Why are we building this product?
- What can this product do?
Future direction and open questions
![Page 3: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/3.jpg)
BGI
The world's largest genome sequencing center
- Started with the Human Genome Project in 1999 with only a few sequencers.
- Now more than 150 sequencers, with 6 TB/day sequencing throughput.
| Model | Installations |
|---|---|
| ABI 3730XL | 16 |
| Roche 454 | 1 |
| ABI SOLiD 4 | 27 |
| Solexa GA IIx | 6 |
| Illumina HiSeq 2000 | 135 |
![Page 4: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/4.jpg)
BGI
The world's largest genome sequencing center
The largest computing and storage center for genomics in China
- 20,000+ CPU cores
- 19 NVIDIA GPUs
- 220+ Tflops peak performance
- 17 PB data storage
- Storage and computation capability has increased 10,000-fold.
- Still increasing …
![Page 5: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/5.jpg)
BGI
The world's largest genome sequencing center
The largest computing and storage center for genomics in China
One of the world's leading research institutes in genomics
Since 2007:
- 253 papers in high-impact journals
- Including 47 in Nature and its sub-journals, 9 in Science, 2 in Cell, and 1 in NEJM, with 42 first and/or corresponding authors
- 369 patent applications
- 254 software authorships
![Page 6: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/6.jpg)
BGI
The world's largest genome sequencing center
The largest computing and storage center for genomics in China
One of the world's leading research institutes in genomics
BGI has the sequencing capacity, hardware resources, and software proficiency to be one of the strongest end-to-end service providers in the world for NGS sequencing, data analysis, and data interpretation.
![Page 7: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/7.jpg)
Team for the Cloud Platform
Run like a software company
Managers are from leading software companies, such as HP, Microsoft, and Lenovo.
Team members are young, energetic, and ambitious.
Fully supported by BGI in-house algorithm development teams.
Product
Development
Testing
Operation
BGI Support
![Page 8: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/8.jpg)
Team for the Cloud Platform
Development Team
- Dev: Ming Jiang, Yongsheng Chen, Can Long, Jiasheng Wu, etc.
- Flex Lab: Yan Li, Shengchang Gu, etc.
- GPU Lab: Bingqiang Wang, etc.
- Pipeline: Liang Wang, etc.
Test & QA Team
- Xin Guan, Jingjuan Liu, etc.
PMO & IT Operation
- Wenjun Zeng, Litong Lai, Jing Tian, etc.
Product Team
- Xing Xu, Jing Guo, Fang Fang, etc.
Other BGI Teams
![Page 9: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/9.jpg)
Topics for Today
Behind the cloud product
- BGI
- The team
The product: EasyGenomics
- Why are we building this product?
- What can this product do?
Future direction and open questions
![Page 10: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/10.jpg)
Trend of Volume and Cost
![Page 11: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/11.jpg)
Geographical side of the problem
Sequencing happens EVERYWHERE.
Images from omicsmaps.com
BGI
![Page 12: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/12.jpg)
Difficulties of Analysis
| Stage | Step | Difficulties |
|---|---|---|
| Primary analysis | Base calling | Data throughput; data storage |
| Secondary analysis | Mapping | Computation intensive; data storage |
| Tertiary analysis | Variant calling | Complicated algorithms; computation intensive |
| Post-tertiary analysis | In-depth annotation | Lack of knowledge |
![Page 13: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/13.jpg)
Problems and Solutions
Problems:
• Big genomic data
• Geographical distribution
• Algorithm integration
• Computational demand
Solutions:
• Cloud
• High Speed Data Exchange
• Pipelines
• Distributed Workloads
![Page 14: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/14.jpg)
EasyGenomics™
EasyGenomics is a Software as a Service (SaaS) bioinformatics platform for research and applications.
- Algorithms, workflows, reports
- Computational resources
- Database, data management
- Web portal, simple UI
- High-speed connection
![Page 15: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/15.jpg)
Key Features
- Bioinformatics Workflows
- Data Management
- High Speed Connection
![Page 16: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/16.jpg)
Bioinformatics Workflow
Four steps: Upload, Create a Sample, Perform Analyses, Download Results
Algorithms: Carefully chosen, tested and optimized
Workflows: Whole Genome Resequencing, Exome Resequencing, RNA-Seq, small RNA, ncRNA, and De novo Assembly
![Page 17: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/17.jpg)
Homepage
Four task portals
Status of recent works
Warning and Logging
Navigation Tabs
![Page 18: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/18.jpg)
Bioinformatics Workflow--- Pipelines
Exome Resequencing RNASeq
Transcriptome
![Page 19: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/19.jpg)
Bioinformatics Workflow---Comprehensive Reports
![Page 20: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/20.jpg)
Bioinformatics Workflow---Comprehensive Reports
![Page 21: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/21.jpg)
Data Management
- “Sample”, “Analysis”, “Project”
- Mimicking real research procedure
- Automatic management of underlying data structure
Diagram: Raw Data; Sample A, Sample B; Analysis I, Analysis II, Analysis X; Project I
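The Sample / Analysis / Project hierarchy on this slide can be pictured as a small data model. The sketch below is only an illustration: the class names follow the slide, but every field (`lane`, `fastq_files`, `workflow`, `parameters`) and the relationships between the classes are assumptions, not the actual EasyGenomics schema.

```python
from dataclasses import dataclass, field


@dataclass
class ReadGroup:
    """One lane of raw sequencing data (hypothetical fields)."""
    lane: str
    fastq_files: list[str]


@dataclass
class Sample:
    """A named collection of read groups built from uploaded raw data."""
    name: str
    read_groups: list[ReadGroup] = field(default_factory=list)


@dataclass
class Analysis:
    """One workflow run over a sample with a given parameter set."""
    name: str
    sample: Sample
    workflow: str
    parameters: dict[str, str] = field(default_factory=dict)


@dataclass
class Project:
    """A container that groups related analyses."""
    name: str
    analyses: list[Analysis] = field(default_factory=list)

    def samples(self) -> list[str]:
        """Names of the distinct samples used, in first-use order."""
        seen: dict[str, None] = {}
        for analysis in self.analyses:
            seen.setdefault(analysis.sample.name, None)
        return list(seen)


# Illustrative data mirroring the labels in the slide diagram.
sample_a = Sample("Sample A", [ReadGroup("lane1", ["a_1.fq.gz", "a_2.fq.gz"])])
sample_b = Sample("Sample B", [ReadGroup("lane2", ["b_1.fq.gz", "b_2.fq.gz"])])
project = Project("Project I", [
    Analysis("Analysis I", sample_a, "Exome Resequencing"),
    Analysis("Analysis II", sample_a, "Exome Resequencing", {"min_quality": "30"}),
    Analysis("Analysis X", sample_b, "RNA-Seq"),
])
print(project.samples())
```

Here Sample A is reused by two analyses with different parameters, which is the kind of reuse the platform advertises; the file names and the `min_quality` parameter are made up for the example.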
![Page 22: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/22.jpg)
Create a Sample
Add read groups
![Page 23: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/23.jpg)
Sample Page
Individual report for each lane
Summarized report for all lanes
![Page 24: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/24.jpg)
Data management---Security
Access
• Username/Password
• Biometric access
• HTTPS, Aspera fasp™
Multi-tenancy
• Trusted database connection
• ACL, Data encryption
Isolation
• Physical isolation
• Virtual isolation
Compliance
• ISO 27000
![Page 25: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/25.jpg)
High Speed Data Exchange
Aspera’s patented fasp™ high-speed file transfer technology
10 to 100 times faster than FTP
![Page 26: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/26.jpg)
Transfer 24GB in 30 Seconds
Demonstrated 10 Gbps ultra-high-speed data exchange with UC Davis and NCBI in June.
![Page 27: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/27.jpg)
Transfer 24GB in 30 Seconds
Demonstrated 10 Gbps ultra-high-speed data exchange with UC Davis and NCBI in June.
A 24 GB file was transferred from China to the US in 30 seconds (~8 Gbit/s).
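As a sanity check on the transfer figures, the average rate can be worked out directly. The sketch below assumes decimal units (1 GB = 10^9 bytes); the function names are illustrative. Under that assumption, 24 GB in exactly 30 seconds averages about 6.4 Gbit/s, so the quoted ~8 Gbit/s would correspond to roughly 24 seconds, or may describe a peak rather than an average rate. The slide does not say which.

```python
def average_rate_gbps(size_gb: float, seconds: float) -> float:
    """Average throughput in Gbit/s, taking 1 GB = 10**9 bytes = 8 Gbit."""
    return size_gb * 8 / seconds


def seconds_at_rate(size_gb: float, rate_gbps: float) -> float:
    """Time in seconds to move size_gb gigabytes at a sustained rate_gbps."""
    return size_gb * 8 / rate_gbps


print(f"24 GB in 30 s averages {average_rate_gbps(24, 30):.1f} Gbit/s")
print(f"24 GB at 8 Gbit/s takes {seconds_at_rate(24, 8):.0f} s")
```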
![Page 28: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/28.jpg)
Amount of Data that can be transferred in 24hr
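The chart on this slide is not recoverable from the transcript, but the underlying arithmetic is simple to sketch. The example below assumes a fully sustained link rate with no protocol overhead and decimal units; the 0.1 and 1 Gbit/s rates are illustrative choices, while 10 Gbit/s and the 6 TB/day sequencing throughput come from earlier slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def terabytes_per_day(rate_gbps: float) -> float:
    """Volume in TB (10**12 bytes) moved in 24 hours at a sustained rate."""
    bytes_per_second = rate_gbps * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1e12


def required_rate_gbps(tb_per_day: float) -> float:
    """Sustained rate in Gbit/s needed to move tb_per_day terabytes daily."""
    return tb_per_day * 1e12 * 8 / SECONDS_PER_DAY / 1e9


for rate in (0.1, 1.0, 10.0):
    print(f"{rate:>5} Gbit/s -> {terabytes_per_day(rate):7.2f} TB/day")
print(f"6 TB/day needs {required_rate_gbps(6):.2f} Gbit/s sustained")
```

At a sustained 10 Gbit/s this gives 108 TB per day, well above the 6 TB/day the sequencers produce; real transfers would fall short of this upper bound.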
![Page 29: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/29.jpg)
Easy-to-Use UI
Reusability
- Reuse the same sample for different analyses (different parameters)
- Reuse all parameter settings for different analyses
Simple UI and interactive features
- As easy as online shopping
- Shortcuts for predefined settings, while remaining fully customizable for advanced users
- Handle batch analyses in one setting
![Page 30: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/30.jpg)
Create an Analysis
Selected sample(s)
• One selected sample => Single Analysis
• Multiple selected samples => Batch Analyses
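The single-versus-batch rule can be sketched as a function that turns a sample selection into analysis requests. This is a hypothetical illustration, not the EasyGenomics API: `AnalysisRequest`, `create_analyses`, and the parameter names are invented, and it assumes a batch creates one analysis per selected sample with identical settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisRequest:
    """One requested workflow run (hypothetical structure)."""
    sample: str
    workflow: str
    parameters: tuple[tuple[str, str], ...]
    mode: str  # "single" or "batch"


def create_analyses(samples: list[str], workflow: str,
                    parameters: dict[str, str]) -> list[AnalysisRequest]:
    """Create one request per selected sample, sharing the same settings."""
    if not samples:
        raise ValueError("select at least one sample")
    mode = "single" if len(samples) == 1 else "batch"
    frozen = tuple(sorted(parameters.items()))
    return [AnalysisRequest(s, workflow, frozen, mode) for s in samples]


# Made-up settings, reused unchanged across both calls.
settings = {"aligner": "default", "min_quality": "20"}
single = create_analyses(["Sample A"], "Exome Resequencing", settings)
batch = create_analyses(["Sample A", "Sample B"], "Exome Resequencing", settings)
print(single[0].mode, len(batch), batch[0].parameters == batch[1].parameters)
```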
![Page 31: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/31.jpg)
Create an Analysis
Selectable modules
Predefined Settings
Shortcut
![Page 32: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/32.jpg)
Create an Analysis
![Page 33: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/33.jpg)
Create an Analysis
Customizable
![Page 34: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/34.jpg)
Create an Analysis
![Page 35: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/35.jpg)
Project Table
- Add/Remove Project
- Operation shortcuts
- Project list table
- Filter and search box
![Page 36: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/36.jpg)
Analysis Table
![Page 37: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/37.jpg)
Sample Table
![Page 38: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/38.jpg)
A typical use case
![Page 39: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/39.jpg)
Topics for Today
Behind the cloud product
- BGI
- The team
The product: EasyGenomics
- Why are we building this product?
- What can this product do?
Future direction and open questions
![Page 40: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/40.jpg)
Future directions
What is the market? Which direction to go?
- Cloud on public infrastructure vs. cloud on private infrastructure
- SaaS vs. PaaS
- Data analysis is only one step of the whole process.
- What will be the sustainable model for the cloud service?
![Page 41: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/41.jpg)
Cloud Service Providers
Market Position
Annotation Providers
Sequencing Service Providers
Instrument Manufacturers
Personal Genetic Testing Providers
Illumina
Software Providers
NOW
![Page 42: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/42.jpg)
Challenge and Solution
| | DNAnexus | BaseSpace (Illumina) | GenomeSpace | EasyGenomics | Ingenuity / NextBio |
|---|---|---|---|---|---|
| Cloud | Public | Public | Public | Private | Private |
| Reasoning | Great demand on space and computation resources | Great demand on space and computation resources | Great demand on space and computation resources | Security, privacy issue | Security, privacy issue |
| Positioning | Infrastructure (PaaS) | App Store | Platform for accessing available tools | SaaS solution | Information: they work with the results from NGS, not the raw reads |
| Advantage | Funding; advance in the field | Sequencing service; community of partners | Strong connection to academia | Sequencing service; development capability | Experience |
![Page 43: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/43.jpg)
Public vs Private Cloud
Public Cloud
Pros:
− “Limitless” resource
− Share data with a wide range of people
− Offering nice platform
Cons:
− Security and reliability
− Short-term cost saving vs. long-term cost nightmare
Private Cloud
Pros:
− Flexibility
− Security and privacy control
− Long-term cost saving
Cons:
− Big initial investment
− Maintaining the infrastructure and software on the cloud
But the line between public and private cloud is blurring.
![Page 44: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/44.jpg)
A sustainable model for cloud service?
Key components of cost
- Storage
- Computational resource
- Data transfer
- Software usage
App store or cell phone plan
Long-term cost vs. short-term cost
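The long-term versus short-term trade-off can be made concrete with a toy cost model built from the four cost components listed on this slide. Every number below is invented purely for illustration (no real prices are implied), and the model assumes constant monthly usage, a one-off upfront investment for the private option, and no depreciation or staffing costs.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonthlyUsage:
    storage_tb: float
    kilo_core_hours: float   # compute, in thousands of core-hours
    transfer_tb: float
    software_fee: float      # flat monthly software charge


@dataclass
class PriceList:
    per_tb_stored: float
    per_kilo_core_hour: float
    per_tb_transferred: float


def monthly_cost(usage: MonthlyUsage, prices: PriceList) -> float:
    """Sum of the four components: storage, compute, transfer, software."""
    return (usage.storage_tb * prices.per_tb_stored
            + usage.kilo_core_hours * prices.per_kilo_core_hour
            + usage.transfer_tb * prices.per_tb_transferred
            + usage.software_fee)


def break_even_month(public_monthly: float, private_upfront: float,
                     private_monthly: float) -> Optional[int]:
    """First whole month where private cumulative cost <= public, else None."""
    if private_monthly >= public_monthly:
        return None
    return math.ceil(private_upfront / (public_monthly - private_monthly))


# Hypothetical usage and prices, chosen only to make the arithmetic visible.
usage = MonthlyUsage(storage_tb=100, kilo_core_hours=50, transfer_tb=10,
                     software_fee=500)
public = monthly_cost(usage, PriceList(30, 100, 50))
private = monthly_cost(usage, PriceList(10, 30, 0))
print(public, private, break_even_month(public, 120_000, private))
```

With these made-up inputs the public option costs 9000 per month and the private option 3000 per month after a 120,000 investment, so the two cumulative costs meet at month 20. Different usage or prices move that point or remove it entirely.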
![Page 45: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/45.jpg)
Data analysis is NOT ALL!
EPM
- Project Management, Sample Center, Wet Lab Operation, Bioinformatics Data Analysis
- Management System: Budgeting, Tasking, Receipt/Storage, Handover, Sample QC, Sample prep, Workflow, Sequencing, Data analysis, Data QC
- Sales, Billing
- Web-based Interface: Management, Interfacing, Query, Statistics
![Page 46: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/46.jpg)
Roadmap of EasyGenomics
EG1.1 (Jun 2012)
• New result reports
• Fully integrated Data Exchange Interface
EG1.2 (Aug 2012)
• New read filtering step, speed up 20x
EG1.3 (Sep 2012)
• Data import from BGI sequencing service
EG1.5 (est. Dec 2012)
• QC indicator, QC module
• New Sample report
• Transcriptome workflows
• Reference management
EG2.0 (est. Apr 2013)
• iRODS data management
• Data sharing, collaboration
• Users' own applications
• Comparison, filtering tools
• Visualization
![Page 47: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/47.jpg)
www.EasyGenomics.com
Free beta trial is ongoing!
![Page 48: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/48.jpg)
Interpretation is the KEY
Analysis and Interpretation is the KEY
![Page 49: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/49.jpg)
Enabling Technology
Best Practice Award for IT Infrastructure
Hadoop-based Flexible Computing
| Human Genome | SOAPdenovo | EasyGenomics™ (192 cores) |
|---|---|---|
| Genome coverage | 86% | 86% |
| Assembly time | 70 h | 55 h |
| No. of servers | 1 | 15 |
| Memory size | 500 GB x 1 | 24 GB x 15 |
| Mode | Centralized | Distributed |
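The comparison figures above can be reduced to two derived numbers: the wall-clock speed-up and the total memory used by each setup. The sketch below only restates the slide's values; the dictionary keys are illustrative names, and nothing here says how the distributed assembly itself is implemented.

```python
# Values copied from the slide; key names are illustrative.
centralized = {"servers": 1, "memory_gb_per_server": 500, "hours": 70}
distributed = {"servers": 15, "memory_gb_per_server": 24, "hours": 55}


def total_memory_gb(config: dict) -> int:
    """Aggregate memory across all servers in a configuration."""
    return config["servers"] * config["memory_gb_per_server"]


speedup = centralized["hours"] / distributed["hours"]
time_saved = 1 - distributed["hours"] / centralized["hours"]

print(f"Total memory: {total_memory_gb(centralized)} GB vs "
      f"{total_memory_gb(distributed)} GB")
print(f"Speed-up: {speedup:.2f}x ({time_saved:.0%} less wall-clock time)")
```

So the distributed run finishes about 1.27 times faster while using 360 GB of memory spread over 15 small servers instead of one 500 GB machine, at the same reported genome coverage.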
![Page 50: Easygenomics ISCB Cloud section 2012](https://reader038.fdocuments.in/reader038/viewer/2022102917/58f1768a1a28ab77318b4593/html5/thumbnails/50.jpg)
Enabling Technology
SOAP Hadoop (Gaea)
GPU