Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013
![Page 1: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/1.jpg)
Science as a Service
Ian Foster, The University of Chicago and Argonne National Laboratory
November 14, 2013
![Page 2: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/2.jpg)
A time of disruptive change
![Page 3: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/3.jpg)
![Page 4: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/4.jpg)
Most labs have limited resources (Heidorn: NSF grants in 2007)
Figure: distribution of NSF award sizes (log scale, $1,000 to $1,000,000). Awards under $350,000 make up 80% of awards and account for 50% of grant dollars.
![Page 5: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/5.jpg)
Automation is required to apply more sophisticated methods to far more data
![Page 6: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/6.jpg)
Automation is required to apply more sophisticated methods to far more data
Outsourcing is needed to achieve economies of scale in the use of automated methods
![Page 7: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/7.jpg)
Building a discovery cloud
• Identify time-consuming activities amenable to automation and outsourcing
• Implement as high-quality, low-touch SaaS
• Leverage IaaS for reliability, economies of scale
• Extract common elements as research automation platform
Bonus question: Sustainability
Software as a service
Platform as a service
Infrastructure as a service
![Page 8: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/8.jpg)
We aspire (initially) to create a great user experience for research data management
What would a “dropbox for science” look like?
![Page 9: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/9.jpg)
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
BIG DATA
![Page 10: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/10.jpg)
Figure: data flowing among a registry, staging, ingest, analysis, and community stores, and an archive mirror, interrupted by typical errors: "Quota exceeded", "Expired credentials", "Network failed. Retry.", "Permission denied".
It should be trivial to collect, move, sync, share, analyze, annotate, publish, search, back up, and archive BIG DATA, but in reality it’s often very challenging.
![Page 11: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/11.jpg)
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
BIG DATA
![Page 12: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/12.jpg)
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
BIG DATA
• Move • Sync • Share
Capabilities delivered using the Software-as-a-Service (SaaS) model
![Page 13: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/13.jpg)
Data Source → Data Destination
1. User initiates transfer request
2. Globus Online moves/syncs files
3. Globus Online notifies user
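The "moves/syncs" step can be illustrated with a local comparison of source and destination listings. The sketch below is hypothetical: `FileInfo`, `needs_copy`, `plan_sync`, and the four comparison levels are illustrative names, loosely modeled on the idea of choosing how strictly files are compared before copying. It is not the Globus Online API and performs no transfers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileInfo:
    size: int        # bytes
    mtime: float     # modification time, seconds since the epoch
    checksum: str    # e.g. a hex digest computed at each endpoint


# Hypothetical comparison levels, from cheapest to most thorough.
SYNC_LEVELS = ("exists", "size", "mtime", "checksum")


def needs_copy(src: FileInfo, dst: Optional[FileInfo], level: str) -> bool:
    """Decide whether one file must be (re)sent to the destination."""
    if level not in SYNC_LEVELS:
        raise ValueError(f"unknown sync level: {level!r}")
    if dst is None:
        return True                      # missing at the destination: always copy
    if level == "exists":
        return False                     # being present is good enough
    if src.size != dst.size:
        return True                      # any size change forces a copy
    if level == "size":
        return False
    if level == "mtime":
        return src.mtime > dst.mtime     # copy only if the source is newer
    return src.checksum != dst.checksum  # "checksum": compare contents


def plan_sync(source: Dict[str, FileInfo],
              destination: Dict[str, FileInfo],
              level: str = "checksum") -> List[str]:
    """Return the sorted list of paths that a sync at `level` would copy."""
    return sorted(path for path, info in source.items()
                  if needs_copy(info, destination.get(path), level))
```

If a destination file has the same size and a newer timestamp but different contents, the "mtime" level skips it while the "checksum" level re-sends it; the cost of the stricter level is reading every file at both ends.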
![Page 14: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/14.jpg)
Data Source
1. User A selects file(s) to share, selects a user/group, and sets share permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses the shared file
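Step 1 amounts to attaching an access rule to a directory that stays on the source endpoint, and step 3 to checking that rule when User B asks for a file. A minimal sketch, assuming a hypothetical rule format (`AccessRule` with a shared directory, a `user:`/`group:` principal, and `"r"`/`"rw"` permissions); it is not the Globus sharing API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AccessRule:
    path: str          # shared directory on the source endpoint
    principal: str     # "user:<name>" or "group:<name>"
    permissions: str   # "r" (read) or "rw" (read/write)


def _under(path: str, shared_dir: str) -> bool:
    """True if `path` is the shared directory itself or lies inside it."""
    prefix = shared_dir if shared_dir.endswith("/") else shared_dir + "/"
    return path == prefix.rstrip("/") or path.startswith(prefix)


def can_access(rules: List[AccessRule], user: str, groups: Iterable[str],
               path: str, mode: str = "r") -> bool:
    """Check whether `user` (a member of `groups`) may use `path` in `mode` ("r" or "w")."""
    principals = {f"user:{user}"} | {f"group:{g}" for g in groups}
    return any(rule.principal in principals
               and mode in rule.permissions
               and _under(path, rule.path)
               for rule in rules)
```

Because a rule only names a directory, nothing is copied to cloud storage; a real service would also have to authenticate User B, which this sketch takes as given.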
![Page 15: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/15.jpg)
Extreme ease of use
• InCommon, OAuth, OpenID, X.509, …
• Credential management
• Group definition and management
• Transfer management and optimization
• Reliability via transfer retries
• Web interface, REST API, command line
• One-click “Globus Connect” install
• 5-minute Globus Connect Multi User install
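Transfer retries only help with transient faults. The sketch below separates errors like "Network failed. Retry." from ones such as "Permission denied" or "Expired credentials" that need user action, and retries only the former, using capped exponential backoff with jitter. The exception classes and `with_retries` are hypothetical; the slide does not describe Globus Online's actual retry policy.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Fault worth retrying, e.g. a dropped network connection."""


class PermanentError(Exception):
    """Fault that retrying cannot fix, e.g. permission denied or expired credentials."""


def with_retries(op: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0, max_delay: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `op`, retrying TransientError with capped exponential backoff.

    PermanentError (and any other exception) propagates immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: surface the last transient error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the policy testable without real waiting; jitter avoids many clients retrying against the same endpoint in lockstep.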
![Page 16: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/16.jpg)
Early adoption is encouraging
![Page 17: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/17.jpg)
• >12,000 registered users; >150 daily
• >27 PB moved; >1B files
• 10x (or better) performance vs. scp
• 99.9% availability
• Entirely hosted on Amazon
![Page 18: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/18.jpg)
Amazon Web Services used
• Amazon EC2 for hosting Globus services
• Elastic Load Balancing to use multiple Availability Zones for reliability and uptime
• Amazon S3 to store historical state
• Amazon RDS PostgreSQL for active state
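The split between active and historical state can be illustrated with a small archiving job that copies finished task records out of the relational store and then deletes them there. This is a hypothetical sketch, not Globus's design: `sqlite3` stands in for Amazon RDS PostgreSQL, a plain dict stands in for an Amazon S3 bucket, and the `tasks` table layout and status names are invented for illustration.

```python
import json
import sqlite3
from typing import Dict

FINISHED = ("SUCCEEDED", "FAILED")  # hypothetical terminal task states


def archive_finished_tasks(db: sqlite3.Connection,
                           object_store: Dict[str, bytes]) -> int:
    """Move finished task records from the active DB to the object store.

    Records are written to the store before they are deleted, so a crash in
    between leaves a duplicate rather than losing a record.
    """
    rows = db.execute(
        "SELECT id, owner, status, bytes_moved FROM tasks WHERE status IN (?, ?)",
        FINISHED,
    ).fetchall()
    for task_id, owner, status, bytes_moved in rows:
        record = {"id": task_id, "owner": owner,
                  "status": status, "bytes_moved": bytes_moved}
        object_store[f"tasks/{task_id}.json"] = json.dumps(record).encode()
    with db:  # commit the deletions as one transaction
        db.executemany("DELETE FROM tasks WHERE id = ?",
                       [(row[0],) for row in rows])
    return len(rows)
```

Keeping only in-flight tasks in the database keeps it small, while the object store absorbs the ever-growing history.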
![Page 19: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/19.jpg)
K. Heitmann (Argonne) moves 22 TB of cosmology data from LANL to ANL at 5 Gb/s
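The numbers on this slide imply a transfer time that is easy to check. Assuming decimal terabytes (1 TB = 10^12 bytes) and a rate sustained at exactly 5 Gb/s for the whole transfer, 22 TB takes 22 × 10^12 × 8 / (5 × 10^9) = 35,200 seconds, or just under 10 hours. The helper below (a hypothetical name) performs the same arithmetic.

```python
def transfer_hours(size_bytes: float, rate_bits_per_s: float) -> float:
    """Hours needed to move `size_bytes` at a sustained `rate_bits_per_s`."""
    if rate_bits_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes * 8 / rate_bits_per_s / 3600


# 22 TB (decimal) at 5 Gb/s
print(round(transfer_hours(22e12, 5e9), 2))  # 9.78
```

If "TB" instead means tebibytes (2^40 bytes), the same call with `22 * 2**40` gives about 10.75 hours.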
![Page 20: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/20.jpg)
B. Winjum (UCLA) moves 900K-file plasma physics datasets from UCLA to NERSC
![Page 21: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/21.jpg)
Dan Kozak (Caltech) replicates 1 PB LIGO astronomy data for resilience
![Page 22: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/22.jpg)
Credit: Kerstin Kleese-van Dam
Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL
![Page 23: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/23.jpg)
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
BIG DATA
• Move • Sync • Share
Capabilities delivered using the Software-as-a-Service (SaaS) model
![Page 24: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/24.jpg)
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
BIG DATA
![Page 25: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/25.jpg)
Globus Online already does a lot
Globus Toolkit
Sharing Service, Transfer Service
Globus Nexus (Identity, Group, Profile)
Globus Online APIs
Globus Connect
![Page 26: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/26.jpg)
The identity challenge in science
• Research communities often need to
– Assign identities to their users
– Manage user profiles
– Organize users into groups for authorization
• Obstacles to high-quality implementations
– Complexity of associated security protocols
– Creation of identity silos
– Multiple credentials for users
– Reliability, availability, scalability, security
![Page 27: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/27.jpg)
Nexus provides four key capabilities
• Identity provisioning
– Create, manage Globus identities
• Identity hub
– Link with other identities; use to authenticate to services
• Group hub
– User-managed groups; groups can be used for authorization
• Profile management
– User-managed attributes; can use in group admission
Key points:
1) Outsource identity, group, profile management
2) REST API for flexible integration
3) Intuitive, customizable Web interfaces
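The identity hub, group hub, and profile capabilities fit together like this: several external identities resolve to one account, a group's admission rule is evaluated against the account's profile attributes, and membership then serves as the authorization check. The sketch below is an in-memory illustration only; `Account`, `Group`, `IdentityHub`, and the example identities and attributes are hypothetical and do not reflect the Nexus REST API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class Account:
    username: str                                          # the Globus identity
    linked: Set[str] = field(default_factory=set)          # e.g. campus or OpenID identities
    profile: Dict[str, str] = field(default_factory=dict)  # user-managed attributes


@dataclass
class Group:
    name: str
    admits: Callable[[Dict[str, str]], bool]               # admission rule over profile attributes
    members: Set[str] = field(default_factory=set)


class IdentityHub:
    """Maps any linked identity to its single account."""

    def __init__(self) -> None:
        self._accounts: Dict[str, Account] = {}

    def register(self, account: Account) -> None:
        for identity in account.linked | {account.username}:
            self._accounts[identity] = account

    def resolve(self, identity: str) -> Optional[Account]:
        return self._accounts.get(identity)


def join(group: Group, account: Account) -> bool:
    """Admit `account` if its profile satisfies the group's rule."""
    if group.admits(account.profile):
        group.members.add(account.username)
        return True
    return False


def authorized(group: Group, account: Optional[Account]) -> bool:
    """Group membership is the authorization decision."""
    return account is not None and account.username in group.members
```

A service that trusts the hub only needs to call `resolve` on whatever identity the user logged in with and then ask `authorized`; it never stores credentials or group lists of its own.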
![Page 28: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/28.jpg)
Branded sites
Open Science Grid, University of Chicago, XSEDE, DOE KBase, Indiana University, University of Exeter, Globus Online, NERSC, NIH BIRN
![Page 29: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/29.jpg)
A platform for integration
![Page 30: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/30.jpg)
![Page 31: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/31.jpg)
![Page 32: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/32.jpg)
Data management SaaS (Globus) + Next-gen sequence analysis pipelines (Galaxy) + Cloud IaaS (Amazon) = Flexible, scalable, easy-to-use genomics analysis for all biologists
globus genomics
![Page 33: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/33.jpg)
Globus Toolkit
Sharing Service, Transfer Service
Globus Nexus (Identity, Group, Profile)
Globus Online APIs
Globus Connect
We are adding capabilities
![Page 34: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/34.jpg)
Globus Toolkit
Sharing Service, Transfer Service
Dataset Services
Globus Nexus (Identity, Group, Profile)
Globus Online APIs
Globus Connect
We are adding capabilities
![Page 35: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/35.jpg)
We are adding capabilities
• Ingest and publication
– Imagine a Dropbox that not only replicates, but also extracts metadata, catalogs, converts
• Cataloging
– Virtual views of data based on user-defined and/or automatically extracted metadata
• Computation
– Associate computational procedures, orchestrate applications, catalog results, record provenance
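The ingest-and-catalog idea can be sketched as a store that, on every upload, keeps a replica, extracts a few metadata fields automatically, merges in user-supplied fields, and answers metadata queries as virtual views. Everything here is hypothetical: the `Catalog` class, the extracted fields (size, SHA-256 digest, format taken from the file suffix), and the equality-only query are illustrative assumptions, not the planned Globus dataset services.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Dict, List, Optional


class Catalog:
    def __init__(self) -> None:
        self.replicas: Dict[str, bytes] = {}            # replicated file contents
        self.entries: Dict[str, Dict[str, object]] = {}  # per-file metadata

    def ingest(self, name: str, data: bytes,
               user_metadata: Optional[Dict[str, object]] = None) -> Dict[str, object]:
        """Replicate a file, extract metadata, and record it in the catalog."""
        self.replicas[name] = data
        meta: Dict[str, object] = {
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
            "format": PurePosixPath(name).suffix.lstrip(".") or "unknown",
        }
        meta.update(user_metadata or {})  # user-defined fields win on conflict
        self.entries[name] = meta
        return meta

    def view(self, **criteria: object) -> List[str]:
        """Virtual view: names of all files whose metadata matches every criterion."""
        return sorted(name for name, meta in self.entries.items()
                      if all(meta.get(k) == v for k, v in criteria.items()))
```

The same files can then appear in several views at once, for example grouped by sample or by format, without being copied or moved.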
![Page 36: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/36.jpg)
Next Gen Sequencing Analysis for Everyone – No IT Required
Ravi K Madduri, The University of Chicago and Argonne National Laboratory
November 14, 2013
![Page 37: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/37.jpg)
One slide to get your attention
![Page 38: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/38.jpg)
Outline
• Globus Vision
• Challenges in Sequencing Analysis
– Big Data Management
– Analysis at Scale
– Reproducibility
• Proposed Approach Using Globus Genomics
• Example Collaborations
• Q&A
![Page 39: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/39.jpg)
Globus Vision
Goal: Accelerate discovery and innovation worldwide by providing research IT as a service
Leverage software-as-a-service to:
– provide millions of researchers with unprecedented access to powerful tools for managing Big Data
– reduce research IT costs dramatically via economies of scale
“Civilization advances by extending the number of important operations which we can perform without thinking about them” (Alfred North Whitehead, 1911)
![Page 40: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/40.jpg)
Challenges in Sequencing Analysis
Figure: Sequencing Centers, Public Data, Storage, Local Cluster/Cloud, Seq Center, and Research Lab, linked by "Data Movement and Access Challenges"; a "Manual Data Analysis" loop (Install, Modify, (Re)Run Script) taking FASTQ files and a reference genome through alignment and variant calling with Picard and GATK.
How do we analyze this sequence data?
• Manually move the data to the compute node
• Install all the tools required for the analysis
– BWA, Picard, GATK, filtering scripts, etc.
– Shell scripts to sequentially execute the tools
• Manually modify the scripts for any change
– Error-prone, difficult to keep track, messy
– Difficult to maintain and transfer the knowledge
• Data is distributed in different locations
– Research labs need access to the data for analysis
– Researchers need to be able to share data with collaborators
• Inefficient ways of data movement
– Data needs to be available on the local and distributed compute resources: local clusters, cloud, grid
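The alternative to hand-edited shell scripts is to declare the analysis as a graph of steps and let a workflow engine derive the execution order, which is the role Galaxy plays in Globus Genomics. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to illustrate the idea. The step names echo the tools on the slide (BWA alignment, Picard, GATK), but the steps here only record that they ran rather than invoking those tools, and the dependency graph is an assumption, not a published pipeline.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(steps: Dict[str, Callable[[], None]],
                 depends_on: Dict[str, Set[str]]) -> List[str]:
    """Run each step after its dependencies; return the order used.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    graph = {name: depends_on.get(name, set()) for name in steps}
    missing = {dep for deps in graph.values() for dep in deps} - set(steps)
    if missing:
        raise KeyError(f"undefined steps: {sorted(missing)}")
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        steps[name]()  # a real engine would launch the tool here
    return order


log: List[str] = []
names = ["align", "mark_duplicates", "call_variants", "filter_variants"]
steps = {n: (lambda n=n: log.append(n)) for n in names}  # placeholders for BWA, Picard, GATK, filtering
depends_on = {"mark_duplicates": {"align"},
              "call_variants": {"mark_duplicates"},
              "filter_variants": {"call_variants"}}
run_workflow(steps, depends_on)
```

Adding a step, such as a read-quality check that depends only on `align`, means adding one entry to `steps` and one to `depends_on`, rather than editing a script by hand.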
![Page 41: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/41.jpg)
Globus Genomics
Figure: Sequencing Centers, Public Data, Storage, Local Cluster/Cloud, Seq Center, and Research Lab connected through Globus Genomics
Data Management
• Globus provides a high-performance, fault-tolerant, secure file transfer service between all data endpoints
Data Analysis
• Galaxy-based workflow management system
• Globus integrated within Galaxy; Galaxy data libraries
• Web-based UI; drag-and-drop workflow creation
• Easily modify workflows with new tools
Globus Genomics on Amazon EC2
• Analytical tools are automatically run on the scalable compute resources when possible
![Page 42: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/42.jpg)
Globus Genomics Architecture
Figure 2: Globus Genomics Architecture
![Page 43: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/43.jpg)
Globus Genomics Usage
![Page 44: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/44.jpg)
![Page 45: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/45.jpg)
Globus Genomics
• Computational profiles for various analysis tools
• Resources can be provisioned on demand with Amazon Web Services cloud-based infrastructure
• GlusterFS as a shared file system between head nodes and compute nodes
• Provisioned IOPS volumes on Amazon EBS
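"Computational profiles" can be read as a table of resource sizes from which each tool is given the cheapest one that satisfies its needs. A minimal sketch under that assumption; the profile names, core and memory figures, and hourly costs are invented for illustration and are not actual EC2 instance types or prices.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Profile:
    name: str
    cores: int
    memory_gb: float
    hourly_cost: float  # illustrative units, not real prices


@dataclass(frozen=True)
class ToolNeeds:
    cores: int
    memory_gb: float


def pick_profile(needs: ToolNeeds, profiles: Sequence[Profile]) -> Profile:
    """Return the cheapest profile that meets the tool's core and memory needs."""
    fits = [p for p in profiles
            if p.cores >= needs.cores and p.memory_gb >= needs.memory_gb]
    if not fits:
        raise ValueError("no profile satisfies the tool's requirements")
    return min(fits, key=lambda p: p.hourly_cost)
```

In a hosted service the chosen profile would then drive on-demand provisioning; the sketch stops at the selection step.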
![Page 46: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/46.jpg)
Coming soon!
• Integration with Globus Catalog
– Better data discovery and metadata management
• Integration with Globus Sharing
– Easy and secure method to share large datasets with collaborators
• Integration with Amazon Glacier for data archiving
• Support for high-throughput computational modalities through Apache Mesos
– MapReduce and MPI clusters
• Dynamic storage strategies using Amazon S3 or LVM-based shared file system
![Page 47: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/47.jpg)
![Page 48: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/48.jpg)
Provide more capability for more people at lower cost by building a “Discovery Cloud”
Delivering “Science as a service”
Our vision for a 21st century discovery infrastructure
![Page 49: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/49.jpg)
Thank you to our sponsors
![Page 50: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/50.jpg)
For more information
• More information on Globus Genomics and to sign up: www.globus.org/genomics
• More information on Globus: www.globusonline.org
• Follow us on Twitter: @ianfoster, @madduri, @globusgenomics, @globusonline
![Page 51: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/51.jpg)
Thank you!
![Page 52: Globus Genomics: How Science-as-a-Service is Accelerating Discovery (BDT310) | AWS re:Invent 2013](https://reader034.fdocuments.in/reader034/viewer/2022052522/554e9fe2b4c905fb7c8b45d6/html5/thumbnails/52.jpg)