Afgan bosc2010 galaxy_cloud

Deploying Galaxy on the Cloud Enis Afgan , Dannon Baker, Nate Coraor, Anton Nekrutenko, James Taylor Bioinformatics Open Source Conference, July 9, 2010, Boston, MA

Transcript of Afgan bosc2010 galaxy_cloud

Page 1: Afgan bosc2010 galaxy_cloud

Deploying Galaxy on the Cloud

Enis Afgan, Dannon Baker, Nate Coraor, Anton Nekrutenko, James Taylor

Permissions: you are free to blog or live-blog about this presentation as long as you attribute the work to its authors

Bioinformatics Open Source Conference, July 9, 2010, Boston, MA

Page 2: Afgan bosc2010 galaxy_cloud

Galaxy: accessible analysis system

• Easily integrate new tools

• Consistent tool user interfaces automatically generated

• History system facilitates and tracks multistep analyses

• Exact parameters of a step can always be inspected, and easily rerun

• Workflow system

Enable accessible, transparent, and reproducible research

http://usegalaxy.org/

Page 3: Afgan bosc2010 galaxy_cloud

[Diagram with labels: Workstation, Cluster, Galaxy, Galaxy + Jobs, Job]

Page 4: Afgan bosc2010 galaxy_cloud

Galaxy on the Cloud

• Ideal for small labs and individual researchers

• Labs do not have to house compute resources

• Support variable volume of analysis data and computation requirements

• Ready deployment with pre-configured reference genomes and tools

• Goal is to keep Galaxy use unchanged but deliver flexibility and job performance improvement

Page 5: Afgan bosc2010 galaxy_cloud

Current Status

• Deployment of Galaxy on Amazon Web Services Cloud

• Requires no computational expertise, no infrastructure, no software

• Support for dynamic resource scaling

• Support for dynamic storage

• Automated configuration of the Galaxy Cloud machine image

• Deploy a Galaxy cluster in minutes!

Page 6: Afgan bosc2010 galaxy_cloud

Deploying Galaxy on the AWS Cloud

1. Create an AWS account and sign up for EC2 and S3 services

2. Use the AWS Management Console to start a master EC2 instance

3. Use the Galaxy Cloud web interface on the master instance to manage the cluster size

Page 7: Afgan bosc2010 galaxy_cloud

2. Start an EC2 Instance

Page 8: Afgan bosc2010 galaxy_cloud

3. Configure Your Cluster

Page 9: Afgan bosc2010 galaxy_cloud

(Starting Workers)

Page 10: Afgan bosc2010 galaxy_cloud
Page 11: Afgan bosc2010 galaxy_cloud

4. Grow and Shrink

Page 12: Afgan bosc2010 galaxy_cloud

Grow Storage

1. Stop services

2. Detach volume

3. Snapshot

4. New volume

5. Grow file system

6. Resume services
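
The six steps above can be sketched as an ordered procedure. This is a minimal in-memory simulation, not the Galaxy Cloud implementation: `Volume`, `Cluster`, and `grow_storage` are hypothetical names introduced here for illustration, and no AWS calls are made.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical in-memory stand-in for a block storage volume."""
    size_gb: int
    fs_size_gb: int          # size of the file system on the volume
    attached: bool = True


@dataclass
class Cluster:
    """Hypothetical cluster state; not part of Galaxy or AWS."""
    volume: Volume
    services_running: bool = True
    log: list = field(default_factory=list)


def grow_storage(cluster: Cluster, new_size_gb: int) -> None:
    """Walk through the six steps in order, recording each one in the log."""
    if new_size_gb <= cluster.volume.size_gb:
        raise ValueError("new size must be larger than the current size")

    # 1. Stop services so nothing writes to the volume.
    cluster.services_running = False
    cluster.log.append("stop services")

    # 2. Detach the volume from the instance.
    cluster.volume.attached = False
    cluster.log.append("detach volume")

    # 3. Snapshot the detached volume (modelled as remembering its state).
    snapshot_fs_size_gb = cluster.volume.fs_size_gb
    cluster.log.append("snapshot")

    # 4. Create a new, larger volume from the snapshot; it starts attached.
    cluster.volume = Volume(size_gb=new_size_gb, fs_size_gb=snapshot_fs_size_gb)
    cluster.log.append("new volume")

    # 5. Grow the file system to fill the new volume.
    cluster.volume.fs_size_gb = cluster.volume.size_gb
    cluster.log.append("grow file system")

    # 6. Resume services.
    cluster.services_running = True
    cluster.log.append("resume services")
```

The ordering matters in this sketch: services stay stopped from step 1 until step 6, so nothing can write to the volume while it is detached or being copied.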

Page 13: Afgan bosc2010 galaxy_cloud

Clean Up

• Once the need for a given cluster subsides,

- you can always start it back up

• Data is preserved while a cluster is down

• Complete the shut down process by terminating the master instance from the AWS console

Page 14: Afgan bosc2010 galaxy_cloud

What is Coming

• Automatic cluster scaling

- Based on workload customization

• Automatic job splitting/parallelization
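
Both planned features can be illustrated with very small rules. The sketch below is only one possible reading of the bullets above: the queue-length rule, the chunking scheme, and the names `target_workers` and `split_job` are assumptions made here, not the design the slides describe.

```python
import math


def target_workers(queued_jobs: int, jobs_per_worker: int, max_workers: int) -> int:
    """Return the worker count a simple queue-length rule would request."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(queued_jobs / jobs_per_worker)
    # Never go below zero workers or above the configured ceiling.
    return max(0, min(needed, max_workers))


def split_job(records: list, parts: int) -> list:
    """Split records into at most `parts` contiguous chunks of near-equal size."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    if not records:
        return []
    chunk_size = math.ceil(len(records) / parts)
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
```

For example, `target_workers(10, 4, 5)` requests 3 workers, and `split_job(list(range(10)), 3)` produces chunks of 4, 4, and 2 records that could be run on separate workers.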

Page 15: Afgan bosc2010 galaxy_cloud

Questions & Comments

Try your own cluster; it takes only 5 minutes and less than $1.

Complete instructions available at http://usegalaxy.org/cloud

Page 16: Afgan bosc2010 galaxy_cloud

A Little More GC Details

[Architecture diagram with labels: Management Console, Galaxy Application, Galaxy Controller (GC), Setup services, Persistent data repository, Persistent storage, Master instance, GC-w (five instances), Galaxy Image (six instances); numbered steps 1, 2, 6, 8, 10, 11]

Page 17: Afgan bosc2010 galaxy_cloud

Cloud or No Cloud?

Pros

• Consumption-based cost - cost reduction?

• Better utilization of resources

• Management done by cloud provider

• Faster deployment time

• Dynamic scalability

Cons

• Not a silver bullet

• Expensive for 24/7 use

• Offers scalability in terms of infrastructure, but applications are still sequential

• The data transfer problem?

• Security?
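
The "expensive for 24/7 use" point can be made concrete with a break-even calculation. The figures below are hypothetical placeholders chosen for easy arithmetic, not actual AWS prices or hardware costs, and the 30-day month is an assumption.

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month


def on_demand_cost(hours_used: float, hourly_rate: float) -> float:
    """Cost of paying per hour of use."""
    return hours_used * hourly_rate


def break_even_hours(fixed_monthly_cost: float, hourly_rate: float) -> float:
    """Hours per month above which the fixed-cost option is cheaper."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return fixed_monthly_cost / hourly_rate


# Hypothetical figures, for illustration only.
HOURLY_RATE = 0.50          # cloud cost per cluster-hour
FIXED_MONTHLY_COST = 200.0  # amortized monthly cost of owned hardware

occasional = on_demand_cost(40, HOURLY_RATE)               # 40 hours of use
always_on = on_demand_cost(HOURS_PER_MONTH, HOURLY_RATE)   # running 24/7
threshold = break_even_hours(FIXED_MONTHLY_COST, HOURLY_RATE)
```

With these placeholder numbers, occasional use costs far less than the fixed option, while running around the clock costs more; the crossover sits at `threshold` hours per month.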

Page 18: Afgan bosc2010 galaxy_cloud

Enabling Persistence

[Diagram with labels: Public EBS snapshots (Galaxy Tools, Galaxy Indices); Private EBS volume (User Data); User A Cluster 1, User A Cluster 2, and User B Cluster 1, each with Galaxy Tools, Galaxy Indices, and User Data; "On terminate" shown for User A Cluster 1 and its User Data]
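
One possible reading of this slide is that tools and indices come from shared public snapshots, while each user's data lives on a private volume that outlives the cluster. The sketch below models that reading in memory only; the class names, function names, and snapshot IDs are hypothetical, and the actual mechanism may differ.

```python
from dataclasses import dataclass, field

# Hypothetical snapshot IDs standing in for the public EBS snapshots.
PUBLIC_SNAPSHOTS = {"galaxy_tools": "snap-tools", "galaxy_indices": "snap-indices"}


@dataclass
class UserStore:
    """Hypothetical private volume that outlives any single cluster."""
    owner: str
    files: dict = field(default_factory=dict)


@dataclass
class ClusterInstance:
    """Hypothetical running cluster: shared volumes plus one private store."""
    owner: str
    user_data: UserStore
    shared: dict             # volumes created from the public snapshots
    running: bool = True


def start_cluster(store: UserStore) -> ClusterInstance:
    """Create fresh shared volumes from public snapshots; reuse the private store."""
    shared = {name: f"vol-from-{snap}" for name, snap in PUBLIC_SNAPSHOTS.items()}
    return ClusterInstance(owner=store.owner, user_data=store, shared=shared)


def terminate_cluster(cluster: ClusterInstance) -> UserStore:
    """Discard the shared volumes; the private user data is kept."""
    cluster.shared.clear()
    cluster.running = False
    return cluster.user_data
```

In this model, starting a new cluster with the store returned by `terminate_cluster` gives the user the same data alongside freshly created tool and index volumes.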

Page 19: Afgan bosc2010 galaxy_cloud

Enabling Versioning

[Diagram with labels: Public S3 buckets: GC-default (GC source: latest, prev. versions) and GC-snaps (public snap IDs: latest, prev. versions); Private S3 bucket: GC-User A, Cluster1 and GC-User A, Cluster2 (each: latest GC used, snaps IDs)]