Afgan bosc2010 galaxy_cloud

Post on 10-May-2015

719 views 1 download

Tags:

Transcript of Afgan bosc2010 galaxy_cloud

Deploying Galaxy on the Cloud

Enis Afgan, Dannon Baker, Nate Coraor, Anton Nekrutenko, James Taylor

Discovery of human heteroplasmic sitesenabled by an accessible interface to

cloud!computing infrastructure

Enis Afgan, Hiroki Goto, Ian Paul, Francesca ChiaromonteKateryna Makova, Anton Nekrutenko, James Taylor

15

CAMPAIGN EMORY BRAND GUIDELINESwww.campaign.emory.edu

Each school and unit will have its own campaign stationery, which includes the logo and marble and is immediately recognizable as part of Campaign Emory.

Stationery

N E L L H O D G S O N W O O D R U F F S C H O O L O F N U R S I N G

Emory University . 1520 Clifton Road, NE . Atlanta, Georgia 30322-4207 . 000.000.0000

C A M P A I G N

Emory UniversityNell Hodgson Woodruff School of Nursing1520 Clifton Road, NEAtlanta, Georgia 30322-4207

C A M P A I G N

AMY DORRILLChief Development Officer

Emory UniversitySchool of Nursing Development1520 Clifton Road, NEAtlanta, Georgia 30322-4207P 404.727.1234E adorrill@emory.edu

C A M P A I G N

C A M P A I G N

notecard: 7in. x 5in.

letterhead: 8.5in. x 11in.

envelope: No. 10

business card: 3.5in. x 2in. (standard size)

Permissions: you are free to blog or live-blog about this presentation as long as you attribute the work to its authors

Discovery of human heteroplasmic sitesenabled by an accessible interface to

cloud!computing infrastructure

Enis Afgan, Hiroki Goto, Ian Paul, Francesca ChiaromonteKateryna Makova, Anton Nekrutenko, James Taylor

15

CAMPAIGN EMORY BRAND GUIDELINESwww.campaign.emory.edu

Each school and unit will have its own campaign stationery, which includes the logo and marble and is immediately recognizable as part of Campaign Emory.

Stationery

N E L L H O D G S O N W O O D R U F F S C H O O L O F N U R S I N G

Emory University . 1520 Clifton Road, NE . Atlanta, Georgia 30322-4207 . 000.000.0000

C A M P A I G N

Emory UniversityNell Hodgson Woodruff School of Nursing1520 Clifton Road, NEAtlanta, Georgia 30322-4207

C A M P A I G N

AMY DORRILLChief Development Officer

Emory UniversitySchool of Nursing Development1520 Clifton Road, NEAtlanta, Georgia 30322-4207P 404.727.1234E adorrill@emory.edu

C A M P A I G N

C A M P A I G N

notecard: 7in. x 5in.

letterhead: 8.5in. x 11in.

envelope: No. 10

business card: 3.5in. x 2in. (standard size)

Permissions: you are free to blog or live-blog about this presentation as long as you attribute the work to its authors

Bioinformatics Open Source Conference, July 9, 2010, Boston, MA

Galaxy: accessible analysis system

• Easily integrate new tools

• Consistent tool user interfaces automatically generated

• History system facilitates and tracks multistep analyses

• Exact parameters of a step can always be inspected, and easily rerun

• Work!ow system

Enable accessible, transparent, and reproducible researchhttp://usegalaxy.org/

Cluster

Galaxy Jobs+ Galaxy Jobs+

Job

JobJob

Job

Workstation

Galaxy

Galaxy

Galaxy

Galaxy

Galaxy on the Cloud• Ideal for small labs and individual researchers

• Labs do not have to house compute resources

• Support variable volume of analysis data and computation requirements

• Ready deployment with pre-con"gured reference genomes and tools

• Goal is to keep Galaxy use unchanged but deliver !exibility and job performance improvement

Current Status• Deployment of Galaxy on Amazon Web Services Cloud

• Requires no computational expertise, no infrastructure, no software

• Support for dynamic resource scaling

• Support for dynamic storage

• Automated con"guration of the Galaxy Cloud machine image

• Deploy a Galaxy cluster in minutes!

Deploying Galaxy on the AWS Cloud

1. Create an AWS account and sign up for EC2 and S3 services

2. Use the AWS Management Console to start a master EC2 instance

3. Use the Galaxy Cloud web interface on the master instance to manage the cluster size

2. Start an EC2 Instance

3. Con"gure Your Cluster

(Starting Workers)

4. Grow and Shrink

Grow Storage

1. Stop services

2. Detach volume

3. Snapshot

4. New volume

5. Grow !le system

6. Resume services

Clean Up• Once the need for a given cluster subsides,

- you can always start it back up

• Data is preserved while a cluster is down

• Complete the shut down process by terminating the master instance from the AWS console

What is Coming

• Automatic cluster scaling

- Based on workload customization

• Automatic job splitting/parallelization

Questions&

CommentsTry your own cluster; it takes only 5 minutes and less than $1.

Complete instructions available at http://usegalaxy.org/cloud

A Little More GC Details

Management

Console

Galaxy

Application

1°2°

6°, 8°

Persistent

data

repository

Galaxy Controller

(GC)

Setup services

10°GC-w

GC-w

GC-w

GC-w

GC-w

Persistent storage

Galaxy Image

Galaxy Image

Galaxy Image

Galaxy Image

Galaxy Image

Galaxy Image

Master instance

11°

Cloud or No Cloud?

• Consumption based cost - cost reduction?

• Better utilization of resource

• Management done by cloud provider

• Faster deployment time

• Dynamic scalability

• Not a silver bullet

• Expensive for 24/7 use

• Offers scalability in terms of infrastructure, applications are still sequential

• The data transfer problem?

• Security?

Pros Cons

Enabling Persistence

Galaxy

Tools

Galaxy

Indices

Public EBS snapshots

Galaxy

Tools

Galaxy

Indices

User ACluster 1

User

Data

Galaxy

Tools

Galaxy

Indices

User ACluster 2

User

Data

User ACluster 1

User

DataOn terminateGalaxy

Tools

Galaxy

Indices

User BCluster 1

User

Data

Private EBS volume

Enabling VersioningGC-User A,

Cluster1

GC-User A,

Cluster2

- latest GC used

- snaps IDs

GC-User A,

Cluster1

GC-User A,

Cluster2

GC-default

GC-snaps

GC-default

GC source

- latest

- prev. versions

GC-snaps

Public snap IDs

- latest

- prev. versions

PublicS3 buckets

PrivateS3 bucket

- latest GC used

- snaps IDs