Cloud infrastructure for training in Life Sciences Manuel Corpas The Genome Analysis Centre.

Cloud infrastructure for training in Life Sciences

Manuel Corpas

The Genome Analysis Centre

[egi.edu]

@manuelcorpas

Bottleneck is NOT

• Production of data
• Technology
• Budget

Bottleneck IS

• TRAINING!
   – Bioinformatics

Bioinformatics Training

Mick Watson

Roslin Institute

1. Most bioinformaticians are bad scientists

2. Most biologists are bad bioinformaticians: poor computer skills, bad at maths/statistics

3. Short courses benefit no-one

Carole Goble

University of Manchester

• Students and trainers don’t like learning how to use new things

• Trainees need to be eased in by using familiar stuff

How can we bridge the gap?

Titus Brown

Michigan State University

1. Participants bring their laptops

2. Pre-installed machines

3. Cloud computing

Cloud + Bioinformatics + Training

=

Why Bioinformatics Training in the Cloud?

3 Advantages

[Adapted from Titus Brown]

1. Participants can use their own
   – Computers
   – Web browser

2. Graphical interaction via
   – X Windows
   – IPython
   – Knitr

3. Compute can be scaled up/down depending on what is being taught

3 Challenges

[Adapted from Titus Brown]

1. Institutional resistance
   – Privacy of clinically sensitive data

2. Reliable network access and servers needed
   – > 30 people clicking at the same time!

3. Cost

[Diagram labels: Materials, Data, NM, Trainee, Trainer, Registry, Genomics, VMs + tools]

National eResearch Collaboration Tools and Resources (NeCTAR)

Watson-Haigh et al. 2013

MRC UK Microbial Genomics

http://climb.ac.uk

• OpenStack
• Each VM: 32 GB RAM, 8 cores, 1 TB
• Bio-Linux

Nick Loman, University of Birmingham
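
As a rough sizing sketch (not part of the original slides), the per-VM figures above can be multiplied out for a whole course. It assumes one dedicated VM per trainee, reads the "1 TB" figure as storage, and uses a class of 30 only because that number appears on the challenges slide. The names `VMSpec` and `course_footprint` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMSpec:
    """Per-VM resources as listed on the slide (32 GB RAM, 8 cores, 1 TB)."""
    ram_gb: int = 32
    cores: int = 8
    disk_tb: int = 1  # assumption: the slide's "1 TB" refers to storage


def course_footprint(trainees: int, spec: VMSpec = VMSpec()) -> dict:
    """Total resources if every trainee gets one dedicated VM (an assumption)."""
    if trainees < 0:
        raise ValueError("trainees must be non-negative")
    return {
        "vms": trainees,
        "ram_gb": trainees * spec.ram_gb,
        "cores": trainees * spec.cores,
        "disk_tb": trainees * spec.disk_tb,
    }


if __name__ == "__main__":
    # 30 is an illustrative class size, not a figure from the CLIMB slide.
    print(course_footprint(30))
    # prints {'vms': 30, 'ram_gb': 960, 'cores': 240, 'disk_tb': 30}
```

Under these assumptions a 30-person course would reserve 960 GB of RAM and 240 cores for its duration, which is one way to put numbers on the "Cost" challenge.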

Why Cloud?

• Very little technical knowledge required

• Snapshot ready for replication

• User can take instance home

Cloud + Bioinformatics + Training

=

Rafael Jiménez

[email protected]

• Titus Brown

• Mick Watson

• Carole Goble

• Nick Loman

• Vicky Schneider