A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand...

14
A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes ENS Cachan 10 April 2012 Joint work with Alexandru Costan, Benoit Da Mota, Gabriel Antoniu, Bertrand Thirion

Transcript of A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand...

Page 1: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain

Radu Tudoran

KerData Team Inria Rennes

ENS Cachan 10 April 2012

Joint work with

Alexandru Costan, Benoit Da Mota, Gabriel Antoniu, Bertrand Thirion

Page 2: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

The A-Brain Project

Application - Large-scale joint genetic and

neuroimaging data analysis

Goal - Assess and understand the

variability between individuals

Approach - Optimized data processing on

Microsoft’s Azure clouds

Inria teams involved - KerData (Rennes)

- Parietal (Saclay)

Framework - Joint MSR-Inria Research Center

- MS involvement: Azure teams, EMIC

2

Page 3: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

Genetic information: SNPs

G G T G T T T G

G G

MRI brain images

Clinical / behaviour

The Imaging Genetics Challenge:

Comparing Heterogeneous Information

T Here we

focus on

this link

3

Page 4: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

Neuroimaging-genetics: The Problem

Several brain diseases have a genetic origin, or

their occurrence/severity related to genetic factors

Genetics is important to understand & predict

response to treatment

identify risk and protective factors for brain

diseases

Brain: Huntington's disease, autism…

Currently: large-scale studies to

assess the relationships between

diseases and genes: typically 104

patients per study + control groups

Genetic variability captured in DNA

microarray data

p( ) |

Gene→Image

genetic image

4

Page 5: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

A-Brain

5

p( ) ,

Genetic data Brain image

Y

q~105-6

N~2000

X

p~106

– Anatomical MRI

– Functional MRI

– Diffusion MRI

– DNA array (SNP/CNV)

– gene expression data

– others...

finding associations:

Page 6: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

Imaging Genetics Methodological Issues

Multivariate methods: predict

brain characteristic with many

genetic variables

Elastic net regularization:

combination of ℓ1 and ℓ2 penalties

→ sparse loadings

O(p3 complexity)

parameters setting: internal cross-

validation/bootstrap

Performance evaluated using

permutations

6

Page 7: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

A-Brain as MapReduce process

7

R1 R2 R3

Results

Intermediate

Data

Input Data

Final Data

R1=X1 op Y R2=X2 op Y R3=X3 op YX3=shuffle(X)X2=shuffle(X)X1=shuffle(X)

Reduce

Map

Result<= filter(R1,R2,R3)

X Y

Page 8: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

8

Challenges …

Data: 8 ∗ 104 ∗ 5 ∗ 104 ∗ 5 ∗ 105 ⇒ 1.77 𝑃𝐵

double

permutation

voxels

SNPs

5%-10%

useful

Computation: 104 ∗ 5 ∗ 104 ∗ 5 ∗ 105 ⇒ 2.5 ∗ 1014 𝑎𝑠𝑠𝑜𝑐𝑖𝑎𝑡𝑖𝑜𝑛𝑠

Estimate timespan

on single machine

Initial Algorithm: 1.67 ∗ 104 𝑎𝑠𝑠𝑜𝑐𝑖𝑎𝑡𝑖𝑜𝑛𝑠/𝑠𝑒𝑐𝑜𝑛𝑑𝑠

Current Algorithm: 1.5 ∗ 106 𝑎𝑠𝑠𝑜𝑐𝑖𝑎𝑡𝑖𝑜𝑛𝑠/𝑠𝑒𝑐𝑜𝑛𝑑𝑠

1.67 ∗ 108 𝑠𝑒𝑐𝑜𝑛𝑑𝑠 ⇒ 5.3 𝑦𝑒𝑎𝑟𝑠

Page 9: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

Azure can help…

9

Evaluation of the algorithm on Azure : 1.47 ∗ 106 𝑎𝑠𝑠𝑜𝑐𝑖𝑎𝑡𝑖𝑜𝑛𝑠/𝑠𝑒𝑐𝑜𝑛𝑑

Estimation for A-Brain on Azure (350 cores)

2.5 ∗ 1014

350 ∗ 1.47 ∗ 106 seconds ≈485 ∗ 103 𝑠𝑒𝑐𝑜𝑛𝑑𝑠

5.3 𝑦𝑒𝑎𝑟𝑠 ⇒ 5.6 𝑑𝑎𝑦𝑠

Storage capacity estimations (350 cores) 255𝐺𝐵 ∗ 350 ≈ 87𝑇𝐵

• Feats the 5% threshold of useful data

• We can always do several iterations

Page 10: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

TomusBlobs as a Storage Backend for

Sharing Application Data in MapReduce

10

App

API

App App App App

API API API API

TomusBlobs

Page 11: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

TomusBlobs: Application’s Throughput

read: 2.5x write: 3x

11

Application pattern read throughput Application pattern write throughput

Page 12: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

TomusBlobs: Cumulative Throughput

read: 4x write: 5x

12

Cumulative read throughput Cumulative write throughput

Page 13: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

A-Brain’s timespan

13

Increase precision Increase data size

Page 14: A-Brain: Using the Cloud to Understand the Impact of ... · A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain Radu Tudoran KerData Team Inria Rennes

Our experience on Azure in the A-Brain project

14

• Scale up to 350 cores

• Memory/CPUs tradeoff for the VM selection

• Planning soon to launch “the big experiments”

• Continuous running time so far 1-2 days

• ≈ 60K hours of computation used so far