Detection, Diagnosis and Correction of Batch Effects in ...Rehan Akbani Subject: TCGA 2nd Annual...

Post on 05-Jul-2020


Detection, Diagnosis and Correction of Batch Effects in TCGA Data

Rehan Akbani

Dept. of Bioinformatics and Computational Biology, UT MD Anderson Cancer Center

Simplified Flow Diagram for TCGA

• Tissue Source Sites (TSS): Mayo Clinic, MD Anderson, MSKCC
• Samples → Biospecimen Core Resources (BCR): International Genomics Consortium, Nationwide Children’s Hospital
• DNA/RNA → Cancer Genome Characterization Centers (CGCC) / Genome Sequencing Centers (GSC): … Broad, USC, UNC
• Data → Data Coordination Center (DCC) → TCGA Website


Step 1: Batch effects diagnoses

• Objectives
– Detect / quantify batch effects
– Identify source(s) of batch effects

• Tools / algorithms for diagnoses
– MBatch R package http://bioinformatics.mdanderson.org/tcgabatcheffects/
1. PCA-Plus plots (novel)
2. BatchCorr algorithm (novel)
3. Hierarchical clustering
4. Clinical correlates
5. Box plots
6. ANOVA / MANOVA

• Disclaimer: No substitute for human input


1. PCA-Plus http://bioinformatics.mdanderson.org/tcgabatcheffects/


2. BatchCorr

[Figure: BatchCorr value plotted against p-values]

Batch effects present: BatchCorr < 0.7 AND p-value < 0.05
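The decision rule on this slide can be sketched as a small Python check. This covers only the thresholding step: the slides do not describe how the BatchCorr value or its p-value are computed, so both are treated as given inputs. The function name, the dataset names, and the numbers are made up for illustration.

```python
def batch_effects_present(batchcorr, p_value, corr_cutoff=0.7, alpha=0.05):
    """Apply the slide's rule: BatchCorr < 0.7 AND p-value < 0.05.

    Both inputs are assumed to come from an upstream tool such as MBatch;
    this helper (a hypothetical name) does not compute them.
    """
    return batchcorr < corr_cutoff and p_value < alpha


# Made-up (BatchCorr, p-value) pairs, for illustration only.
results = {
    "dataset_a": (0.45, 0.001),  # low BatchCorr, significant: flagged
    "dataset_b": (0.92, 0.300),  # high BatchCorr, not significant
    "dataset_c": (0.60, 0.200),  # low BatchCorr, but not significant
}

flagged = [name for name, (bc, p) in results.items()
           if batch_effects_present(bc, p)]
print(flagged)
```

Both conditions must hold, so a low BatchCorr value without a significant p-value is not flagged.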

3. Hierarchical Clustering

4. Clinical Correlates

[Figure: COAD/READ miRNA]

5. Box plots

Panels: batch medians, batch means
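The per-batch medians and means behind such box plots can be sketched in plain Python. This is a minimal illustration, not the MBatch implementation: the function name and the toy values are assumptions.

```python
from collections import defaultdict
from statistics import mean, median


def batch_summaries(values, batches):
    """Group values by batch label and return each batch's median and mean."""
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    return {
        batch: {"median": median(vals), "mean": mean(vals)}
        for batch, vals in grouped.items()
    }


# Toy measurements for one feature across two hypothetical batches.
values = [1.0, 2.0, 6.0, 4.0, 5.0, 9.0]
batches = ["A", "A", "A", "B", "B", "B"]

summary = batch_summaries(values, batches)
for batch, stats in summary.items():
    print(batch, stats)
```

A consistent shift in medians or means between batches is the kind of pattern a box plot makes visible.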

6. ANOVA / MANOVA
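As a rough illustration of the ANOVA idea, the sketch below runs a one-way ANOVA for a single feature grouped by batch. It reports the F statistic and the fraction of total variation attributable to batch (eta squared). The slides do not say how MBatch computes its percentages, so this is a textbook calculation under that assumption, with a hypothetical function name and toy data. It does not compute a p-value, which would need the F distribution.

```python
from collections import defaultdict
from statistics import mean


def one_way_anova(values, groups):
    """Return (F statistic, eta squared) for values grouped by label."""
    grouped = defaultdict(list)
    for value, group in zip(values, groups):
        grouped[group].append(value)

    grand_mean = mean(values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2
                     for vals in grouped.values())
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(vals)) ** 2
                    for vals in grouped.values() for v in vals)

    df_between = len(grouped) - 1
    df_within = len(values) - len(grouped)
    eta_squared = ss_between / (ss_between + ss_within)
    if ss_within == 0:
        return float("inf"), eta_squared
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, eta_squared


# Toy data: one feature measured in two hypothetical batches.
values = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
batch_ids = ["A", "A", "A", "B", "B", "B"]

f_stat, eta_squared = one_way_anova(values, batch_ids)
print(f"F = {f_stat:.2f}, fraction explained by batch = {eta_squared:.3f}")
```

The same calculation can be repeated with TSS, BCR, or subtype as the grouping label to compare how much variation each factor accounts for.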

[Figure: ANOVA / MANOVA results. Labels: Batch ID, TSS, BCR, subtype. Percentages as extracted: 62.3%, 32.5%, 2.3%, 1.6%, 0.4%, 0.9%; 59.3%, 33.9%, 1.2%, 1.1%, 2.9%, 0.1%; 1.2%, 1.2%, 0.1%]

Step 2: Batch effects correction

• Correct the source of the problem whenever possible

• When not possible, or source unknown, algorithms can be used

• Some algorithms for correction:
– ComBat (aka Empirical Bayes)
– ANOVA
– Median Polish

• Included in MBatch R package http://bioinformatics.mdanderson.org/tcgabatcheffects/
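Of the listed algorithms, median polish is the simplest to show in a few lines. The sketch below is Tukey's generic median polish on a small two-way table, splitting it into an overall level, row effects, column effects, and residuals. How MBatch lays out features, samples, and batches for this step is not described on the slides, so the layout here (rows as features, columns as batches) and all names are illustrative assumptions.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-9):
    """Tukey's median polish: table = overall + row + column + residual."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [x - row_med[i] for x in resid[i]]
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]

        # Sweep column medians out of the residuals into the column effects.
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]

        # Stop once a full sweep no longer changes anything.
        if max(abs(x) for x in row_med + col_med) < tol:
            break

    return overall, row_eff, col_eff, resid


# Toy table: rows are features, columns are hypothetical batches.
table = [[7.0, 9.0, 11.0],
         [8.0, 10.0, 12.0],
         [9.0, 11.0, 13.0]]

overall, row_eff, col_eff, resid = median_polish(table)
# Removing the column (batch) effects leaves batch-adjusted values.
adjusted = [[table[i][j] - col_eff[j] for j in range(3)] for i in range(3)]
print(overall, row_eff, col_eff)
```

In this toy table the columns differ only by a constant shift, so subtracting the column effects makes every column identical.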


Kidney cancer (KIRC) DNA methylation data (27k)

[Figure labels: Male, Female]

Dichotomy observed

New dichotomy based on batch ID appears

After removing sex chromosomes

After removing bad probes

Chromophobes

TCGA MBatch website http://bioinformatics.mdanderson.org/tcgabatcheffects/


Zoom Pan Mouse-over


Acknowledgments

UT MD Anderson
• Nianxiang Zhang
• Tod D. Casasent
• Chris Wakefield
• James M. Melott
• Anna K. Unruh
• Thomas C. Motter
• Bradley M. Broom
• John N. Weinstein

In-Silico
• James Cleland
• Andy Wong
• Mike Ryan

Poster # 50
