Aug2015 analysis team 10 mason epigentics


Transcript of Aug2015 analysis team 10 mason epigentics

Page 1: Aug2015 analysis team 10 mason epigentics

Epigenetics QC (EpiQC) and single-cell RNA-seq variant calling from NIST GIAB samples

Christopher E. Mason, Associate Professor

Department of Physiology and Biophysics & The Institute for Computational Biomedicine at the Weill Cornell Medical College and the Tri-Institutional Program on Computational Biology and Medicine

August 27th, 2015

@mason_lab

Page 2: Aug2015 analysis team 10 mason epigentics

FDA’s PMET-QC (Personalized Medicine Enabling Technologies Quality Control), the fourth phase of the MAQC project (MAQC-IV)

Objectives:
• QC of PMET for enhanced reproducibility and reliability
• Benchmarking bioinformatics approaches for PMET data analysis to achieve best-practice and standard data analysis protocols

Overview: profiling drug response in a common panel of cancer cell lines to predict drug sensitivity from patient-specific genomic profiles. Specifically:

• Assess reproducibility of HTS assays for drug efficacy and safety (inter- and intra-lab reproducibility and cross-platform consistency)

• Benchmark bioinformatics approaches for HTS data analysis

WG1 - HTSQC

WG3 - EpiQC

Systems Biology

Overview: generating reference gene expression and epigenetic datasets from (1) individual cell fractions and (2) whole tissue samples, in order to develop in silico formulas that estimate the cellular composition of mixed-cell samples and to identify cell-type-specific signatures (e.g., in relevant diseases) using whole tissue samples.

WG2 - SeqQC

Code Release

Apps

Overview: systematically evaluate targeted sequencing approaches for identification of somatic mutations in cancer in a set of well-defined primary tumors.
• Comparative analysis of WGS, WES and target-seq
• Bioinformatics effects
• Integrated analysis of DNA-seq and RNA-seq for an improved variant call

Overview:
• Comparative analysis of sequencing-based methods with microarrays for study of DNA methylation
• Benchmark computational tools for sequencing-based DNA methylation data

Overview: emphasis on the integration of different molecular data (DNA methylation, DNA, RNA) for an enhanced personalized medicine
• Variant calling, RNA editing, and allele-specific expression from matched samples

Overview: full reproducibility of analysis and methods for samples and instances of code and runtime parameters
• Virtual machines or Docker instances
• Zenodo code base

Page 3: Aug2015 analysis team 10 mason epigentics

Planned epigenetics data sets for GIAB
• XTen WGBS; not yet formally supported, but proof of principle has been demonstrated
  - Currently being processed at New York Genome Center, with members from Mason Lab, John Greally, Soren Germer, and Frank Wos
  - Generating 30X WGBS data for all the GIAB samples
• Illumina 450K methylation array data (Youping Deng)
• CpGiant capture data from Roche (Mason)
• Pending TAB-seq for hydroxy-methylation profiling (Mason)
• Plan to share with the GIAB community
• Also planning for single-cell RRBS on the Fluidigm C1 (in development)

Page 4: Aug2015 analysis team 10 mason epigentics

Why?

Page 5: Aug2015 analysis team 10 mason epigentics

DNA methylation defines cellular phenotypes and lineage specification, and much of the action is beyond CpG islands

Fernandez et al, Genome Research, 2012

Page 6: Aug2015 analysis team 10 mason epigentics

Weidner CI et al., Aging of blood can be tracked by DNA methylation changes at just three CpG sites. Genome Biol. 2014 Feb 3;15(2):R24.

DNA Marks can predict your age!

Page 7: Aug2015 analysis team 10 mason epigentics

Bisulfite conversion sequencing for detection of DNA methylation

Base resolution methylation level: C/(C+T)
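
As a small worked illustration of the C/(C+T) ratio above: after bisulfite conversion, unmethylated cytosines read as T while methylated cytosines still read as C, so the per-site methylation level is the fraction of C calls among all C and T calls at that site. The function name and the example counts below are hypothetical, chosen only for illustration.

```python
def methylation_level(c_count, t_count):
    """Return C / (C + T) for one cytosine site, or None if the site is uncovered."""
    total = c_count + t_count
    if total == 0:
        return None  # no reads cover this site, so the level is undefined
    return c_count / total

# Hypothetical (C, T) read counts at three cytosine sites.
example_sites = {"site_a": (18, 2), "site_b": (5, 15), "site_c": (0, 0)}

levels = {name: methylation_level(c, t) for name, (c, t) in example_sites.items()}
for name, level in levels.items():
    print(name, "not covered" if level is None else f"{level:.2f}")
```

Returning None for uncovered sites keeps "no data" distinct from "0% methylated", which matters when sites are later filtered by coverage.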

Page 8: Aug2015 analysis team 10 mason epigentics

MethSuite: DNA methylome sequencing and analysis suite

Akalin A, Garrett-Bakelman F, et al., 2012. PLoS Genetics.
Akalin A, Kormaksson M, Li S, et al., 2012. Genome Biology.
Li S, Garrett-Bakelman F, et al., 2013. BMC Bioinformatics.
Li S, Garrett-Bakelman F, et al., 2014. Genome Biology.
Garrett-Bakelman F, Sheridan C, et al., 2014. JoVE.
Rampal R, Akalin A, et al., 2015. Cell Reports.

Enhanced reduced representation bisulfite sequencing (ERRBS) or Whole Genome Bisulfite Sequencing (WGBS)

R package for the analysis of genome-wide DNA methylation profiles

R package for differentially methylated region analysis

C++ program for detecting epiallele shifts in tumors and other diseases

ERRBS & WGBS

Page 9: Aug2015 analysis team 10 mason epigentics

Differentially methylated Cytosines (DMCs) detection

Akalin A et al., 2012

Page 10: Aug2015 analysis team 10 mason epigentics

Hyper-methylation

Hypo-methylation

DNA methylation reveals dramatically different tumor types

Akalin et al., PLoS Genetics, 2012

Page 11: Aug2015 analysis team 10 mason epigentics

DNA methylation also measures epialleles

An epiallele is one of a number of alternative, phased DNA methylation patterns of the same genetic locus

[Figure: genomic locus with four adjacent CpGs; labeled "Epiallele Frequency/Read count", with values 60, 100, 20, 20]

Li et al., Dynamic Evolution of Clonal Epialleles Revealed by Methclone. Genome Biology, 2014.
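
To make the epiallele idea concrete, here is a minimal sketch that tallies phased methylation patterns across a four-CpG locus and reports each pattern's read count and frequency. The read patterns are invented for illustration (they are not the values in the figure), the "M"/"U" encoding and the function name are assumptions, and this is not methclone's implementation.

```python
from collections import Counter

def epiallele_frequencies(read_patterns):
    """Count each phased pattern and return {pattern: (reads, fraction)}."""
    counts = Counter(read_patterns)
    total = sum(counts.values())
    return {pattern: (n, n / total) for pattern, n in counts.most_common()}

# Hypothetical reads, each covering the same four adjacent CpGs.
# "M" = methylated CpG, "U" = unmethylated CpG, listed in genomic order.
reads = ["MMMM"] * 6 + ["UUUU"] * 2 + ["MMUU"] * 2

freqs = epiallele_frequencies(reads)
for pattern, (n, frac) in freqs.items():
    print(f"{pattern}\t{n}\t{frac:.2f}")
```

Because each read is kept as one phased string, two loci with the same average methylation can still show different epiallele compositions, which is the information a per-CpG average discards.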

Page 12: Aug2015 analysis team 10 mason epigentics

Open source and free epiclonality software

https://code.google.com/p/methclone/

Page 13: Aug2015 analysis team 10 mason epigentics

Li et al., Dynamic Evolution of Clonal Epialleles Revealed by Methclone. Genome Biology, 2014.

Epialleles reveal the clonality of cells in leukemia

D = Diagnosis; R = Relapse

Page 14: Aug2015 analysis team 10 mason epigentics

Hydroxy-methylation (hmC) changes can also drive AML phenotypes

Rampal, Akalin, et al., Cell Reports, 2015

Page 15: Aug2015 analysis team 10 mason epigentics

hmC is a better predictor of gene expression change

Rampal, Akalin, et al., Cell Reports, 2015

mC hmC

Page 16: Aug2015 analysis team 10 mason epigentics

Why else?

Page 17: Aug2015 analysis team 10 mason epigentics

Single Cell RNA-seq Variant Calling with GATK HaplotypeCaller

Page 18: Aug2015 analysis team 10 mason epigentics

Exciting to see single-cell expression of significantly differentially expressed genes (DEGs) between responders (R) and non-responders (NR) in oncology patients, but can we call variants?

method: monocle

significance cutoff: FDR < 0.01

expression value: log2(FPKM)

heatmap shows DEGs with mean FPKM > 1

Non-Responders Responders
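
The heatmap criteria listed above (FDR < 0.01, mean FPKM > 1, values shown as log2(FPKM)) can be sketched as a simple filter. The gene names, q-values, and FPKM values below are made up, and the pseudocount added before the log is an assumption to avoid log2(0) in cells with no detected expression; the slide does not say how zeros were handled.

```python
import math

FDR_CUTOFF = 0.01      # significance cutoff from the slide
MIN_MEAN_FPKM = 1.0    # mean-expression cutoff from the slide
PSEUDOCOUNT = 1.0      # assumption: not stated in the slide

def select_heatmap_genes(genes):
    """Keep genes passing both cutoffs; return log2-transformed per-cell values."""
    selected = {}
    for name, info in genes.items():
        fpkm = info["fpkm"]
        mean_fpkm = sum(fpkm) / len(fpkm)
        if info["qvalue"] < FDR_CUTOFF and mean_fpkm > MIN_MEAN_FPKM:
            selected[name] = [math.log2(v + PSEUDOCOUNT) for v in fpkm]
    return selected

# Hypothetical per-cell FPKM values and q-values for three genes.
genes = {
    "GENE_A": {"qvalue": 0.001, "fpkm": [0.0, 3.0, 7.0, 15.0]},
    "GENE_B": {"qvalue": 0.200, "fpkm": [10.0, 12.0, 9.0, 11.0]},  # not significant
    "GENE_C": {"qvalue": 0.005, "fpkm": [0.0, 0.5, 0.2, 0.1]},     # mean FPKM too low
}

heatmap_values = select_heatmap_genes(genes)
print(heatmap_values)
```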

Page 19: Aug2015 analysis team 10 mason epigentics

Tested Parameters
• Min Pruning: paths with fewer supporting kmers than the specified threshold are pruned from the graph (default 2)
• Min Base Quality Score: minimum base quality required for a base to be considered for calling (default 10)
• Min Reads Per Alignment Start: minimum number of reads sharing the same alignment start for each genomic location in an active region (default 10)
• Heterozygosity: the probability that the sample will differ from the reference (default 0.001)

Genotype pair    0/0 0/0  0/0 0/1  0/0 1/1  0/1 0/0  0/1 0/1  0/1 1/1  1/1 0/0  1/1 0/1  1/1 1/1
Half het false   tn       fp       fp       fn       tp       fp       fn       fn       tp
Half het true    tn       fp       fp       fn       tp       tp       fn       tp       tp
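
The table above can be read as a lookup from a pair of genotypes to a confusion-matrix class. The sketch below reproduces it under one assumption that the slide does not state outright: the first genotype in each pair is taken as the reference (truth) genotype and the second as the single-cell call, which is the reading consistent with the fp/fn labels. The function and argument names are hypothetical.

```python
def classify(truth, call, half_het_true=False):
    """Return 'tn', 'fp', 'fn' or 'tp' for one (truth, call) genotype pair."""
    if truth == "0/0":
        return "tn" if call == "0/0" else "fp"
    if call == "0/0":
        return "fn"  # a real variant was missed entirely
    if truth == call:
        return "tp"
    if half_het_true:
        return "tp"  # het/hom mismatch counted as correct
    # Half-correct calls counted as wrong: calling a het site as hom-alt is fp,
    # calling a hom-alt site as het is fn.
    return "fp" if truth == "0/1" else "fn"

GENOTYPES = ["0/0", "0/1", "1/1"]
pairs = [(t, c) for t in GENOTYPES for c in GENOTYPES]

strict = [classify(t, c) for t, c in pairs]
lenient = [classify(t, c, half_het_true=True) for t, c in pairs]
print("Half het false:", " ".join(strict))
print("Half het true: ", " ".join(lenient))
```

Only the two half-correct pairs (0/1 called as 1/1, and 1/1 called as 0/1) change class between the two modes, so the modes can differ in every metric except the true-negative count.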

Page 20: Aug2015 analysis team 10 mason epigentics

HZ calls very easy; het calls often “not expressed” (ASM) & some bad cells

Page 21: Aug2015 analysis team 10 mason epigentics

Sensitivity

[Figure panels: Half Correct Calls Considered False; Half Correct Calls Considered True]

Page 22: Aug2015 analysis team 10 mason epigentics

Specificity

[Figure panels: Half Correct Calls Considered False; Half Correct Calls Considered True]

Page 23: Aug2015 analysis team 10 mason epigentics

Precision

[Figure panels: Half Correct Calls Considered False; Half Correct Calls Considered True]

Page 24: Aug2015 analysis team 10 mason epigentics

So far best at 97% sensitivity and 80% specificity

Page 25: Aug2015 analysis team 10 mason epigentics

Thanks!

Thanks especially to:
• Mason Lab: Priyanka Vijay, Deirdre O'Sullivan, Noah Alexander, Jorge Gandara, Dhruva Chandramohan, Christian Kendall, Sheng Li, Elizabeth Hénaff
• NYGC: Frank Wos, Soren Germer, John Greally
• FDA: Leming Shi, Youping Deng, and Weida Tong

Page 26: Aug2015 analysis team 10 mason epigentics

By genotype stats         0/0 0/0      0/0 0/1      0/0 1/1      0/1 0/0      0/1 0/1      0/1 1/1      1/1 0/0      1/1 0/1      1/1 1/1
Checking for alt alleles  tn           fp           fp           fn           tp           fp           fn           fn           tp
min prun3                 80585        235875       411739       1102         30728        26651        0            231          45026
minreads 30               295644       213073       360109       1324         27183        23485        5            231          39899
min prun1                 2121060      203237       333978       2001         24903        21873        37           288          37442
minprun0                  3708187      352729       582892       3424         43006        39086        64           490          65526
mbq30                     401          972          3651         1            125          215          0            1            382
het.0005                  392167       302928       526124       1864         39954        34901        6            328          59052
het.005                   392214       311023       496889       1664         37964        32697        4            303          55522
default                   448316       342200       581646       2063         43683        38861        5            362          65155
minreads20minprun1        3763754      352793       582913       3476         43064        39065        66           491          65524
conf 10                   493650       383769       587232       2266         44603        39225        7            372          65744
conf 30                   421210       315187       587486       1937         43726        39226        5            366          65739
minprun 1 conf10          3766376      391851       582835       3656         43418        39083        81           505          65524
minreads30minprun1        3066457      288194       473926       2812         35390        31500        59           383          53483
min reads 40 minprun1     3778654      352795       582822       3466         43053        39062        69           491          65524
Mbq20MinPrun1             2274548      339177       577934       3137         43074        39149        43           425          65577
UG conf 10 - vars only    66281.60204  6039.285714  2184.846939  468.0510204  550.8265306  401.0816327  3.724489796  6.387755102  75.29591837

Page 27: Aug2015 analysis team 10 mason epigentics

Summary stats (%)     Sensitivity  Specificity  Precision    FPR          Accuracy
min prun3             98.27078496  10.67563092  10.10027746  89.32436908  18.79216816
minreads 30           97.72733895  33.13239442  10.10653123  66.86760558  37.74648708
min prun1             96.4033338   79.13965945  10.03245724  20.86034055  79.54641089
minprun0              96.46431428  79.18579835  10.0192109   20.81420165  79.59118773
mbq30                 99.60707269  7.654132468  9.485500468  92.34586753  15.79679889
het.0005              97.82814909  31.22050441  10.28143462  68.77949559  36.18686474
het.005               97.93519595  31.81429938  10.00818975  68.18570062  36.56608546
default               97.81608369  31.7724091   10.15710959  68.2275909   36.599704
minreads20minprun1    96.41896272  79.4288096   10.02327022  20.5711904   79.82324177
conf 10               97.65912631  32.82517974  9.847372728  67.17482026  37.35598701
conf 30               97.93510061  30.90068366  10.41171278  69.09931634  35.98084457
minprun 1 conf10      96.25212044  78.79208685  9.703476674  21.20791315  79.19594207
minreads30minprun1    96.46791929  79.4403065   10.07067478  20.5596935   79.837225
minreads40minprun1    96.4246068   79.49483026  10.02320781  20.50516974  79.88660352
Mbq20MinPrun1         96.78859036  70.40183137  10.20282446  29.59816863  71.28786646
UG conf10             56.69931621  88.48540799  6.767913297  11.51459201  88.02362115
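
The summary statistics can be recomputed from the nine per-genotype counts on the previous slide. The sketch below does this for the "min prun3" row, pooling the counts by their tn/fp/fn/tp labels with half-correct calls counted as false. That pooling reproduces the tabulated values for this row, which suggests (though the slide does not state it) that this table uses the "half correct considered false" scoring. Function and variable names are hypothetical.

```python
# Class labels for the nine genotype-pair columns, in the order of the page 26 header.
LABELS = ["tn", "fp", "fp", "fn", "tp", "fp", "fn", "fn", "tp"]

def summary_stats(counts, labels=LABELS):
    """Pool nine genotype-pair counts by label and return metrics in percent."""
    pooled = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for label, n in zip(labels, counts):
        pooled[label] += n
    tn, fp, fn, tp = (pooled[k] for k in ("tn", "fp", "fn", "tp"))
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "precision": 100 * tp / (tp + fp),
        "fpr": 100 * fp / (fp + tn),
        "accuracy": 100 * (tp + tn) / (tp + tn + fp + fn),
    }

# Counts for the "min prun3" setting, copied from the per-genotype table.
min_prun3 = [80585, 235875, 411739, 1102, 30728, 26651, 0, 231, 45026]

stats = summary_stats(min_prun3)
for name, value in stats.items():
    print(f"{name:12s}{value:.4f}")
```

With true negatives in the low tens of thousands and false positives in the hundreds of thousands for this setting, precision stays near 10% even though sensitivity is above 98%, which is why the table reports both.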

Page 28: Aug2015 analysis team 10 mason epigentics

False Positive Rate

[Figure panels: Half Correct Calls Considered False; Half Correct Calls Considered True]

Page 29: Aug2015 analysis team 10 mason epigentics

Accuracy

[Figure panels: Half Correct Calls Considered False; Half Correct Calls Considered True]