AffyDEComp: towards a benchmark for differential expression methods
![Page 1: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/1.jpg)
AffyDEComp: towards a benchmark for differential expression methods
Richard Pearson
School of Computer Science
University of Manchester
![Page 2: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/2.jpg)
Overview
Why benchmark DE methods?
The Golden Spike data set
AffyDEComp
Conclusions
Recommendations
![Page 3: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/3.jpg)
The need for benchmarks
Microarray analysis has many stages
Competing methods at each stage
Methodologists good at showing superiority
Results can appear contradictory
Confused end users: choice driven by…
What they are familiar with
What colleagues use
What was used in their favourite paper
…and not by a scientific comparison
![Page 4: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/4.jpg)
Benchmarking requirements
Methods: a set we wish to compare
Benchmark data: where truth is known
Metrics: by which to compare methods
Affycomp
Methods: summarisation methods
Benchmark data: various spike-in studies
Metrics: various, including, e.g., area under ROC curve for a fold change classifier
Affycomp doesn’t compare DE methods
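As an illustration of the metric mentioned above, here is a minimal sketch of an area-under-ROC calculation for a fold change classifier. It uses the rank (Mann-Whitney) formulation of the AUC. The function name `roc_auc` and all the numbers are made up for illustration; this is not Affycomp's own code.

```python
def roc_auc(scores, is_positive):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, counting ties as one half.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical log2 fold changes for six probesets (made-up numbers).
log_fc = [1.8, 0.9, 0.1, -0.2, 1.1, 0.0]
# Hypothetical truth: True where the probeset was spiked in as DE.
spiked = [True, True, False, False, True, False]

# Fold change classifier: rank probesets by absolute log fold change.
auc = roc_auc([abs(x) for x in log_fc], spiked)
print(auc)
```

The pairwise loop is quadratic in the number of probesets, which is fine for a toy example; a rank-based implementation would be preferable on real arrays.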
![Page 5: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/5.jpg)
A benchmark for DE methods
Methods:
DE methods depend on summarisation
Compare summarisation/DE combinations
Benchmark data:
Affycomp spike-ins have few DE genes
Golden Spike data has many DE genes, but also a few “issues”!
Metrics:
Based around areas under ROC curves
![Page 6: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/6.jpg)
The Golden Spike data
3 “sample”, 3 “control” arrays
Many RNAs “spiked-in” at known levels
“DE”, “Equal” and “Empty” probesets.
Controversial data set
Non-uniform null p-value distributions - use ROC
Spike-in concentrations high - unrepresentative
“DE” spike-ins all up-regulated - unrepresentative
Concentrations and FC confounded - loess
Different FC between “Equal” and “Empty”
![Page 7: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/7.jpg)
“Empty” have higher FC than “Equal”
Most analyses have treated both Empty and Equal as True Negatives - to what effect?
![Page 8: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/8.jpg)
“Empty” have higher FC than “Equal”
To illustrate how analysis choices affect results, I’ll treat Empty and Equal as true negatives (TN) and DE with FC <= 1.2 as true positives (TP)
![Page 9: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/9.jpg)
2-sided test
Large apparent difference between methods
Can you guess which paper used this chart?
![Page 10: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/10.jpg)
2-sided test
Large apparent difference between methods
Are TP correctly identified as up-regulated?
![Page 11: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/11.jpg)
1-sided test of up-regulation
Probesets identified as up-regulated are not TP
![Page 12: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/12.jpg)
1-sided test of down-regulation
DE probesets are mostly being identified as down-regulated, despite the fact that they are in truth up-regulated
We appear to be identifying TP as down-regulated
![Page 13: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/13.jpg)
DE <= 1.2 lower than Empty
TP are identified as down-regulated because most TN are “Empty”, which have higher FC than DE <= 1.2
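The effect described on this slide can be reproduced with a toy example. The sketch below uses invented log fold changes, chosen only to mimic the pattern described (Equal lowest, low-FC DE in between, Empty highest and most numerous), and compares the AUC of one-sided up and down tests with and without Empty in the true negatives. The helper names are hypothetical and the values are not taken from the Golden Spike data.

```python
def roc_auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def directional_aucs(tp, tn):
    """Return (AUC for a test of up-regulation, AUC for down-regulation)."""
    scores = tp + tn
    labels = [True] * len(tp) + [False] * len(tn)
    up = roc_auc(scores, labels)                  # high log FC ranks first
    down = roc_auc([-s for s in scores], labels)  # low log FC ranks first
    return up, down


# Made-up log2 fold changes that only mimic the ordering on the slide.
low_de = [0.20, 0.25, 0.15]                   # "DE" with FC <= 1.2 (TP)
equal = [0.00, 0.05, -0.05]                   # "Equal" probesets (TN)
empty = [0.40, 0.50, 0.45, 0.35, 0.55, 0.60]  # "Empty" probesets (TN)

up_all, down_all = directional_aucs(low_de, equal + empty)
up_eq, down_eq = directional_aucs(low_de, equal)

print("Empty and Equal as TN: up=%.2f down=%.2f" % (up_all, down_all))
print("Equal only as TN:      up=%.2f down=%.2f" % (up_eq, down_eq))
```

With Empty included, the up-regulation AUC falls below 0.5 and the down-regulation AUC rises above it, even though every toy TP has a positive log fold change.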
![Page 14: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/14.jpg)
Remove “empty” probesets
We can remedy this by using just Equal probesets as our TN…
…bearing in mind that this makes the data somewhat atypical
![Page 15: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/15.jpg)
Up-regulation - Empty in TN
Probesets identified as up-regulated generally not TP when using Empty in TN
![Page 16: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/16.jpg)
Up-regulation - TN Equal
Probesets identified as up-regulated more likely to be TP when using only Equal as TN
![Page 17: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/17.jpg)
Down-regulation - Empty in TN
DE probesets are mostly being identified as down-regulated, despite the fact that they are in truth up-regulated
We appear to be identifying TP as down-regulated when including Empty in TN
![Page 18: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/18.jpg)
Down-regulation - TN Equal
We generally don’t identify TP as down-regulated when excluding Empty from TN
![Page 19: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/19.jpg)
“Recommended” test
We recommend using just Equal as TN, and all DE as TP
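One way to express this recommendation in code is as a labelling step that drops Empty probesets before any ROC calculation. This is a sketch only: the function name, the category strings and the probeset identifiers are assumptions made for illustration.

```python
def recommended_truth(categories):
    """Map probeset categories to truth labels for ROC analysis.

    "DE" becomes a true positive (True), "Equal" a true negative (False),
    and "Empty" probesets are left out of the benchmark altogether.
    """
    truth = {}
    for probeset, category in categories.items():
        if category == "DE":
            truth[probeset] = True
        elif category == "Equal":
            truth[probeset] = False
        elif category != "Empty":
            raise ValueError("unknown category: %r" % (category,))
    return truth


# Hypothetical probeset identifiers and categories.
categories = {
    "probeset_1": "DE",
    "probeset_2": "Equal",
    "probeset_3": "Empty",
    "probeset_4": "DE",
}
truth = recommended_truth(categories)
print(truth)
```

Scores for the excluded probesets would then simply be ignored when building the ROC curve.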
![Page 20: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/20.jpg)
Recommended Up-reg
Using our recommendations, tests of up-regulation generally find TP, as expected
![Page 21: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/21.jpg)
Recommended Down-reg
Using our recommendations, tests of down-regulation generally don’t find TP, as expected
![Page 22: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/22.jpg)
Analysis decisions to make
Summarisation method
DE method
Direction of DE (recommend up)
Choice of true negatives (equal only)
Choice of true positives (all DE)
Post-summarisation normalisation (loess using equal only)
Type of ROC chart (standard ROC)
Proportion of x-axis to display (all)
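The decisions listed above could be recorded in a small configuration object, so that every reported AUC states the choices behind it. The sketch below is hypothetical: the class, field names and string values are invented for illustration, with defaults set to the recommendations in parentheses on the slide.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BenchmarkChoices:
    """One set of analysis decisions; defaults follow the recommendations."""
    summarisation_method: str
    de_method: str
    direction: str = "up"                    # direction of DE tested
    true_negatives: str = "equal_only"       # Empty probesets excluded
    true_positives: str = "all_de"
    normalisation: str = "loess_equal_only"  # post-summarisation
    roc_type: str = "standard"
    x_axis_proportion: float = 1.0           # display the whole x-axis

    def __post_init__(self):
        if self.direction not in ("up", "down", "two_sided"):
            raise ValueError("direction must be up, down or two_sided")
        if not 0.0 < self.x_axis_proportion <= 1.0:
            raise ValueError("x_axis_proportion must be in (0, 1]")


# Placeholder method names; substitute the combination being benchmarked.
choices = BenchmarkChoices("summarisation_A", "de_method_B")
print(asdict(choices))
```

Only the two method names are required, because the slide gives no recommendation for them.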
![Page 23: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/23.jpg)
AffyDEComp - charts
![Page 24: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/24.jpg)
AffyDEComp - comparison
![Page 25: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/25.jpg)
AUCs - recommended choices
![Page 26: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/26.jpg)
Conclusions
First step towards a reliable benchmark for DE
Golden Spike data has some value if use of empty probesets is revisited
Certain combinations of summarisation/DE methods seem poor
Keep it open (Bioconductor) - because science should be reproducible!
![Page 27: AffyDEComp: towards a benchmark for differential expression methods](https://reader035.fdocuments.in/reader035/viewer/2022062519/56814f29550346895dbcb599/html5/thumbnails/27.jpg)
Recommendations
Create a new spike-in data set where
Spike-in concentrations are realistic
DE spike-ins both up- and down-regulated
Concentrations and FC not confounded
Larger number of arrays
Benchmarks using regulatory information
Benchmarks for Illumina data
Benchmarks for SNP chips (GWA studies)
manchester.ac.uk/bioinformatics/affydecomp