Analysis pipe‐line - unito.it pipe‐line Quality control Normalization Filtering ... The method...


Page 1:

Analysis pipeline

Quality control → Normalization → Filtering → Statistical analysis → Annotation → Biological knowledge extraction

Page 2:

Filtering

• Why do we filter the data?
– To reduce the number of statistical tests that we will have to perform!

Page 3:

Multiple testing errors

• When performing multiple statistical tests, two types of errors can occur:
– Type I error (false positive)
– Type II error (false negative)

• Reducing type I errors increases the number of type II errors.

• It is important to identify an approach that reduces false positives with the minimum loss of information (false negatives).

Page 4:

The multiple tests problem

• As the number of samples increases, the tails of a distribution become more populated.
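The effect of the number of tests can be made concrete with a small sketch. It assumes independent tests for which the null hypothesis is always true; the function names are illustrative, and 22300 is the probe set count quoted later in this deck.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of one or more false positives among n_tests independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of null tests with p < alpha."""
    return n_tests * alpha


if __name__ == "__main__":
    for n in (1, 10, 100, 22300):
        print(n, prob_at_least_one_false_positive(n), expected_false_positives(n))
```

With a single test the chance of a false positive equals alpha; with thousands of tests at least one false positive is practically certain, which is why the number of tests matters.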

Page 5:

Filtering

• Filtering affects the false discovery rate.

• The researcher is interested in keeping the number of tests/genes as low as possible while keeping the interesting genes in the selected subset.

• If the truly differentially expressed genes are overrepresented among those selected in the filtering step, the FDR associated with a certain threshold of the test statistic will be lowered due to the filtering.

Extracted from: Heydebreck et al. Bioconductor Project Working Papers 2004

Page 7:

Filtering can be performed at various levels:

• Annotation features:
– Specific gene features (e.g. GO term, presence of transcriptional regulatory elements in promoters, etc.)

• Signal features:
– Percentage of intensities greater than a user-defined value

– Interquartile range (IQR) greater than a defined value

Page 8:

Intensity distributions

[Figure: intensity distributions after RMA and after GCRMA, with the background-level probe sets marked]

Page 9:

How to define the efficacy of a filtering procedure?

$$\text{enrichment} = \frac{N_{\text{spike-in after filtering}} \,/\, N_{\text{probe sets after filtering}}}{N_{\text{spike-in}} \,/\, N_{\text{probe sets}}} \times 100$$

• This enrichment is very similar to that used to evaluate the purification fold of a protein after a chromatographic step:

$$\text{enrichment} = \frac{[E]_{\text{AfterChrom}} \,/\, \mu g P_{\text{AfterChrom}}}{[E]_{\text{BeforeChrom}} \,/\, \mu g P_{\text{BeforeChrom}}} \times 100$$
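As a quick check of the enrichment definition, here is a minimal sketch. The function name and argument names are illustrative; the counts (22300 probe sets, 42 spike-ins, 5553 probe sets left after filtering) are the ones reported in the pOverA example of this deck.

```python
def enrichment(spike_after: int, probesets_after: int,
               spike_total: int, probesets_total: int) -> float:
    """Spike-in fraction after filtering relative to the fraction before, in percent."""
    fraction_after = spike_after / probesets_after
    fraction_before = spike_total / probesets_total
    return fraction_after / fraction_before * 100.0


if __name__ == "__main__":
    print(int(enrichment(42, 22300, 42, 22300)))  # unfiltered data -> 100
    print(int(enrichment(42, 5553, 42, 22300)))   # after filtering -> 401
```

The percentages are truncated to integers here, which reproduces the values shown on the slides.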

Page 10:

Filtering by genefilter pOverA
(keep a probe set if ≥ 25% of its intensities are ≥ log2(100))

                    Probe sets   Spike-ins   Enrichment
Before filtering    22300        42/42       100%
After filtering     5553         42/42       401%
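The pOverA rule can be sketched outside R as follows. This is not the genefilter code itself: it is a plain-Python illustration that assumes the rule is "keep a probe set when at least a fraction p of its log2 intensities are ≥ A". The probe set names and intensities are made up.

```python
import math


def p_over_a(values, p: float, a: float) -> bool:
    """True when at least a fraction p of the values are >= a."""
    return sum(v >= a for v in values) / len(values) >= p


def filter_probesets(expr: dict, p: float = 0.25, a: float = math.log2(100)) -> dict:
    """Keep only the probe sets that pass the p-over-A rule."""
    return {pid: vals for pid, vals in expr.items() if p_over_a(vals, p, a)}


if __name__ == "__main__":
    # Hypothetical log2 intensities for two probe sets over four arrays.
    expr = {
        "probeset_1": [5.0, 5.2, 7.1, 6.0],  # one of four arrays above log2(100)
        "probeset_2": [4.0, 4.5, 5.0, 6.0],  # always below log2(100)
    }
    print(list(filter_probesets(expr)))
```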

Page 11:

Filtering by interquartile range (IQR)

[Figure: the IQR spans the interval between the 25% and 75% quantiles of the intensity distribution]

Page 12:

How does filtering by genefilter IQR work?

The distribution of all intensity values of a differential expression experiment is the summary of the distributions of each gene's expression over the experimental conditions.

Page 13:

How does filtering by IQR work?

The filter removes genes that show little change across the experimental points.

Page 14:

Filtering by genefilter IQR
(remove a probe set if its intensity IQR is ≤ 0.25 or ≤ 0.5)

                    Probe sets   Spike-ins   Enrichment
Before filtering    22300        42/42       100%
IQR > 0.25          244          42/42       9139%
IQR > 0.5           68           42/42       32794%
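A minimal sketch of an IQR filter, written in plain Python rather than with genefilter. The "inclusive" quantile method is used on the assumption that it matches R's default quantile definition; the probe set names and values are made up.

```python
import statistics


def iqr(values) -> float:
    """Interquartile range: 75% quantile minus 25% quantile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def iqr_filter(expr: dict, threshold: float = 0.25) -> dict:
    """Remove probe sets whose IQR across arrays is <= threshold."""
    return {pid: vals for pid, vals in expr.items() if iqr(vals) > threshold}


if __name__ == "__main__":
    # Hypothetical log2 intensities over five arrays.
    expr = {
        "flat_probeset": [6.0, 6.1, 6.0, 6.1, 6.05],  # barely changes
        "variable_probeset": [1, 2, 3, 4, 5],         # changes across arrays
    }
    print(list(iqr_filter(expr, threshold=0.25)))
```

Raising the threshold keeps fewer probe sets, which is the pattern in the counts above (244 at 0.25, 68 at 0.5).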

Page 15:

Analysis pipeline

Quality control → Normalization → Filtering → Statistical analysis → Annotation → Biological knowledge extraction

Page 16:

Statistical analysis

• The sensitivity of statistical tests is affected by the number of available replicates.

• Replicates can be:
– Technical
– Biological

• Biological replicates better summarize the variability of samples belonging to a common group.

• The minimum number of replicates is an important issue!

Page 17:

How important are replicates?

Yang YH and Speed T, 2002

Page 18:

Sample size

• Microarray experiments are often performed with a small number of biological replicates, resulting in low statistical power for detecting differentially expressed genes and concomitant high false positive rates.

• The issue of how many replicates are required in a typical experimental system needs to be addressed.

• Of particular interest is the difference in required sample sizes for similar experiments in inbred vs. outbred populations (e.g. mouse and rat vs. human).

Page 19:

Assessing sample sizes in microarray experiments

• Assessment of sample sizes for microarray data is a tricky exercise.

• The reason for performing such an analysis is to get a general feeling for the ability of our experimental data to robustly detect differential expression.

Page 20:

Assumptions

• A microarray experiment is set up to compare gene expression between one treatment group and one control group.

• Microarray data have been normalized and transformed so that the data for each gene are sufficiently close to a normal distribution that a standard 2-sample pooled-variance t-test will reliably detect differentially expressed genes.

Page 21:

• The tested hypothesis for each gene is:

$$H_0: \mu_T = \mu_C \quad \text{versus} \quad H_1: \mu_T \neq \mu_C$$

where μT and μC are the means of gene expression for the treatment and control group, respectively.

• The analysis is done using the common variance described in:
– Wei et al. BMC Genomics. 2004, 5:87
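To get a feeling for how replicate numbers scale, the sketch below uses the textbook normal-approximation formula for a two-sided two-sample comparison, n = 2·((z_{1−α/2} + z_{power})·σ/δ)² per group. This is not the procedure of Wei et al.; the function name and the example values of δ (log2 difference to detect) and σ (per-gene standard deviation) are assumptions.

```python
import math
from statistics import NormalDist


def samples_per_group(delta: float, sigma: float,
                      alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate replicates per group needed to detect a mean difference delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (sigma * (z_alpha + z_power) / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical values: detect a 2-fold change (delta = 1 on the log2 scale).
    print(samples_per_group(delta=1.0, sigma=0.5))  # low variability (e.g. inbred)
    print(samples_per_group(delta=1.0, sigma=1.0))  # higher variability (e.g. outbred)
```

Because the normal approximation ignores the extra uncertainty of the t distribution, it tends to be optimistic for very small groups; doubling σ roughly quadruples the required sample size.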

Page 22:

Log2(T/C) is frequently used to evaluate fold change variation

[Figure: bar charts of t/c and of log2(t/c) for example genes. Treated intensities doubling from 200 upward are compared with control intensities of 100, and the reciprocal (down-regulation) case is also shown. On the linear t/c scale down-regulation is compressed; on the log2(t/c) scale up- and down-regulation are symmetric around 0.]
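The compression of down-regulation on the linear scale can be seen with a few values. This is a small illustrative sketch; the intensities follow the doubling pattern of the slide (200, 400, 800, 1600 against 100) and the helper name is made up.

```python
import math


def log2_ratio(treated: float, control: float) -> float:
    """Log2 fold change of treated versus control."""
    return math.log2(treated / control)


if __name__ == "__main__":
    for value in (200, 400, 800, 1600):
        up = (value / 100, log2_ratio(value, 100))    # up-regulation
        down = (100 / value, log2_ratio(100, value))  # down-regulation
        print(f"t/c up={up[0]:g} log2={up[1]:+g} | t/c down={down[0]:g} log2={down[1]:+g}")
```

Linear ratios for down-regulation are squeezed between 0 and 1, while the log2 values mirror the up-regulated ones with opposite sign.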

Page 23:

Fold change filtering

• The intensity change between experimental groups (i.e. control versus treated) is known as:

– Fold change.

• Frequently an arbitrary threshold

$$\log_2 \frac{\text{Trtd}}{\text{Ctrl}} = 1$$

(i.e. a two-fold change) is used to define a significant differential expression.

Page 24:

Statistical analysis

• Intensity changes between experimental groups (i.e. control versus treated) are known as:
– Fold change.
– Ranking genes based on fold change alone implicitly assigns equal variance to every gene.

• Fold change alone is not sufficient to indicate the significance of the expression changes.

• Fold change has to be supported by statistical information.

Page 26:

Statistical filtering

• Statistical filtering can be performed using parametric and non-parametric tests.

• Parametric tests:
– The populations under analysis are assumed to be normally distributed.

• Non-parametric tests:
– There is no assumption on the distribution of the samples.

• Non-parametric tests are less sensitive than parametric tests.

Page 27:

Selecting differentially expressed genes

[Figure: statistical validation method I, statistical validation method II and statistical validation method III all point to differential expression linked to a specific biological event]

Page 28:

Selecting differentially expressed genes

• Each method grasps some true signals but not all.

• Each method catches some false signals.

• The trick is to find the best condition to maximize true signals while minimizing false ones.

Page 29:

[Figure: distributions of population Ctrl (mean y1) and population Trtd (mean y2), with the sample mean "s" marked]

There is less than a 5% chance that the sample with mean s came from population y1, i.e., s is significantly different from "mean y1" at the p < 0.05 significance level. But we cannot reject the hypothesis that the sample came from population y2.

Page 30:

t-statistics

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where

$$s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$

using the pooled variance.

In the case of unequal variance:

Welch statistics

$$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

with the unpooled (squared) standard error in the denominator.
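The two statistics can be compared on a small example. The sketch below uses the standard textbook definitions of the pooled-variance and Welch t statistics (only the statistic, not the p-value); the function names and data are illustrative.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2) -> float:
    """Two-sample t statistic using the pooled variance."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def welch_t(group1, group2) -> float:
    """Two-sample t statistic using the unpooled squared standard error."""
    n1, n2 = len(group1), len(group2)
    std_err_sq = variance(group1) / n1 + variance(group2) / n2
    return (mean(group1) - mean(group2)) / math.sqrt(std_err_sq)


if __name__ == "__main__":
    # Hypothetical log2 intensities of one gene.
    control = [7.1, 7.3, 6.9, 7.0]
    treated = [8.4, 9.6, 7.9]
    print(pooled_t(treated, control), welch_t(treated, control))
```

With equal group sizes the two statistics coincide; they differ when both the sizes and the variances differ, as in the example data.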

Page 31:

• The t-statistic is widely used in assessing differential expression.

• Unstable variance estimates that arise when the sample size is small can be corrected using:
– Bayesian methods (Limma)

– Error fudge factors (SAM)

Page 32:

Bayesian regularized t-test
(Baldi & Long 2001)

The method tries to decouple the mean–variance dependency by modeling the variance of the expression of a gene as a function of the mean expression of the gene.

[Figure: the gene of interest ("My gene") highlighted against the mean–variance relationship]

Page 33:

Bayesian regularized t-test

The main goal of this approach is to stabilize the variance estimates that arise when the sample size is small, to make the t-test results more robust.
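The stabilizing effect can be illustrated with a sketch of a regularized variance. It assumes the form commonly quoted for Baldi & Long, σ² = (ν0·σ0² + (n − 1)·s²) / (ν0 + n − 2), where σ0² is a background variance (for instance from genes with similar mean expression) and ν0 is the weight given to it. The formula, the function name and all numbers should be read as assumptions, since the slide equation is not available here.

```python
def regularized_variance(sample_var: float, n: int,
                         background_var: float, nu0: float) -> float:
    """Shrink a per-gene sample variance toward a background variance."""
    return (nu0 * background_var + (n - 1) * sample_var) / (nu0 + n - 2)


if __name__ == "__main__":
    # Hypothetical gene: 3 replicates with an accidentally tiny sample variance.
    tiny = regularized_variance(sample_var=0.01, n=3, background_var=0.25, nu0=10)
    print(tiny)  # pulled up toward the background value of 0.25
```

A gene whose three replicates happen to agree very closely no longer gets an inflated t value, because its variance is pulled toward the background estimate.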

Page 34:

Bayesian regularized t-test

The regularized t-test makes the presence of significant differential expression more evident.

Page 35:

Type I error correction

• Null hypothesis (H0): the mean of treated and the mean of control for a gene i belong to the same distribution.

• Type I error: H0 is rejected although it is true (false positive).

• Sidak significance point:

$$K(g, \alpha) = 1 - (1 - \alpha)^{1/g}$$

α = acceptance level (e.g. 0.05)
g = number of independent tests

• H0 is rejected as long as the p-value is lower than K(g, α); at the first p-value that is not lower than K(g, α), that H0 and all the remaining H0 are considered true.

Page 36:

Type I error correction (FWER)

$$K(g, \alpha) = 1 - (1 - \alpha)^{1/g}$$

P of diff. expr. genes    α'
< 10^-6                   1 – (1 – 0.05)^(1/5) = 0.0102
< 10^-6                   1 – (1 – 0.05)^(1/4) = 0.0127
2 × 10^-5                 1 – (1 – 0.05)^(1/3) = 0.0170
0.047                     1 – (1 – 0.05)^(1/2) = 0.0253
…
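The thresholds in the table can be reproduced with a short sketch of a step-down Sidak procedure. It assumes the p-values are tested in increasing order with g decreasing by one at each step, as the table suggests; the function names are illustrative and the last p-value (0.3) is a made-up filler for the row elided in the slide.

```python
def sidak_threshold(g: int, alpha: float = 0.05) -> float:
    """Sidak significance point K(g, alpha) = 1 - (1 - alpha)^(1/g)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / g)


def step_down_sidak(pvalues, alpha: float = 0.05):
    """Return a list of booleans: True where H0 is rejected."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    rejected = [False] * len(pvalues)
    for step, idx in enumerate(order):
        remaining = len(pvalues) - step       # g shrinks as hypotheses are rejected
        if pvalues[idx] < sidak_threshold(remaining, alpha):
            rejected[idx] = True
        else:
            break                             # all remaining H0 are considered true
    return rejected


if __name__ == "__main__":
    pvalues = [1e-7, 5e-7, 2e-5, 0.047, 0.3]  # last value is hypothetical
    print([round(sidak_threshold(g), 4) for g in (5, 4, 3, 2)])
    print(step_down_sidak(pvalues))
```

The gene with p = 0.047 would pass an uncorrected 0.05 cutoff but fails against 0.0253, so it and every later gene are retained as non-significant.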

Page 37:

BH correction

• BH (Benjamini–Hochberg) is the most widely used method for the correction of type I errors in microarray analysis.

• However, it has some limitations due to its initial hypotheses:

– The gene expressions are independent of each other.

– The raw distribution of p-values should be uniform in the non-significant range.

[Figure: distribution of raw p-values. The application of BH correction to these p-values will not produce any differentially expressed genes!]
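For reference, the Benjamini–Hochberg step-up rule can be sketched as follows: sort the m p-values, find the largest rank k with p(k) ≤ (k/m)·α, and reject the k smallest. The function name and the example p-values are illustrative.

```python
def benjamini_hochberg(pvalues, alpha: float = 0.05):
    """Return a list of booleans: True where H0 is rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            largest_rank = rank               # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:largest_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical raw p-values for eight genes.
    pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(pvalues))
```

When no p-value falls under its rank-scaled threshold, nothing is rejected, which is the situation described for the p-value distribution on this slide.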

Page 38:

SAM (Significance analysis of microarrays)
(Tusher et al. 2001)

The fudge factor regularizes the t-statistic by inflating the denominator.

s(i) is the pooled standard deviation, taking into account differing gene-specific variation across arrays.
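A minimal sketch of a SAM-style statistic, d = (mean_treated − mean_control) / (s + s0), where s is the pooled standard error of the gene and s0 is the fudge factor. In SAM the value of s0 is chosen from the data; here it is simply passed in, and the function name and numbers are illustrative.

```python
import math
from statistics import mean


def sam_d(treated, control, s0: float) -> float:
    """Regularized t-like statistic with fudge factor s0 added to the denominator."""
    n1, n2 = len(treated), len(control)
    m1, m2 = mean(treated), mean(control)
    sum_sq = sum((x - m1) ** 2 for x in treated) + sum((x - m2) ** 2 for x in control)
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * sum_sq)  # pooled standard error
    return (m1 - m2) / (s + s0)


if __name__ == "__main__":
    # Hypothetical gene with small variance: s0 damps its score.
    treated, control = [7.02, 7.00, 7.01], [6.90, 6.91, 6.92]
    print(sam_d(treated, control, s0=0.0), sam_d(treated, control, s0=0.1))
```

Genes with very small s(i) would otherwise get extreme scores from tiny differences; the fudge factor limits that effect.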

Page 39:

• SAM uses data permutations to define a set of significant differential expressions.

[Figure: the set of permutations of the sample labels N and T]

Page 40:

FDR is given by p0 × False / Called, where p0 is the prior probability (π0) that a gene is not differentially expressed.

Page 41:

How does SAM calculate the False Discovery Rate for a specific delta?

[Figure: number of false calls in each permutation (1, 2, 3, 4, ..., 720), summarized as the mean number of false calls]
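The FDR estimate described on these slides, p0 × (mean false) / called, can be written as a short sketch. The function name and all counts are hypothetical; the "false" counts would come from applying the same delta cutoff to each permuted data set.

```python
from statistics import mean


def sam_fdr(false_calls_per_permutation, called: int, pi0: float) -> float:
    """FDR estimate: pi0 * mean number of false calls / number of genes called."""
    return pi0 * mean(false_calls_per_permutation) / called


if __name__ == "__main__":
    # Hypothetical counts of false calls in three permutations, 30 genes called.
    print(sam_fdr([2, 4, 3], called=30, pi0=0.9))
```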

Page 42:

Rank Product

• Rank Product is a non-parametric statistic that detects items that are consistently highly ranked in a number of lists, for example genes that are consistently found among the most strongly upregulated genes in a number of replicate experiments.

• It is based on the assumption that, under the null hypothesis that the order of all items is random, the probability of finding a specific item among the top r of n items in a list is p = r/n.

Page 43:

Rank Product

• Multiplying these probabilities leads to the definition of the rank product:

$$RP = \prod_i \frac{r_i}{n_i}$$

where r_i is the rank of the item in the i-th list and n_i is the total number of items in the i-th list.

• The smaller the RP value, the smaller the probability that the observed placement of the item at the top of the lists is due to chance.
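The rank product definition translates directly into code. This is a minimal sketch with made-up ranks; the function name is illustrative and `math.prod` requires Python 3.8 or later.

```python
import math


def rank_product(ranks, list_sizes) -> float:
    """Product over lists of (rank of the item) / (number of items in the list)."""
    return math.prod(r / n for r, n in zip(ranks, list_sizes))


if __name__ == "__main__":
    sizes = [100, 100, 100]                  # three replicate lists of 100 genes
    print(rank_product([1, 2, 1], sizes))    # consistently near the top
    print(rank_product([1, 60, 45], sizes))  # top in one list only
```

A gene ranked near the top in every replicate gets a far smaller RP than a gene that is top-ranked in only one list.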

Page 44:

Rank Product

$$RP_g = \prod_i \frac{r_{g,i}}{n_i}$$

Page 45:

Rank Product

$$P_g = \frac{1}{L\,G} \sum_{l=1}^{L} \sum_{g'=1}^{G} I\left(RP^{*(l)}_{g'} \le RP_g\right)$$

$$FDR_g = \frac{\frac{1}{L} \sum_{l=1}^{L} \sum_{g'=1}^{G} I\left(RP^{*(l)}_{g'} \le RP_g\right)}{\sum_{g'=1}^{G} I\left(RP_{g'} \le RP_g\right)}$$

where RP*(l) denotes the rank products obtained in the l-th of L permutations, G is the number of genes and I(·) is the indicator function.
