Adapted from: Wulff HR, Andersen B, Brandenhoff P, Guttler F (1987): What do doctors know about statistics? Statistics in Medicine 6:3-10

Suppose we conduct a t-test of the difference between two means and obtain a p-value < .05. Does this mean:

a) There is less than a 5% chance that the results are due to chance.

b) If there really is no difference between the population means, there is less than a 5% chance of obtaining a difference this large or larger.

c) There is a 95% chance that if the study is repeated, the result will be replicated.

d) There is a 95% chance that there is a real difference between the two population means.

What is a p-value?

The probability of obtaining a test statistic (data) that departs as much as or more than the observed test statistic (data) if the null hypothesis were true.
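The definition above can be made concrete with a permutation test, sketched here under stated assumptions. The function name `permutation_p_value`, the example data, and the choice of the absolute difference in means as the test statistic are illustrative and do not come from the original slides. The null model is that group labels are exchangeable, and the p-value is the fraction of relabelings whose statistic departs at least as much as the observed one.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Hypothetical helper: estimates the probability, under the null that
    group labels are exchangeable, of a mean difference at least as
    large (in absolute value) as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling of the pooled data
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up data for illustration only.
treated = [5.1, 6.3, 5.8, 6.9, 6.1, 5.7]
control = [4.2, 4.9, 5.0, 4.4, 5.2, 4.6]
print(permutation_p_value(treated, control))
```

Note that the result is a statement about the data given the null hypothesis, not about the probability that the null hypothesis is true.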

Which Null Hypotheses are Meaningful and Testable?

Those that precisely specify a probability model for the data.

A Perspective

We study: Samples; Data

We wish to obtain knowledge about: Populations; Nature

Gene Family-Based Hypothesis Testing

Sketch of Typical (outmoded and inappropriate) Approach:

1. For Genes 1 to K, define a vector, R, of length K that contains the values of a categorical variable denoting group membership.

2. For Genes 1 to K, define a vector, C, of length K that contains the values of a binary variable denoting whether or not the gene was ‘significant’ or ‘interesting’ by some standard.

3. Conduct some frequentist significance test for an association between R and C.
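The three steps above can be sketched as a one-sided Fisher exact test on the 2x2 table formed by R and C. This is only one possible choice of "some frequentist significance test". The function name `fisher_enrichment_p` and the reduction of R to a binary in-family indicator are assumptions made for illustration. The test treats genes as independent units, which is the assumption the slides go on to question.

```python
from math import comb


def fisher_enrichment_p(in_family, significant):
    """One-sided Fisher exact p-value for over-representation.

    in_family   -- vector R, simplified here to 0/1 family membership
    significant -- vector C, 0/1 flag for 'significant' genes
    Both have length K (one entry per gene). Hypothetical helper.
    """
    if len(in_family) != len(significant):
        raise ValueError("R and C must have the same length K")
    k_genes = len(in_family)
    n_family = sum(in_family)
    n_sig = sum(significant)
    overlap = sum(1 for r, c in zip(in_family, significant) if r and c)
    # Hypergeometric upper tail: P(overlap >= observed) when the
    # significant genes are drawn without regard to family membership.
    upper = min(n_family, n_sig)
    tail = sum(
        comb(n_family, x) * comb(k_genes - n_family, n_sig - x)
        for x in range(overlap, upper + 1)
    )
    return tail / comb(k_genes, n_sig)


# Made-up example: 20 genes, 5 in the family, 6 significant, 4 in both.
R = [1] * 5 + [0] * 15
C = [1, 1, 1, 1, 0] + [1, 1] + [0] * 13
print(fisher_enrichment_p(R, C))
```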

“Fortune cookie bet made Powerball lottery players rich” (from N. Y. Times, 2005)

110 players in March 30th drawing get 5/6 numbers right.

Odds of getting 5/6 numbers are ~ 1 in 3,000,000.

Expected only 4 or 5 second place winners.

Players used fortune cookies to obtain numbers. All cookies came from same factory.

Numbers selected by workers writing numbers on paper and putting in bowl for selection.

Same number combinations went out in thousands of cookies a day.
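To see how surprising 110 winners would be if every ticket had been picked independently, the count can be approximated as Poisson with the expected number of winners as its mean. This is a rough sketch: the mean of 4.5 is an assumed midpoint of the "4 or 5" quoted above, and the function name `poisson_upper_tail` is hypothetical.

```python
from math import exp, lgamma, log


def poisson_upper_tail(k, lam, extra_terms=500):
    """P(X >= k) for X ~ Poisson(lam), summing terms computed in log space."""
    total = 0.0
    for x in range(k, k + extra_terms):
        # log of the Poisson pmf: -lam + x*log(lam) - log(x!)
        total += exp(-lam + x * log(lam) - lgamma(x + 1))
    return total


expected_winners = 4.5   # assumption: midpoint of "4 or 5"
observed_winners = 110   # from the story above
p = poisson_upper_tail(observed_winners, expected_winners)
print(p)
```

A tail probability this small does not suggest a miracle. It suggests that the independence model is wrong, which is what the shared fortune-cookie numbers confirm.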

Assume Independence

Story raises important point of independence assumption in microarray analyses.

Majority of microarray statistical tests assume independence among genes.

However, we know that genes do not function independently of each other; they work in networks.

What are the implications of this assumption for our final results?

The assumption can have an important impact on final results when investigating the role of thousands of genes within a biological system.
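A small simulation can illustrate the point, under assumptions that are not from the original slides: each gene's null test statistic is standard normal, and dependence among genes comes from a single shared factor with pairwise correlation `rho`. The function name `prob_all_significant` is hypothetical. The quantity estimated is the chance that every gene in a family is "significant" at the two-sided 5% level when no gene is truly affected.

```python
import random


def prob_all_significant(n_genes, rho, n_sims=20000, z_crit=1.96, seed=1):
    """Estimate P(all genes in a family exceed |z| > z_crit) under the null.

    Each statistic is sqrt(rho)*shared + sqrt(1-rho)*own noise, so every
    gene is marginally N(0, 1) with pairwise correlation rho.
    """
    rng = random.Random(seed)
    shared_weight = rho ** 0.5
    own_weight = (1.0 - rho) ** 0.5
    hits = 0
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)
        if all(
            abs(shared_weight * shared + own_weight * rng.gauss(0.0, 1.0)) > z_crit
            for _gene in range(n_genes)
        ):
            hits += 1
    return hits / n_sims


independent = prob_all_significant(n_genes=10, rho=0.0)
correlated = prob_all_significant(n_genes=10, rho=0.9)
print(independent, correlated)
```

Under independence the event is essentially never seen, since its probability is 0.05 to the 10th power. With strongly correlated genes it occurs at a rate many orders of magnitude higher, so a p-value computed under independence would badly overstate the evidence.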

The Independence Issue: A Real Example

[Figure: Simulated P-value for 42 out of 42]

Gene Family-Based Hypothesis Testing

Which Null Hypothesis is Being Tested?

1. None of the genes in family c are differentially expressed (associated, methylated, etc.).

2. The proportion of genes in family c that are differentially expressed is equal to the proportion of genes in the remainder of the genome that are differentially expressed (beware of ‘anti-Bayesian’ element).

3. The proportion of genes in family c that are differentially expressed to an extent greater than some threshold is equal to the proportion of genes in the remainder of the genome that are differentially expressed.

Note: These can all be subsumed under the general:

H0:

, ,C C

Union-Intersection vs Intersection-Union Tests

Union-Intersection

• The compound hypothesis is rejected if any one of the individual hypotheses is rejected

• Multiplicity adjustment procedure is required to control type I error rate

• The rejection region for this test is the union of rejection regions corresponding to the individual tests

Intersection-Union

• The compound hypothesis is rejected only if all of the individual hypotheses are rejected

• Overall type I error rate of α is maintained without multiplicity adjustment

• The rejection region for this test is the intersection of the rejection regions corresponding to the individual tests
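The two decision rules can be sketched from a list of per-hypothesis p-values. The function names are hypothetical, and Bonferroni is used here only as one simple example of a multiplicity adjustment for the union-intersection case; the slides do not name a specific procedure.

```python
def union_intersection_reject(p_values, alpha=0.05):
    """Reject the compound null if ANY individual test rejects.

    Uses a Bonferroni adjustment (alpha / number of tests) as an
    illustrative way to control the overall type I error rate.
    """
    return min(p_values) <= alpha / len(p_values)


def intersection_union_reject(p_values, alpha=0.05):
    """Reject the compound null only if ALL individual tests reject.

    Each test is run at level alpha; no multiplicity adjustment is used.
    """
    return max(p_values) <= alpha


# Made-up p-values for three component hypotheses.
p_values = [0.001, 0.20, 0.60]
print(union_intersection_reject(p_values))   # one strong signal is enough
print(intersection_union_reject(p_values))   # needs every component to reject
```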

When P << N, methods are well established (e.g., multiple regression).

When P >> N, optimal methods are not yet clear. Bayesian methods involving posterior probabilities in place of p-values may be especially useful.

What assumptions are being made?

Normality? Exchangeability? Independence? Other?

• Non-Parametric: Non-Panacea (Cohen, J.)

• Asymptotic Exact

Major Issues to Ask About in Selecting a Method for Gene Family or Pathway Testing

► What is the null?

► Does the method assume that all components (e.g., SNPs or gene expression levels) are independent?

► Is the method ‘anti-Bayesian’?

► Does the method use the continuity of information (not simply significant or not)?
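As an illustration of the last question, a method can compare the continuous per-gene statistics of a family with those of the remaining genes instead of first reducing each gene to significant or not. The sketch below uses a Wilcoxon rank-sum test with a normal approximation and no tie correction. The function name `rank_sum_p_value` and the example data are assumptions for illustration, and this particular test still treats genes as independent.

```python
from math import erfc, sqrt


def rank_sum_p_value(family_stats, other_stats):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation).

    Hypothetical helper; no tie correction is applied to the variance.
    """
    n1, n2 = len(family_stats), len(other_stats)
    # U counts pairs where the family value exceeds the other value;
    # tied pairs contribute one half.
    u = sum(
        (a > b) + 0.5 * (a == b)
        for a in family_stats
        for b in other_stats
    )
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2.0))


# Made-up per-gene test statistics.
family = [2.1, 1.8, 2.6, 1.9, 2.4]
others = [0.3, -0.5, 1.1, 0.2, -1.2, 0.8, 0.0, -0.4, 1.5, 0.6]
print(rank_sum_p_value(family, others))
```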
