
Needles in Haystacks: Are There Any?

How Many Are There? Where Are They?

John Rice

University of California, Berkeley


Outline

• Classical testing: significance levels, p-values, power

• Testing many hypotheses: issues and recent developments (false discovery rate)

• Higher criticism: are any null hypotheses false?

• Motivation: the Taiwanese American Occultation Survey

• Estimating the proportion of false nulls


Classical Testing

H0: null hypothesis vs. HA: alternative hypothesis

T: test statistic

Reject H0 for large values of T, say T > t0 (threshold)

Type I error: reject H0 when it holds

Significance level α = Prob(Type I error) = P(T > t0 | H0)

Fix α and find t0 by considering only the null distribution of T

P-value: if we observe T = t, the P-value = Prob(T > t | H0). Under H0, if T has a continuous distribution, the P-value is uniformly distributed on [0,1]

Type II error: fail to reject H0 when HA holds

Power = 1 - Prob(Type II error)
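To make the uniformity claim concrete, here is a small simulation in Python (mine, not from the talk); the one-sided z-test setup is an arbitrary illustrative choice.

```python
# Simulate p-values under H0 for a one-sided z-test of H0: mean = 0.
# With a continuous test statistic, the p-values should be Uniform(0, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, trials = 50, 10_000
data = rng.normal(loc=0.0, size=(trials, n))   # data generated under H0
z = data.mean(axis=1) * np.sqrt(n)             # null distribution of T: N(0, 1)
pvalues = stats.norm.sf(z)                     # P(T > t | H0)

# About a fraction alpha of the p-values fall below any threshold alpha.
print(f"fraction below 0.05: {np.mean(pvalues < 0.05):.3f}")   # ~0.05
```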


Multiple Testing

Many null and alternative hypotheses; e.g. source detection --- each pixel is either background or source

Collection of test statistics and P-values, one for each hypothesis. May or may not be independent random variables

Possible questions: Are any null hypotheses false? How many are false? Which ones are false? Or probabilities of such.


Analogues of Type I Error

              # not rejected   # rejected
True Null           U              V          m0
False Null          T              S          m1
                  m - R            R          m = m0 + m1

Per-Comparison Error Rate (PCER): E(V/m). Ignores multiplicity; uses significance level α for each test

Per-Family Error Rate (PFER): E(V)

Family-Wise Error Rate (FWER): P(V>0)

The latter two can be controlled by Bonferroni, e.g. by testing each hypothesis at level α/m
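As a rough sketch (not from the talk), the contrast between the per-comparison rule and the Bonferroni correction looks like this; the function names and the all-null simulation are mine.

```python
import numpy as np

def reject_per_comparison(pvalues, alpha=0.05):
    """PCER-style rule: test each hypothesis at level alpha, ignoring multiplicity."""
    return np.asarray(pvalues) < alpha

def reject_bonferroni(pvalues, alpha=0.05):
    """Bonferroni: test each at alpha/m, so that P(V > 0) <= alpha (controls FWER)."""
    p = np.asarray(pvalues)
    return p < alpha / len(p)

# If all m nulls are true, every rejection is a false one (a "V").
rng = np.random.default_rng(1)
p = rng.uniform(size=10_000)
print(reject_per_comparison(p).sum())   # ~500 false rejections on average
print(reject_bonferroni(p).sum())       # 0 with probability >= 0.95
```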


Recent Analogues

              # not rejected   # rejected
True Null           U              V          m0
False Null          T              S          m1
                  m - R            R          m = m0 + m1

False Discovery Proportion: FDP = V/R

False Discovery Rate: FDR = E(FDP)

Positive FDR: p-FDR = E(FDP| R>0)

Exceedance Control: P(FDP > c)

k-FWER: the probability of making k or more false rejections, P(V ≥ k)


False Discovery Rate

              # not rejected   # rejected
True Null           U              V          m0
False Null          T              S          m1
                  m - R            R          m = m0 + m1

Determination of the FDR threshold for a desired level α:

Order the P-values: P(1) ≤ P(2) ≤ … ≤ P(m)

Find d = max{ j : P(j) ≤ jα/m }

Reject all hypotheses with P(k) ≤ P(d)

The quantity controlled by FDR can be more meaningful than that controlled by PCER, which treats 10 false detections out of 20 detections the same as 10 out of 2000.
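A minimal implementation of the step-up procedure just described (the Benjamini-Hochberg procedure); the variable names are my own.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Reject hypotheses so that FDR = E(FDP) is controlled at level alpha."""
    p = np.asarray(pvalues)
    m = len(p)
    order = np.argsort(p)                                  # sort the p-values
    below = p[order] <= alpha * np.arange(1, m + 1) / m    # P(j) <= j*alpha/m ?
    reject = np.zeros(m, dtype=bool)
    if below.any():
        d = np.nonzero(below)[0].max()                     # d = max{ j : P(j) <= j*alpha/m }
        reject[order[:d + 1]] = True                       # reject all P(k) <= P(d)
    return reject
```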


[Figure: the empirical distribution of the P-values plotted against the uniform distribution, with the FDR rejection line t(p) = p/α.]

Note that the threshold is chosen adaptively, in contrast to the threshold for PCER, which controls E(V/m) with, say, a kσ threshold. For example, the FDR threshold adapts to the distribution of source intensity relative to background intensity.


[Figure: source detection using the false discovery rate, from Hopkins et al.]


Higher Criticism

Are there any false nulls, any sources? Are there any needles in the haystack?

The test statistic is based on comparing the distribution of the P-values to the uniform distribution -- are there too many small ones? We expect the i-th smallest P-value to be about i/n.

Donoho & Jin
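A sketch of one form of the higher-criticism statistic, following my reading of Donoho & Jin (2004); the clipping and the restriction to the smallest half of the p-values are illustrative choices, not a reference implementation.

```python
import numpy as np

def higher_criticism(pvalues, alpha0=0.5):
    """Maximal standardized excess of small p-values over the uniform benchmark."""
    p = np.sort(np.asarray(pvalues))
    p = np.clip(p, 1e-12, 1 - 1e-12)     # guard against division by zero
    n = len(p)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    k = max(1, int(alpha0 * n))          # search over the smallest alpha0-fraction
    return hc[:k].max()                  # large HC => evidence of some false nulls
```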


[Figure: detectability diagram in terms of signal strength and sparsity.]

Consider a large number of tests for a rare but moderately strong signal. There are scenarios in which it can be determined that there are signals, but not which tests correspond to them. The smallest few P-values will not correspond to signals.


Estimating the Proportion

Seemingly harder question: what is the proportion of needles in the haystack?

Motivation: the Taiwanese-American Occultation Survey (TAOS) will search for Kuiper Belt Objects (KBOs) by monitoring star fields for occultations.


Occultations


[Figures: a simulated occultation; the time series of flux; an occultation by an asteroid seen on two cameras.]

Thousands of stars will be simultaneously monitored every night, searching for rare events lasting about 1/5 second.

In the course of a year, TAOS will try to detect 10-1000(?) occultations among 10^10-10^12 measurements!


Proposed Detection Scheme

Consider basing the test on the flux from a single hold, for a particular star.

Initial data: f_kh = flux from the star on telescope k, hold h = 1,…,n, used for calibrating subsequent test statistics.

New observation to be tested for a possible occultation: Y_k

R_k = rank of Y_k among the f_kh

Test statistic: the product of the R_k


The construction is based on the following fact: if Y1,…,Yn are iid and Y is independent of them with the same continuous distribution, then the rank of Y among the combined n+1 values is uniformly distributed on {1, 2, …, n+1}.

Thus the null distribution of the product of the ranks can be calculated explicitly. Alternatively, the log of the product can be approximated by treating the ranks as independent uniform random variables.
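A sketch of the rank-product computation under this uniform approximation; treating R_k/(n+1) as independent Uniform(0,1) variables gives a Gamma tail calculation, and the array layout is my own convention.

```python
import numpy as np
from scipy import stats

def rank_product_pvalue(y, f):
    """y: new flux on each of K telescopes; f: (K, n) array of calibration fluxes."""
    K, n = f.shape
    # R_k = rank of Y_k among the f_kh (1 if Y_k is below all calibration fluxes)
    ranks = 1 + np.sum(f <= y[:, None], axis=1)
    # An occultation dims the star, so small ranks (small products) are suspicious.
    log_stat = -np.sum(np.log(ranks / (n + 1.0)))
    # Treating R_k/(n+1) as iid Uniform(0,1): -sum(log U_k) ~ Gamma(K, 1) under H0.
    return stats.gamma.sf(log_stat, a=K)   # small p-value <=> unusually small ranks
```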


Retrospective Estimation of Occultation Rate

Suppose we have a year of data. What can we say about the occultation rate (and thus the abundance of KBOs)?

Note the distinction between this question and identifying individual occultations in real time.


The problem:

• Given a very large number of independent hypothesis tests, where in the vast majority of cases the null hypothesis is true, estimate the proportion of false null hypotheses.

• The power of the test is unknown and varies from test to test.

• The distribution of the test statistic under the alternative is not known.

• We would like to be able to state, at a specified level of confidence, that there are at least a specified number of false null hypotheses.


Suppose a proportion π of the tests correspond to false null hypotheses. Then the distribution of the p-values is the mixture

F(t) = (1 - π) t + π F1(t),

where F1 is the distribution of the p-values under the alternatives. Since F1(t) ≤ 1, this yields a

Lower bound: π ≥ (F(t) - t) / (1 - t) for all t in (0,1)

Empirical version of the numerator: F̂m(t) - t, where F̂m is the empirical distribution function of the m p-values


Motivation for the construction: we want to bound the contribution of the true nulls to F̂m(t) - t.

Suppose there exists a t0 such that F1(t0) = 1, i.e., all p-values arising from false nulls fall below t0.

Since the true nulls contribute at most t0 to F(t0), the proportion of p-values below t0 in excess of t0 can be attributed to false nulls.

Thus a (biased) estimate of the proportion of false nulls: π̂ = (F̂m(t0) - t0) / (1 - t0)
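A minimal sketch of this biased estimator in Python; the choice t0 = 0.5 is purely illustrative.

```python
import numpy as np

def proportion_false_nulls(pvalues, t0=0.5):
    """Biased (downward) estimate: excess of the empirical CDF over t0, rescaled."""
    F_hat = np.mean(np.asarray(pvalues) <= t0)    # empirical CDF of p-values at t0
    return max(0.0, (F_hat - t0) / (1.0 - t0))    # (Fhat(t0) - t0) / (1 - t0)
```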


Lower confidence bound: π̂(α) = sup over t of (F̂m(t) - t - βm,α(t)) / (1 - t), where the bounding function βm,α controls the deviations of the empirical distribution of m uniform p-values at confidence level 1 - α (Meinshausen & Rice).

Thus one can state, for example, that with 90% confidence there were at least 777 occultations.

Note that there is no meaningful upper bound, because occultations could be arbitrarily shallow.

Analysis shows that there are scenarios in which the proportion of false nulls can be consistently estimated but in which one cannot identify which nulls were false.
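For a concrete, though conservative, sketch of such a bound, the Dvoretzky-Kiefer-Wolfowitz inequality can serve as the bounding function; this is a simplification of my own, not the sharper Meinshausen & Rice construction implemented in their R package howmany.

```python
import numpy as np

def lower_bound_false_nulls(pvalues, conf=0.90):
    """With probability >= conf, the proportion of false nulls is at least this value."""
    p = np.sort(np.asarray(pvalues))
    m = len(p)
    eps = np.sqrt(np.log(2.0 / (1.0 - conf)) / (2.0 * m))   # two-sided DKW deviation bound
    F_hat = np.arange(1, m + 1) / m                         # empirical CDF at each p(i)
    keep = p < 1.0                                          # avoid dividing by zero
    bounds = (F_hat[keep] - p[keep] - eps) / (1.0 - p[keep])
    return max(0.0, bounds.max()) if keep.any() else 0.0

# m * lower_bound_false_nulls(p, 0.90) is then a 90% lower confidence bound
# on the number of false nulls (here, occultations).
```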


Surprise! You would think that estimating the proportion of false nulls is harder than testing whether any nulls were false, but for the normal model presented earlier, when you can do one, you can do the other.

Cai, Jin & Low


References

Y. Benjamini and Y. Hochberg (1995). Controlling the false discovery rate. J. Roy. Stat. Soc. B 57, 289.

T. Cai, J. Jin, and M. Low (2005). Estimation and confidence sets for sparse normal mixtures. www.stat.purdue.edu/~jinj/Research/ESTEPS.pdf

D. Donoho and J. Jin (2004). Higher criticism for detecting sparse heterogeneous mixtures. Annals of Statistics 32, 962.

C. Genovese and L. Wasserman (2005). Exceedance control of the false discovery proportion. http://www.stat.cmu.edu/~genovese/papers/exceedance.pdf

A. Hopkins et al. (2002). A new source detection algorithm using the false discovery rate. Astr. J. 123, 1086.

N. Meinshausen and J. Rice (2006). Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. Annals of Statistics, in press. Software (R): cran.r-project.org/doc/packages/howmany.pdf

C. Miller et al. (2001). Controlling the false discovery rate in astrophysical data analysis. Astr. J. 122, 3492.

J. Shaffer (2005). Recent developments towards optimality in multiple hypothesis testing. Contact [email protected]

J. Storey (2002). A direct approach to false discovery rates. J. Roy. Stat. Soc. B 64, 479.

M. van der Laan, S. Dudoit, and K. Pollard (2004). Augmentation procedures for control of the generalized familywise error rate and tail probabilities for the proportion of false positives. Statistical Applications in Genetics and Molecular Biology 3, Article 15.

There are many additional relevant references and the literature is rapidly evolving. Those given above are for starters and contain further references.