TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility...
Transcript of TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility...
![Page 1: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/1.jpg)
TODAY: Safe Testing
1. Reproducibility Crisis/Problems with p-values
2. The S-Value
3. Optional Continuation
4. Optional Stopping vs Optional Continuation
5. Gambling Interpretation, Again
6. Types of S-Values / GROW S-Values
[Next Week No Lecture โ Homework due Mo May 11]
![Page 2: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/2.jpg)
Safe Testing
Peter Grรผnwald
Centrum Wiskunde & Informatica โ Amsterdam
Mathematical Institute โ Leiden University
with Rianne de Heide,
Wouter Koolen, Judith
ter Schure, Alexander
Ly, Rosanne Turner
![Page 3: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/3.jpg)
Slate Sep 10th 2016: yet another classic finding in
psychologyโthat you can smile your way to
happinessโjust blew upโฆ
Reproducibility Crisis
Cover Story of
Economist (2013),
Science (2014)
![Page 4: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/4.jpg)
Reasons for Reproducibility Crisis
1. Publication Bias
2. Problems with Hypothesis Testing Methodology
![Page 5: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/5.jpg)
![Page 6: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/6.jpg)
![Page 7: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/7.jpg)
Xkcd.org
![Page 8: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/8.jpg)
Reasons for Reproducibility Crisis
1. Publication Bias
2. Problems with Hypothesis Testing Methodology
![Page 9: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/9.jpg)
Replication Crisis in
Science somehow related to use of p-values and
significance testingโฆ
![Page 10: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/10.jpg)
Replication Crisis in
Science somehow related to use of p-values and
significance testingโฆ
![Page 11: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/11.jpg)
Replication Crisis in
Science somehow related to use of p-values and
significance testingโฆ
![Page 12: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/12.jpg)
Replication Crisis in
Science somehow related to use of p-values and
significance testingโฆ
![Page 13: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/13.jpg)
Replication Crisis in
Science somehow related to use of p-values and
significance testingโฆ
![Page 14: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/14.jpg)
โข Suppose reseach group A tests medication, gets
โalmost significantโ result.
โข ...whence group B tries again on new data. How to
combine their test results?
โข Standard Method 1: sweep data together, recompute p-
value. This is not correct; type-I error guarantee does not
hold any more
โข Standard method 2: use Fisherโs method for
combining p-values. Again not correct, since tests
cannot be viewed as independent
โข Standard method 3: multiply p-values. Just plain
wrong โ a mortal sin!
โข With the type of โp-valueโ introduced here, despite
dependence, evidences can still be safely multiplied
P-value Problem:
Combining Dependent Tests
![Page 15: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/15.jpg)
โข Suppose reseach group A tests medication, gets
โalmost significantโ result.
โข Sometimes group A canโt resist to test a few
more subjects themselves...
โข A recent survey revealed that 55% of psychologists have
succumbed to this practice (and then treat data as if large
sample size was determined in advance)
โข But isnโt this just cheating?
โข Not clear: what if you submit a paper and the referee
asks you to test a couple more subjects? Should you
refuse because it invalidates your p-values!?
P-value Problem (b):
Extending Your Test
![Page 16: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/16.jpg)
S is the new P
โข We propose a generic replacement of
the ๐-value that we call the ๐-value
โข ๐-values handle optional continuation
(to the next test (and the next, and ..))
without any problems
(can simply multiply S-values of
individual tests, despite dependencies)
![Page 17: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/17.jpg)
S is the new P
โข We propose a generic replacement of
the ๐-value that we call the ๐-value
โข ๐-values handle optional continuation
(to the next test (and the next, and ..))
without any problems
(can simply multiply S-values of
individual tests, despite dependencies)
S-values have Fisherian, Neymanian and Bayes-
Jeffreysโ aspects to them, all at the same time
Cf. J. Berger (2003, IMS Medaillion Lecture): Could
Neyman, Fisher and Jeffreys have agreed on testing?
![Page 18: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/18.jpg)
S-Values: General Definition
โข Let ๐ป0 = ๐๐ ๐ โ ฮ0} represent the null hypothesis
โข Assume data ๐1, ๐2, โฆ are i.i.d. under all ๐ โ ๐ป0 .
โข Let ๐ป1= ๐๐ ๐ โ ฮ1} represent alternative hypothesis
โข An S-value for sample size ๐ is a function
such that for all ๐0 โ ๐ป0 , we have
![Page 19: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/19.jpg)
First Interpretation: p-values
โข Proposition: Let S be an S-value. Then ๐โ1 ๐๐ is a
conservative p-value, i.e. p-value with wiggle room:
โข for all ๐ โ ๐ป0, all 0 โค ๐ผ โค 1 ,
โข Proof: just Markovโs inequality!
![Page 20: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/20.jpg)
Safe Tests
โข The Safe Test against ๐ป0 at level ๐ผ based on S-
value S is defined as the test which rejects ๐ป0 if
S ๐๐ โฅ1
๐ผ
โข Since ๐โ1 is a conservative ๐-valuue...
โข ....the safe test which rejects ๐ป0 iff ๐(๐๐) โฅ 20 , i.e.
๐โ1 ๐๐ โค 0.05 , has Type-I Error Bound of 0.05
![Page 21: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/21.jpg)
Safe Testing and Bayes
โข Bayes factor hypothesis testing
with ๐ป0 = ๐๐ ๐ โ ฮ0} vs ๐ป1 = ๐๐ ๐ โ ฮ1} :
Evidence in favour of ๐ป1 measured by
where
(Jeffreys โ39)
![Page 22: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/22.jpg)
Safe Testing and Bayes, simple ๐ฏ๐
Bayes factor hypothesis testing
between ๐ป0 = { ๐0} and ๐ป1 = ๐๐ ๐ โ ฮ1} :
Bayes factor of form
Note that (no matter what prior ๐1 we chose)
![Page 23: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/23.jpg)
Safe Testing and Bayes, simple ๐ฏ๐
Bayes factor hypothesis testing
between ๐ป0 = { ๐0} and ๐ป1 = ๐๐ ๐ โ ฮ1} :
Bayes factor of form
Note that (no matter what prior ๐1 we chose)
The Bayes Factor for Simple ๐ฏ๐
is an S-value!
![Page 24: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/24.jpg)
Default S-Value โ Neyman
1. ๐ป0 and ๐ป1 are point hypotheses โ then default S-
value is:
... the safe test based on ๐ looks a bit like, but is not a
standard Neyman-Pearson test.
Safe Test: reject if ๐ ๐๐ โฅ 1/๐ผ
NP: reject if ๐ ๐๐ โฅ 1/๐ต with ๐ต s.t. ๐0 ๐ ๐๐ โฅ ๐ต = ๐ผ
more conservative
![Page 25: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/25.jpg)
Safe Tests are Safe
under optional continuation
โข Suppose we observe data (๐1, ๐1), ๐2, ๐2 , โฆ
โข ๐๐: side information
...coming in batches of size ๐1, ๐2, โฆ , ๐๐. Let
โข We first evaluate some S-value ๐1 on (๐1, โฆ , ๐๐1).
โข If outcome is in certain range (e.g. promising but not
conclusive) and ๐๐1has certain values (e.g. โboss has
money to collect more dataโ) then....
we evaluate some S-value ๐2 on ๐๐1+1, โฆ , ๐๐2 ,
otherwise we stop.
![Page 26: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/26.jpg)
Safe Tests are Safe
โข We first evaluate ๐1.
โข If outcome is in certain range and ๐๐1 has certain
values then we evaluate ๐2 ; otherwise we stop.
โข If outcome of ๐2 is in certain range and ๐๐2 has
certain values then we compute ๐3 , else we stop.
โข ...and so on
โข ...when we finally stop, after say ๐พ data batches, we
report as final result the product
โข First Result, Informally: any ๐บ composed of S-
values in this manner is itself an S-value,
irrespective of the stop/continue rule used!
![Page 27: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/27.jpg)
Safe Tests are Safe
โข ๐๐ may be same function as ๐๐โ1, e.g. (simple ๐ป0)
โข But choice of ๐th S-value ๐๐ may also depend on
previous ๐๐๐ , ๐๐๐ , e.g.
and then (full compatibility with Bayesian updating)
![Page 28: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/28.jpg)
Safe Tests are Safe
Let ๐1 be S-value on . For ๐ = 1,2,โฆ , let
be any collection of S-values defined on
Let be arbitrary
stop/continue strategy, and:
Define if
Define if
else
else
and so on...
Define if
![Page 29: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/29.jpg)
Safe Tests are Safe
Theorem:
Suppose that for all ๐, ๐๐ โฅ ๐๐โ1, ๐๐โ1 . Then
๐, the end-product of all employed S-values
๐1, ๐๐1 , ๐๐2, โฆ is itself an S-value
โข Technically, the process is a
nonnegative supermartingale (Ville โ39) and the theorem is
proved using Doobโs optional stopping theorem
![Page 30: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/30.jpg)
Safe Tests are Safe
Theorem:
๐ , the end-product of all employed S-values
๐1, ๐๐1 , ๐๐2, โฆ is itself an S-value
Corollary: Type-I Error Guarantee Preserved
under Optional Continuation
Suppose we combine S-values with arbitrary
stop/continue strategy and reject ๐ป0 when final ๐ has
๐โ1 โค 0.05 . Then resulting test is a safe test and our
Type-I Error is guaranteed to be below 0.05!
![Page 31: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/31.jpg)
Safe Tests are Safe
Theorem:
๐ , the end-product of all employed S-values
๐1, ๐๐1 , ๐๐2, โฆ is itself an S-value
Corollary: Type-I Error Guarantee Preserved
under Optional Continuation
Suppose we combine S-values with arbitrary
stop/continue strategy and reject ๐ป0 when final ๐ has
๐โ1 โค 0.05 . Then resulting test is a safe test and our
Type-I Error is guaranteed to be below 0.05!
![Page 32: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/32.jpg)
Generalizing the Result
Theorem says:
โข Let where
s.t.
, is S-value:
โข Suppose that for all ๐, ๐๐ โฅ ๐๐โ1, ๐๐โ1 .
โข Let ๐ be smallest ๐ such that ๐๐ ๐๐๐ = ๐ ๐ก๐๐
โข Then is an S-value
![Page 33: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/33.jpg)
Generalizing the Result
Theorem says:
โข Let where
For all ๐, let and let โฆ
, is S-value:
โข s.t.
โข Suppose that for all ๐, ๐๐ โฅ ๐๐โ1, ๐๐โ1 .
โข Let ๐ be smallest ๐ such that ๐๐ ๐๐๐ = ๐ ๐ก๐๐
โข Then is an S-value
![Page 34: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/34.jpg)
Generalizing the Result
Theorem says: for all ๐,
โข Let ๐โจ๐โฉ = (๐๐๐โ1 +1, โฆ , ๐๐๐) ; ๐โจ๐โฉ = (๐1, โฆ , ๐๐๐
);
similarly for ๐, ๐
โข Let be a function such that
โข Let ๐ be a stopping time for the process ๐ 1 , ๐ 2 , โฆ
โข Then is an S-value
![Page 35: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/35.jpg)
Optional Stopping vs
Continuation
โข Let ๐โจ๐โฉ = (๐๐๐โ1 +1, โฆ , ๐๐๐) ; ๐โจ๐โฉ = (๐1, โฆ , ๐๐๐
)
โข Let be a function such that
โข Let ๐ be a stopping time for the process ๐ 1 , ๐ 2 , โฆ
โข Then is an S-value
Optional Continuation: we have batches of data
๐โจ1โฉ, ๐โจ2โฉ, โฆ and we can do optional stopping at the batch-
level (and obtain an S-Value and preserve Type I error
guarantees)
Traditional Optional Stopping: we take ๐โจ๐ โฉ = ๐๐ for all ๐.
![Page 36: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/36.jpg)
Optional Stopping vs
Continuation
โข In many but certainly not all cases, we can also do
optional stopping based on S-values.
โข Suppose that ๐ป0 = {๐0}, , no ๐๐ โs
โข Then and we can do optional
stopping at each ๐ and not just optional continuation
between โblocksโ
โข โฆbut if ๐ป0 composite then sometimes the only
conditional S-value satisfying
is given by the trivial S-value ๐๐+1 = 1 . Then OS
impossible (but OC with batches of size ๐๐ โซ 1 still
possible)
![Page 37: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/37.jpg)
(Again:) Safe Testing = Gambling!
โข At time 1 you can buy ticket 1 for 1$. It pays off
๐1(๐1, โฆ , ๐๐1) $ after ๐1 steps
โข At time 2 you can buy ticket 2 for 1$. It pays off
๐2(๐๐1+1, โฆ , ๐๐2) $ after ๐2 further steps.... and so on.
You may buy multiple and fractional nrs of tickets.
Kelly (1956)
![Page 38: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/38.jpg)
Safe Testing = Gambling!
โข At time 1 you can buy ticket 1 for 1$. It pays off
๐1(๐1, โฆ , ๐๐1) $ after ๐1 steps
โข At time 2 you can buy ticket 2 for 1$. It pays off
๐2(๐๐1+1, โฆ , ๐๐2) $ after ๐2 further steps.... and so on.
You may buy multiple and fractional nrs of tickets.
โข You start by investing 1$ in ticket 1.
![Page 39: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/39.jpg)
Safe Testing = Gambling!
โข At time 1 you can buy ticket 1 for 1$. It pays off
๐1(๐1, โฆ , ๐๐1) $ after ๐1 steps
โข At time 2 you can buy ticket 2 for 1$. It pays off
๐2(๐๐1+1, โฆ , ๐๐2) $ after ๐2 further steps.... and so on.
You may buy multiple and fractional nrs of tickets.
โข You start by investing 1$ in ticket 1.
โข After ๐1 outcomes you either stop with end capital ๐1or you continue and buy ๐1 tickets of type 2.
![Page 40: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/40.jpg)
Safe Testing = Gambling!
โข At time 1 you can buy ticket 1 for 1$. It pays off
๐1(๐1, โฆ , ๐๐1) $ after ๐1 steps
โข At time 2 you can buy ticket 2 for 1$. It pays off
๐2(๐๐1+1, โฆ , ๐๐2) $ after ๐2 further steps.... and so on.
You may buy multiple and fractional nrs of tickets.
โข You start by investing 1$ in ticket 1.
โข After ๐1 outcomes you either stop with end capital ๐1or you continue and buy ๐1 tickets of type 2. After ๐2 =๐1 + ๐2 outcomes you stop with end capital ๐1 โ ๐2 or
you continue and buy ๐1 โ ๐2 tickets of type 3, and so
on..
![Page 41: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/41.jpg)
Safe Testing = Gambling!
โข You start by investing 1$ in ticket 1.
โข After ๐1 outcomes you either stop with end capital ๐1or you continue and buy ๐1 tickets of type 2. After
๐2 = ๐1 + ๐2 outcomes you stop with end capital ๐1 โ ๐2 or you continue and buy ๐1 โ ๐2 tickets of type 3,
and so on...
โข ๐บ is simply your end capital
โข Your donโt expect to gain money, no matter what the
stop/continuation rule since none of individual
gambles ๐บ๐ are strictly favorable to you
![Page 42: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/42.jpg)
Safe Testing = Gambling!
โข You start by investing 1$ in ticket 1.
โข After ๐1 outcomes you either stop with end capital ๐1or you continue and buy ๐1 tickets of type 2. After
๐2 = ๐1 + ๐2 outcomes you stop with end capital ๐1 โ ๐2 or you continue and buy ๐1 โ ๐2 tickets of type 3,
and so on...
โข ๐บ is simply your end capital
โข Your donโt expect to gain money, no matter what the
stop/continuation rule since none of individual
gambles ๐บ๐ are strictly favorable to you
โข Hence a large value of ๐บ indicates that something
very unlikely has happened under ๐ป0 ...
![Page 43: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/43.jpg)
Default S-Value โ Neyman
1. ๐ป0 and ๐ป1 are point hypotheses โ then default S-
value is:
... the safe test based on ๐ looks a bit like, but is not a
standard Neyman-Pearson test.
Safe Test: reject if ๐ ๐๐ โฅ 1/๐ผ
NP: reject if ๐ ๐๐ โฅ 1/๐ต with ๐ต s.t. ๐0 ๐ ๐๐ โฅ ๐ต = ๐ผ
more conservative
![Page 44: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/44.jpg)
SafeTests & Neyman-Pearson, again
โข Let ๐ be a strict ๐-value: for all ๐ โ ๐ป0, ๐ ๐ โค ๐ผ = ๐ผ
โข Let ๐ =1
๐ผif ๐ โค ๐ผ , and ๐ =0 otherwise
โข Then for all ๐ โ ๐ป0,
...so ๐ is an S-value, and obviously, the safe test based
on ๐ rejects iff ๐ โค ๐ผ. It thus implements the Neyman-
Pearson test at significance level ๐ผ.
![Page 45: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/45.jpg)
โข Let ๐ be a strict ๐-value: for all ๐ โ ๐ป0, ๐ ๐ โค ๐ผ = ๐ผ
โข Let ๐ =1
๐ผif ๐ โค ๐ผ , and ๐ =0 otherwise
โข Then for all ๐ โ ๐ป0,
...so ๐ is an S-value, and obviously, the safe test based
on ๐ rejects iff ๐ โค ๐ผ. t thus implements the Neyman-
Pearson test at significance level ๐ผ.
...but it is a very silly S-value to use! With
probability ๐ถ, you loose all your capital, and you will
never make up for that in the future!
SafeTests & Neyman-Pearson, again
![Page 46: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/46.jpg)
Safe Tests and Neyman-Pearson,
again
โข The Safe Test based on an S-Value that is a
likelihood ratio is not a Neyman-Pearson test (it is
more conservative)
โข Neyman-Pearson tests (that only report โrejectโ
and โacceptโ, and not the p-value) are (other) Safe
Tests, but useless ones corresponding to
irresponsible gambling...
![Page 47: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/47.jpg)
Some S-Values are
Better than Others
โข The Trivial S-Value ๐ = 1 is valid, but useless
โข The Neyman-Pearson S-value is valid, but extremely
dangerous to use!
โข We need some idea of โoptimal S-valueโ
![Page 48: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/48.jpg)
How to design S-Values?
โข Suppose we are willing to admit that weโll only be
able to tell ๐ป0 and ๐ป1 apart if ๐ โ ๐ป0 โช ๐ป1โฒ for some
๐ป1โฒ โ ๐ป1 that excludes points that are โtoo closeโ to ๐ป0
e.g.
โข We can then look for the GROW (growth-optimal in
worst-case) S-value achieving
![Page 49: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/49.jpg)
GROW: an analogue of Power
โข The GROW (growth-optimal in worst-case) S-value
relative to ๐ป1,๐ฟ is the S-value achieving
where the supremum is over all ๐-values relative to ๐ป0โข ...so we donโt expect to gain anything when investing
in ๐ under ๐ป0โข ...but among all such ๐ we pick the one(s) that make
us rich fastest if we keep reinvesting in new gambles
under ๐ป1
![Page 50: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/50.jpg)
Main Theorem
(will be made precise in 3 weeks)
โข Under โhardly any conditionsโ on ๐ป0 and ๐ป1 a GROW
S-value exists! [G., De Heide, Koolen, 2019]
![Page 51: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/51.jpg)
โข Jerzy Neyman: alternative exists, โinductive . .
... behaviourโ, โsignificance levelโ and power
Sir Ronald Fisher: test statistic rather than
alternative, p-value indicates โunlikelinessโ
โข Sir Harold Jeffreys: Bayesian, alternative exists,
absolutely no p-values
J. Berger (2003, IMS Medaillion Lecture): Could
Neyman, Fisher and Jeffreys have agreed on testing?
... Using S-Values we can unify/correct the central
ideas
Three Philosophies of Testing
![Page 52: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/52.jpg)
โFisherianโ Example
2. Ryabko & Monarevโs (2005)
Compression-based randomness test
R&M checked whether sequences generated by
famous random number generators can be
compressed by standard data compressors such as
gzip and rar
Answer: yes! 200 bits compression for file of 10
megabytes
![Page 53: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/53.jpg)
Additional Background Slides
![Page 54: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/54.jpg)
Null Hypothesis Testing
โข Let ๐ป0 = ๐๐ ๐ โ ฮ0} represent the null hypothesis
โข For simplicity, today we assume data ๐1, ๐2, โฆ are
i.i.d. under all ๐ โ ๐ป0 .
โข Let ๐ป1= ๐๐ ๐ โ ฮ1} represent alternative hypothesis
โข Example: testing whether a coin is fair
Under ๐๐ , data are i.i.d. Bernoulli ๐
ฮ0 =1
2, ฮ1 = 0,1 โ
1
2
Standard test would measure frequency of 1s
![Page 55: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/55.jpg)
Null Hypothesis Testing
โข Let ๐ป0 = ๐๐ ๐ โ ฮ0} represent the null hypothesis
โข Let ๐ป1= ๐๐ ๐ โ ฮ1} represent alternative hypothesis
โข Example: testing whether a coin is fair
Under ๐๐ , data are i.i.d. Bernoulli ๐
ฮ0 =1
2, ฮ1 = 0,1 โ
1
2
Standard test would measure frequency of 1s
Simple ๐ป0
![Page 56: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/56.jpg)
Null Hypothesis Testing
โข Let ๐ป0 = ๐๐ ๐ โ ฮ0} represent the null hypothesis
โข Let ๐ป1= ๐๐ ๐ โ ฮ1} represent alternative hypothesis
โข Example: t-test (most used test world-wide)
๐ป0: ๐๐ โผ๐.๐.๐. ๐ 0, ๐2 vs.
๐ป1 : ๐๐ โผ๐.๐.๐. ๐ ๐, ๐2 for some ๐ โ 0
๐2 unknown (โnuisanceโ) parameter
๐ป0 = ๐๐ ๐ โ 0,โ }
๐ป1 = ๐๐,๐ ๐ โ 0,โ , ๐ โ โ โ 0 }
![Page 57: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/57.jpg)
Null Hypothesis Testing
โข Let ๐ป0 = ๐๐ ๐ โ ฮ0} represent the null hypothesis
โข Let ๐ป1= ๐๐ ๐ โ ฮ1} represent alternative hypothesis
โข Example: t-test (most used test world-wide)
๐ป0: ๐๐ โผ๐.๐.๐. ๐ 0, ๐2 vs.
๐ป1 : ๐๐ โผ๐.๐.๐. ๐ ๐, ๐2 for some ๐ โ 0
๐2 unknown (โnuisanceโ) parameter
๐ป0 = ๐๐ ๐ โ 0,โ }
๐ป1 = ๐๐,๐ ๐ โ 0,โ , ๐ โ โ โ 0 }
Composite ๐ป0
![Page 58: TODAY: Safe Testingpdg/teaching/inflearn/slides6.pdfTODAY: Safe Testing 1. Reproducibility Crisis/Problems with p-values 2. The S-Value 3. Optional Continuation 4. Optional Stopping](https://reader036.fdocuments.in/reader036/viewer/2022071013/5fcb953eac84254d782ab2b5/html5/thumbnails/58.jpg)
Standard Method:
p-value, significance
โข Let ๐ป0 = ๐๐ ๐ โ ฮ0} represent the null hypothesis
โข A (โnonstrictโ) p-value is a random variable (!) such
that, for all ๐ โ ฮ0 ,