Coverage Intervals: Actual Probability Name: Example Treibergs ... -...

Math 3070 § 1.Treibergs

Coverage Intervals: Actual ProbabilityBinomial Variable Lies Within CI.

Name: ExampleMay 28, 2011

One of the differences between the 7th and 8th editions of Devore’s book is the inclusion ofthe fascinating coverage interval plot in §7.2? It is part of the discussion of why the Agrest-Coullconfidence interval for the population proportion is better than the traditional interval.

Both are derived under the null hypothesis that the population proportion of successes p, issome fixed number. A random sample of size n is chosen. Let X be the number of successesin the sample, which has the binomial distribution X ∼ N(n, p). The estimator for proportion,which is just the sample mean of n Bernoulli variables, and its standard error are

p̂ =X

n; σp̂ =

√p(1− p)

By the central limit theorem, and n gets large then the average has approximately the normaldistribution p̂ ∼ N(p0, σp̂). Suppose we wished to find a level α two-sided confidence interval.Then assuing it is normal, with probability 1− α, the standardized variable falls in the range

−zα/2 <p̂− p√p(1−p)

< zα/2 (1)

The standard interval is found from this by making a further approximation, which lessensits accuracy. It is to approximate that the standard error σp̂ by its estimator

sp̂ =

√p̂(1− p̂)

and then solve the inequality (1) with sp̂ in place of σp̂ yielding the old interval

p̂± zα/2

√p̂(1− p̂)

n. (2)

The rule of thumb is that the old interval is a valid interval if np̂ ≥ 10 and n(1− p̂) ≥ 10. In thatcase, the probability that p was captured in this experiment is 1− α.

To find their new confidence interval, Argesti and Coull solved (1) for p directly withoutapproximating σp̂ to get their interval for p

p̂ +z2α/2

2n± zα/2

√p̂(1− p̂)

z2α/2

1 +z2α/2

Note that under the assumption that the the population proportion is p, we can computewhether or not inequalites (2) or (3) hold. After all, p̂ is a binomial random variable, and bothintervals boil down to the random variable being between two constants. This is called thecoverage probability. I propose to compute it as a function of n and p.

Let us consider the old interval first. The bounds (2) are equivalent to

|p̂− p| ≤ zα/2

√p̂(1− p̂)

or after squaring

(p̂− p)2 ≤ z2α/2

p̂(1− p̂)n

Combining (1 +

z2α/2

)p̂2 −

z2α/2

)p̂ + p2 < 0

This is satisfied if the random variable p̂ falls between the roots

p +z2α/2

2n± zα/2

√p(1− p)

z2α/2

1 +z2α/2

On the other hand, the random variable lies in the Agresti-Coull interval (3) if

|p̂− p| ≤ zα/2

√p(1− p)

or after squaring

(p̂− p)2 ≤ z2α/2

p(1− p)n

Combining

p̂2 − 2pp̂p2 −z2α/2

np(1− p) < 0

This is satisfied if the random variable p̂ falls between the roots

p± zα/2

√p(1− p)

n. (5)

The complete symmetry, (2) looks like (5), and (3) like (4) is no surprise, since in deriving theinequalities from (1) we have just reversed the roles of p and p̂.

One more comment is in order. If X ∼ B(n, p) is a binomial variable which is discrete, thenB(x, n, p) = P (X ≤ x), the cumulative distribution function, corresponds to equlity. In the casethat x is an integer then P (X < x) may be smaller than P (X ≤ x). R has two functions thatgive the CDF, namely

B(x, n, p) = P (X ≤ x) = pbinom(x, n, p) = 1− pbinom(x, n, p, lower.tail = FALSE)P (X < x) = pbinom(n− x, n, 1− p, lower.tail = FALSE)

P (a < X < b) = P (X < b)− P (X ≤ a)= pbinom(n− b, n, 1− p, lower.tail = FALSE)− pbinom(a, n, p).

Finally, if x(p) depends continuously on p, e.g., say if it were one of the interval limits (4) or(5), then B(x(p), n, p) may be a discontinuous function of p since x(p) may be crossing integervalues as p varies.

R Session:

R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type ’license()’ or ’licence()’ for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.Type ’contributors()’ for more information and’citation()’ on how to cite R or R packages in publications.

Type ’demo()’ for some demos, ’help()’ for on-line help, or’help.start()’ for an HTML browser interface to help.Type ’q()’ to quit R.

[R.app GUI 1.31 (5538) powerpc-apple-darwin8.11.1]

[Workspace restored from /Users/andrejstreibergs/.RData]

> ################### P(X <= x) VS P(X < x) ##########################>> # Compare P(X <= x) = pbinom(x,n,p) and> # P(X < x) = pbinom(n-x, n, 1-p, lower.tail=F)> # for real values of x as it passes through an integer value.> p<-10/11;n<-17;c<- 7;x<- (c*11-2):(c*11+2)/11> x[1] 6.818182 6.909091 7.000000 7.090909 7.181818> pbinom(x,n,p)[1] 2.575798e-08 2.575798e-08 4.105263e-07 4.105263e-07 4.105263e-07> pbinom(n-x,n,1-p,lower.tail=F)[1] 2.575798e-08 2.575798e-08 2.575798e-08 4.105263e-07 4.105263e-07> p<-10/19;n<-23;c<- 11;x<- (c*11-2):(c*11+2)/11> x[1] 10.81818 10.90909 11.00000 11.09091 11.18182> pbinom(x,n,p)[1] 0.2511192 0.2511192 0.3992274 0.3992274 0.3992274> pbinom(n-x,n,1-p,lower.tail=F)[1] 0.2511192 0.2511192 0.2511192 0.3992274 0.3992274>> ############### COVERAGE PROBABILITIES OF OLD INTERVALS #################> alpha <- 0.05> za<- qnorm(alpha,0,1,lower.tail=F);za[1] 1.644854> za2<- qnorm(alpha/2,0,1,lower.tail=F);za2[1] 1.959964>> # The upper / lower limits for X = n * p_hat are a(p) and b(p)

>> n <- 20> c1 <- n/(1+za2^2/n)> c2 <-n*za2^2/(2*(n+za2^2));c2[1] 1.611252> c3 <-n^2*za2/(n+za2^2);c3[1] 32.88329> c4 <- za2^2/(4*n^2);c4[1] 0.002400912> a <- function(p){c1*p + c2 - c3*sqrt(p*(1-p)/n+c4)}> b <- function(p){c1*p + c2 + c3*sqrt(p*(1-p)/n+c4)}> f<-function(p){pbinom(n-b(p),n,1-p, lower.tail = FALSE)-pbinom(a(p), n, p)}> curve(f(x),0,1,ylim=c(.8,1),main="Old CI Coverage Probability for n=20, alpha=.05")> abline(h=.95,col=2); abline(v=.5,col=3)> #M3074CoverageEg1.pdf>> n<-50> c1 <- n/(1+za2^2/n)> c2 <-n*za2^2/(2*(n+za2^2));c2[1] 1.78369> c3 <-n^2*za2/(n+za2^2);c3[1] 91.00626> c4 <- za2^2/(4*n^2);c4[1] 0.0003841459> a <- function(p){c1*p + c2 - c3*sqrt(p*(1-p)/n+c4)}> b <- function(p){c1*p + c2 + c3*sqrt(p*(1-p)/n+c4)}> f<-function(p){pbinom(n-b(p),n,1-p, lower.tail = FALSE)-pbinom(a(p), n, p)}> curve(f(x),0,1,ylim=c(.8,1),main="Old CI Coverage Probability for n=50, alpha=.05")> abline(h=.95,col=2);abline(v=c(.2,.8),col=3)> # M3074CoverageEg2.pdf>> n<-100> c1 <- n/(1+za2^2/n)> c2 <-n*za2^2/(2*(n+za2^2));c2[1] 1.849675> c3 <-n^2*za2/(n+za2^2);c3[1] 188.7458> c4 <- za2^2/(4*n^2);c4[1] 9.603647e-05> a <- function(p){c1*p + c2 - c3*sqrt(p*(1-p)/n+c4)}> b <- function(p){c1*p + c2 + c3*sqrt(p*(1-p)/n+c4)}> f<-function(p){pbinom(n-b(p),n,1-p, lower.tail = FALSE)-pbinom(a(p), n, p)}> curve(f(x),0,1,ylim=c(.8,1),main="Old CI Coverage Probability for n=100, alpha=.05")> abline(h=.95,col=2);abline(v=c(.1,.9),col=3)> # M3074CoverageEg3.pdf

> n<-200> c1 <- n/(1+za2^2/n)> c2 <-n*za2^2/(2*(n+za2^2));c2[1] 1.884533> c3 <-n^2*za2/(n+za2^2);c3[1] 384.6056> c4 <- za2^2/(4*n^2);c4[1] 2.400912e-05> a <- function(p){c1*p + c2 - c3*sqrt(p*(1-p)/n+c4)}> b <- function(p){c1*p + c2 + c3*sqrt(p*(1-p)/n+c4)}> f<-function(p){pbinom(n-b(p),n,1-p, lower.tail = FALSE)-pbinom(a(p), n, p)}> curve(f(x),0,1,ylim=c(.8,1),main="Old CI Coverage Probability for n=200, alpha=.05")> abline(h=.95,col=2);abline(v=c(.05,.95),col=3)> # M3074CoverageEg3.pdf>> n<-400> c1 <- n/(1+za2^2/n)> c2 <-n*za2^2/(2*(n+za2^2));c2[1] 1.902459> c3 <-n^2*za2/(n+za2^2);c3[1] 776.5281> c4 <- za2^2/(4*n^2);c4[1] 6.00228e-06> a <- function(p){c1*p + c2 - c3*sqrt(p*(1-p)/n+c4)}> b <- function(p){c1*p + c2 + c3*sqrt(p*(1-p)/n+c4)}> f<-function(p){pbinom(n-b(p),n,1-p, lower.tail = FALSE)-pbinom(a(p), n, p)}> curve(f(x),0,1,ylim=c(.8,1),main="Old CI Coverage Probability for n=400, alpha=.05")> abline(h=.95,col=2);abline(v=c(1/40,39/40),col=3)> M3074CoverageEg5.pdfError: object ’M3074CoverageEg5.pdf’ not found> # M3074CoverageEg5.pdf>

0.0 0.2 0.4 0.6 0.8 1.0

Old CI Coverage Probability for n=20, alpha=.05

0.0 0.2 0.4 0.6 0.8 1.0

> ############### COVERAGE PROBABILITIES OF NEW INTERVALS #################> n <- 20> an <- function(p){n*p - za2*sqrt(n*p*(1-p))}> bn <- function(p){n*p + za2*sqrt(n*p*(1-p))}> g<-function(p){pbinom(n-b(p),n,1-p, lower.tail = FALSE)-pbinom(a(p), n, p)}> curve(g(x),0,1,ylim=c(.8,1),main="New CI Coverage Probability for n=20, alpha=.05")> abline(h=.95,col=4);abline(v=c(10/n,(n-10)/n),col=5)> # M3074Coverage8.pdf>> n <- 50> an <- function(p){n*p - za2*sqrt(n*p*(1-p))}> bn <- function(p){n*p + za2*sqrt(n*p*(1-p))}> g<-function(p){pbinom(n-bn(p),n,1-p, lower.tail = FALSE)-pbinom(an(p), n, p)}> curve(g(x), 0, 1, ylim=c(.8,1),+ main="New CI Coverage Probability for n=50 alpha=.05",+ xlab="p", ylab="Probability p-hat Falls Within CI")> abline(h=.95,col=4);abline(v=c(10/n,(n-10)/n),col=5)>> n <- 100> an <- function(p){n*p - za2*sqrt(n*p*(1-p))}> bn <- function(p){n*p + za2*sqrt(n*p*(1-p))}> g<-function(p){pbinom(n-b(p),n,1-p, lower.tail = FALSE)-pbinom(a(p), n, p)}> curve(g(x),0,1,ylim=c(.8,1),+ main="New CI Coverage Probability for n=100 alpha=.05")> abline(h=.95,col=4);abline(v=c(10/n,(n-10)/n),col=5)> # M3074Coverage10.pdf>>> n <- 200> an <- function(p){n*p - za2*sqrt(n*p*(1-p))}> bn <- function(p){n*p + za2*sqrt(n*p*(1-p))}> g<-function(p){pbinom(n-b(p),n,1-p, lower.tail = FALSE)-pbinom(a(p), n, p)}> curve(g(x),0,1,ylim=c(.8,1),+ main="New CI Coverage Probability for n=200 alpha=.05")> abline(h=.95,col=4);abline(v=c(10/n,(n-10)/n),col=5)> # M3074Coverage11.pdf>> n <- 400> an <- function(p){n*p - za2*sqrt(n*p*(1-p))}> bn <- function(p){n*p + za2*sqrt(n*p*(1-p))}> g<-function(p){pbinom(n-b(p),n,1-p, lower.tail = FALSE)-pbinom(a(p), n, p)}> curve(g(x), 0, 1, ylim=c(.8,1),+ main="New CI Coverage Probability for n=200 alpha=05",+ xlab="p", ylab="Probability p-hat Falls Within CI")> abline(h=.95,col=4);abline(v=c(10/n,(n-10)/n),col=5)> # M3074Coverage12.pdf

> n <- 10> an <- function(p){n*p - za2*sqrt(n*p*(1-p))}> bn <- function(p){n*p + za2*sqrt(n*p*(1-p))}> g<-function(p){pbinom(n-b(p),n,1-p, lower.tail = FALSE)-pbinom(a(p), n, p)}> curve(g(x), 0, 1, ylim=c(.8,1),+ main="New CI Coverage Probability for n=10 alpha=.05",+ xlab="p", ylab="Probability p-hat Falls Within CI")> abline(h=.95,col=4)> # M3074Coverage7.pdf>>> n <- 5> an <- function(p){n*p - za2*sqrt(n*p*(1-p))}> bn <- function(p){n*p + za2*sqrt(n*p*(1-p))}> g<-function(p){pbinom(n-b(p),n,1-p, lower.tail = FALSE)-pbinom(a(p), n, p)}> curve(g(x), 0, 1, ylim=c(.8,1),+ main="New CI Coverage Probability for n=5 alpha=.05",+ xlab="p", ylab="Probability p-hat Falls Within CI")> abline(h=.95,col=4)> # M3074Coverage6.pdf

0.0 0.2 0.4 0.6 0.8 1.0

New CI Coverage Probability for n=5 alpha=.05

0.0 0.2 0.4 0.6 0.8 1.0

New CI Coverage Probability for n=20, alpha=.05

0.0 0.2 0.4 0.6 0.8 1.0

> ####################### PLOT OF COVERAGE INTERVALS #####################> n <- 20> an <- function(p){n*p - za2*sqrt(n*p*(1-p))}> bn <- function(p){n*p + za2*sqrt(n*p*(1-p))}> c1 <- n/(1+za2^2/n)> c2 <-n*za2^2/(2*(n+za2^2));c2[1] 1.611252> c3 <-n^2*za2/(n+za2^2);c3[1] 32.88329> c4 <- za2^2/(4*n^2);c4[1] 0.002400912> a <- function(p){c1*p + c2 - c3*sqrt(p*(1-p)/n+c4)}> b <- function(p){c1*p + c2 + c3*sqrt(p*(1-p)/n+c4)}> curve(an(x), 0, 1, ylim=c(0,n),+ main="n=20 Coverage Intervals for Old and New CI’s", xlab="p",+ ylab="Binomial Variable X ~ Bin(n,p)",col=2)> curve(bn(x),0,1,add=T,col=2)> curve(a(x),0,1,add=T,col=4)> curve(b(x),0,1,add=T,col=4)> abline(h=c(0,20),col="gray")> abline(v=c(0,1),col="gray")> legend(0.03,18,legend=c("Old Coverage Interval","New Coverage Interval"),+ fill=c(4,2))> # M3074Coverage13.pdf

0.0 0.2 0.4 0.6 0.8 1.0

20n=20 Coverage Intervals for Old and New CI's

Old Coverage IntervalNew Coverage Interval

Coverage Intervals: Actual Probability Name: Example Treibergs ... -...

Documents

Transcript of Coverage Intervals: Actual Probability Name: Example Treibergs ... -...

Econometrics - INVEMAR€¦ · Function Is Wrong? 287 Prediction and Prediction Intervals with Heteroskedasticity 289 8.5 The Linear Probability Model Revisited 290 Surnmary 293 Key

Student Handbook - Eastern International College€¦ · calculations of descriptive statistics, probability distributions, confidence intervals, hypothesis testing, chi square, regression,

Introduction to Data Analysis Probability Confidence Intervals.

EUGM 2014 | BEKELE | Adaptive Dose Finding Using Toxicity Probability Intervals

BAYESIAN TOLERANCE INTERVALS WITH PROBABILITY …

Session 1 Probability to confidence intervals By : Allan Chang.

By Prof. Lydia Ayers. Types of Intervals augmented intervals + 1/2 stepaugmented intervals + 1/2 step diminished intervals - 1/2 stepdiminished intervals.

What can hierarchical Bayes do for you? What can you do ... · •Probability statements: Bayesian vs. frequentist –Bayesian credible intervals: “conditional on . all . data,

UICSM research report - University Librarylibsysdigi.library.uiuc.edu/OCA/Books2011-08/uicsm...intervals,P(A^)orP(A^+1)areindependentofP{A2)orPiA^+1)and the probability ofa givendifference

Math 3070 x 1. Probability Plots for Normal, Example Treibergs …treiberg/M3073Geyser.pdf · 2010-10-08 · Math 3070 x 1. Treibergs Probability Plots for Normal, Exponential and

138 Intervals

Inequalities of Analysis - University of Utahtreiberg/InequalitiesSlides.pdf · Inequalities of Analysis Andrejs Treibergs University of Utah Fall 2014. 2. USAC Lecture: Inequalities

Research Article Confidence Intervals for the Coefficient ...downloads.hindawi.com/journals/jps/2013/324940.pdf · Journal of Probability and Statistics Weight Frequency 2500 3500

The Hartman-Grobman Theorem - The University of Utahtreiberg/M6414HartmanGrobman.pdf · Math 6410 - 1, Ordinary Diﬀerential Equations The Hartman-Grobman Theorem Andrejs Treibergs

Session 1 Probability to confidence intervals

Automatically Veriﬁed Reasoning with Both …...Automatically Veriﬁed Reasoning with Both Intervals and Probability Density... 51 on the real number line and by a probability mass.

Statistics Sampling Intervals for a Single Sample Contents, figures, and exercises come from the textbook: Applied Statistics and Probability for Engineers,

Biostat 200 Lecture 8 1. Where are we Types of variables Descriptive statistics and graphs Probability Confidence intervals for means and proportions.

Probability and Confidence Intervals (PDF, 1052 KB)

Jeopardy Statistics Edition. Terms General Probability Sampling Distributions Confidence Intervals Hypothesis Tests: Proportions Hypothesis Tests: Means.