Introduction to Inferential Statistics · 2019. 12. 30.
In collaboration with Clinical Translational Science Center (CTSC) and
the Biostatistics and Bioinformatics Shared Resource (BB-SR), Stony Brook Cancer Center (SBCC).
Introduction to Inferential Statistics
Jie Yang, Ph.D.
Associate Professor
Department of Family, Population and Preventive Medicine
Director
Biostatistical Consulting Core
OUTLINE
Confidence Interval - Why and How?
Hypothesis Testing - What and How?
GOAL OF STATISTICS
[Diagram: Sampling takes us from the POPULATION to a SAMPLE; probability theory and inference take us from the SAMPLE back to the POPULATION. Descriptive statistics summarize each.]
Inferential statistics: sample statistics (𝑿, 𝒔, 𝒑, …) are used to estimate population parameters (𝝁, 𝝈, 𝝅, …).
NORMAL DISTRIBUTION
Carl Friedrich Gauss (1777–1855)
• If X ~ N(μ, σ²), let Z = (X − μ)/σ; then Z ~ N(0, 1).
• Central limit theorem: x̄ ~ N(μ_x̄, σ²_x̄), where μ_x̄ = μ and σ_x̄ = σ/√n.
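As an illustrative sketch (not from the slides), the central limit theorem can be checked by simulation: repeated sample means from a skewed population are approximately normal with mean μ and standard deviation σ/√n. The population (exponential with mean 1) and the sample size are assumptions chosen for the demo.

```python
import random
import statistics

random.seed(42)

# Central limit theorem demo: sample means from a skewed population
# (exponential with mean 1, so mu = sigma = 1) are approximately
# N(mu, sigma^2 / n) for moderately large n.
n = 50        # size of each sample
reps = 2000   # number of repeated samples

means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# The simulated mean of x-bar should be near mu = 1, and its
# standard deviation near sigma / sqrt(n) = 1 / sqrt(50) ≈ 0.141.
print(round(statistics.fmean(means), 2))
print(round(statistics.stdev(means), 2))
```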
POINT ESTIMATE
Population parameters → sample statistics:
Population mean 𝝁 → sample mean 𝑿; population sd 𝝈 → sample sd 𝒔; population % 𝝅 → sample % 𝒑; …
A point estimate of a population characteristic is a single number that is based on sample data and represents a plausible value of the characteristic.
A point estimate is obtained by first selecting an appropriate statistic. The estimate is then the value of the statistic for the given sample.
CHOOSING A POINT ESTIMATOR
More than one statistic may be reasonable to use to obtain a point estimate of a specified population characteristic.
A statistic whose mean value equals the value of the population characteristic being estimated is said to be an unbiased statistic. A statistic that is not unbiased is said to be biased.
[Figure: sampling distributions of an unbiased statistic (centered at the true value) and a biased statistic (shifted away from it), compared with the original distribution.]
Given a choice between several unbiased statistics that could be used for estimating a population characteristic, the best statistic to use is the one with the smallest variation.
[Figure: the unbiased sampling distribution with the smallest variation is the best choice.]
CHOOSING A POINT ESTIMATOR
For example: When the population distribution is symmetric, 𝑥 is not the only choice of statistic to estimate the population mean μ.
– If the population distribution is normal, then 𝑥 has a smaller variance than any other unbiased statistic for estimating μ. However, a trimmed mean with a small trimming percentage performs almost as well as 𝑥.
– When the population distribution is symmetric with heavy tails compared to the normal curve, a trimmed mean is a better statistic than 𝑥 for estimating μ.
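To make the trimmed-mean idea concrete, here is a minimal sketch (not from the slides) comparing the ordinary mean with a 10% trimmed mean on a heavy-tailed symmetric sample; the contaminated-normal population used here is an illustrative assumption.

```python
import random
import statistics

def trimmed_mean(data, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    xs = sorted(data)
    k = int(len(xs) * trim)
    return statistics.fmean(xs[k:len(xs) - k])

# Sanity check on a small list: trimming 20% drops 1 and 100,
# leaving the mean of [2, 3, 4].
print(trimmed_mean([1, 2, 3, 4, 100], trim=0.2))  # 3.0

random.seed(1)
# Heavy-tailed symmetric population: standard normal with occasional
# large-variance contamination (the center is 0 either way).
sample = [random.gauss(0, 10) if random.random() < 0.05 else random.gauss(0, 1)
          for _ in range(1000)]

print(round(statistics.fmean(sample), 3))   # ordinary mean
print(round(trimmed_mean(sample, 0.1), 3))  # 10% trimmed mean
```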
POINT ESTIMATOR EXAMPLE
A MOTIVATING EXAMPLE
Two researchers are working independently to estimate the mean lung capacity of former smokers.
o Researcher A randomly samples 100 former smokers and calculates a sample mean of 1.76 liters. A knows that the sample mean is an unbiased estimator of the population mean, so he reports μ̂ = 1.76 as his point estimate for μ.
o Researcher B randomly samples 36 former smokers and calculates a sample mean of 1.85 liters. B also knows the sample mean is the best estimator of the population mean, so he reports μ̂ = 1.85.
Whose estimate is better?
Researcher A’s estimate. Why?
Assume the population standard deviation is known to be σ_pop = .27. (In general, we don’t know the true standard deviation.)
If σ_pop = .27, then SE_A = .27/√100 = .027 and SE_B = .27/√36 = .045.
• By the empirical rule, 95 percent of the possible sample means that A could have observed lie within two SEs of the true mean; that is, within 2(.027) = .054 of μ.
• Hence, we are 95 percent confident that A’s estimate, 1.76, is within .054 of the true mean.
• In contrast, we are 95 percent confident that B’s estimate is within 2(.045) = .090 of the true mean, which is much less precise.
We conclude that A’s estimate is more likely closer to μ, but note that this does not mean we are sure 1.76 is closer to the truth than 1.85. It is possible that |1.85 − μ| < |1.76 − μ| (i.e., B was lucky and got a “good” sample).
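The standard-error arithmetic in this example can be reproduced directly; this is a quick sketch of the slide's calculation with σ = .27 taken as known.

```python
import math

# SE of the sample mean: SE = sigma / sqrt(n), with sigma = .27 assumed known.
sigma = 0.27

se_a = sigma / math.sqrt(100)   # Researcher A, n = 100
se_b = sigma / math.sqrt(36)    # Researcher B, n = 36

print(round(se_a, 3), round(se_b, 3))          # 0.027 0.045
print(round(2 * se_a, 3), round(2 * se_b, 3))  # 95% "empirical rule" margins
```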
A MOTIVATING EXAMPLE
A point estimate alone is not enough: it gives us no way to judge how precise it is as an estimator.
CONFIDENCE INTERVAL
A confidence interval provides a better estimate by combining the point estimate with its standard error to define a range of values that are likely to cover the true value of the parameter.
A confidence interval starts with the point estimate and adds a “margin of error.” A confidence interval is defined as: point estimate ± margin of error.
The margin of error depends on our desired level of confidence. We choose how “confident” we want to be in our estimate, and we construct an interval to reflect our chosen level of confidence. In general, the higher our desired level of confidence, the wider (less precise) our interval will be.
95% CI for μ: find lower and upper limits L and U such that P(L < µ < U) = 0.95.
CI FOR POPULATION MEAN
Based on the central limit theorem, x̄ ~ N(μ_x̄, σ²_x̄), so
P(−1.96 < (x̄ − μ)/(σ/√n) < 1.96) = 0.95
P(x̄ − 1.96·σ/√n < μ < x̄ + 1.96·σ/√n) = 0.95
95% Confidence Interval (CI) for population mean µ:
x̄ ± 1.96·σ/√n
Such a CI is random until we get a sample (mean). Then the CI either covers μ or not, and we don't know which! After we compute the observed CI, we talk about “confidence,” not “probability.” If we did a meta-experiment and collected samples of size n repeatedly and formed 95% CIs, approximately 95 in 100 would cover µ. Increasing n only makes the intervals smaller; still 95% of the CIs would cover µ.
A good link for simulation of CI: http://wise.cgu.edu/portfolio/demo-confidence-interval-creation/
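In the same spirit as the simulation linked above, here is a minimal sketch of the meta-experiment (the population values μ = 50, σ = 10 and the sample size n = 25 are illustrative assumptions): repeatedly form the known-σ 95% CI and count how often it covers µ.

```python
import random
import statistics

random.seed(0)
mu, sigma, n = 50.0, 10.0, 25   # illustrative population and sample size
reps = 1000

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    margin = 1.96 * sigma / n ** 0.5   # known-sigma 95% margin of error
    if xbar - margin < mu < xbar + margin:
        covered += 1

# Roughly 95 in 100 intervals cover mu.
print(covered / reps)
```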
INTERPRETATION OF CI
The specified confidence level is called the coverage probability.
This is a property of a long sequence of confidence intervals computed from valid models, rather than a property of any single confidence interval.
- “95%” refers only to how often 95% confidence intervals computed from many studies would contain the true effect size if all the assumptions used to compute the intervals were correct.
NOTES ABOUT CI
A confidence interval is usually symmetric about the point estimate.
Exceptions:
1. When a data transformation is used to meet some model assumptions, the resulting back-transformed CI is not symmetric about the point estimate (e.g., the CI of an estimated odds ratio from a logistic regression model).
2. When the theoretical distribution of the point estimate is not symmetric, or an adjustment is made to maintain the coverage probability (e.g., the modified Wilson score method for the CI of a difference of two proportions).
MISINTERPRETATION OF CI
Misinterpretation: An effect size outside the 95% CI has been excluded by the data.
Note: A CI is computed from many assumptions, any of which may be wrong.
Misinterpretation: An observed 95% CI predicts that 95% of the estimates from future studies will fall inside the observed interval.
Note: 95% is the frequency with which other unobserved intervals would contain the true effect, not how frequently the one interval being presented will contain future estimates. In fact, that frequency is lower than 95%. For example, the chance that one 95% CI about μ from study 1 contains the point estimate from study 2 is 83% under ideal conditions.
Misinterpretation: If one 95% CI includes the null value and another excludes that value, the interval excluding the null is the more precise one.
Note: When the model is correct, the precision of a CI is measured by its width.
From The Cartoon Guide to Statistics by Gonick and Smith.
• Using samples to test specific hypotheses
• Making decisions based on probability (instead of subjective impressions)
• A distribution is usually assumed
• Methods that require no distributional assumptions are called non-parametric or distribution-free
HYPOTHESIS TESTING
• A well-formulated hypothesis will be both quantifiable and testable, that is, involve measurable quantities or refer to items that may be assigned to mutually exclusive categories.
• It takes one of two forms:
“Some measurable characteristic of a population takes one of a specific set of values.”
“Some measurable characteristic takes different values in different populations, and the difference has a specific pattern or a specific set of values.”
WHAT IS A HYPOTHESIS
The Null hypothesis describes some aspect of the statistical behavior of a set of data and is denoted H0.
This description is treated as valid unless the actual behavior of the data contradicts this assumption.
The Alternative Hypothesis is generally the “opposite” of the null hypothesis and is denoted HA or H1 .
BASIC DEFINITIONS AND NOTATION
AN EXAMPLE
Scientists wish to test the hypothesis that norepinephrine (NE) levels are different in rats exposed to toluene (glue) and those that aren't. The scientist designs a controlled experiment in which n1 = 6 rats are exposed to toluene and n2 = 5 rats are not. NE levels are measured in the rats' brains. The scientists wish to show that the population NE levels are different among rats exposed and non-exposed to toluene. This is encapsulated in the mathematical statement
HA: μ1 ≠ μ2, i.e., the mean NE levels differ across exposed and non-exposed.
HYPOTHESIS TESTING
A hypothesis test is a proof by contradiction. We assume the null is true; then the data show us something that is absurd, casting doubt on what we assumed, namely H0. So we have to conclude the opposite, HA. The null hypothesis is what we are trying to disprove: H0: μ1 = μ2.
The alternative hypothesis is what we're trying to show is true: HA: μ1 ≠ μ2.
Note: If the alternative hypothesis is not proved, it doesn’t mean that the null hypothesis is true.
HYPOTHESIS TESTING
Assuming the data are normal in both populations, then
t = [ (ȳ1 − ȳ2) − (μ1 − μ2) ] / SE(ȳ1 − ȳ2), where SE(ȳ1 − ȳ2) = √(s1²/n1 + s2²/n2),
has a t distribution with df given by the Satterthwaite-Welch formula. In the hypothesis test, we assume H0: μ1 = μ2, so
t_s = (ȳ1 − ȳ2) / SE(ȳ1 − ȳ2)
has a t distribution, which is centered at zero. If t_s is really far away from zero in either direction, we have evidence that H0 is not true. t_s measures how far apart ȳ1 and ȳ2 are, i.e., how many SEs apart.
(a) Data compatible with H0 (so no evidence toward HA), (b) data not compatible with H0 (in favor of HA).
HYPOTHESIS TESTING
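The Welch statistic and its Satterthwaite-Welch df can be computed from summary statistics alone. This is a minimal sketch; the numeric inputs below are hypothetical, not the slide's NE data, though the sample sizes match the example (n1 = 6, n2 = 5).

```python
import math

def welch_t(ybar1, s1, n1, ybar2, s2, n2):
    """Welch two-sample t statistic and Satterthwaite-Welch degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)                 # SE of (ybar1 - ybar2)
    t = (ybar1 - ybar2) / se                # assumes H0: mu1 = mu2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics (means, SDs) with n1 = 6 and n2 = 5.
t, df = welch_t(10.5, 2.0, 6, 8.2, 1.5, 5)
print(round(t, 2), round(df, 1))
```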
NE LEVEL EXAMPLE
t_s is 2.34 SEs away from 0.
How big is big?
Data from Statistics for the Life Sciences, Fifth Edition by Samuels, M.L., Witmer, J.A., and Schaffner, A. Addison Wesley, 2016
P-VALUE
The P-value for a hypothesis test is the probability of the test statistic being at least as extreme as the observed test statistic, assuming H0 is true.
The P-value answers the question “how big is big?” for t_s.
P-VALUE FOR TWO-SAMPLE PROBLEM
For the two-sample problem, the P-value is the probability of seeing sample means ȳ1 and ȳ2 even further apart than what we saw, if H0 is true. This is a standard probability calculation due to W.S. Gosset:
P-value = P(|T| ≥ |t_s|),
where T is a t random variable with df given by the Satterthwaite-Welch formula.
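Without a t-distribution table, this two-sided P-value can be approximated by Monte Carlo: simulate the null distribution of t_s and count how often it is at least as extreme as 2.34, the observed value from the NE example. The normal populations with equal SDs used here are simplifying assumptions for the sketch.

```python
import math
import random
import statistics

random.seed(7)

def t_stat(a, b):
    """Welch two-sample t statistic."""
    v1 = statistics.variance(a) / len(a)
    v2 = statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(v1 + v2)

n1, n2, t_obs = 6, 5, 2.34   # sample sizes and observed t from the example
reps = 20000

# Simulate t_s under H0 (both samples from the same normal population)
# and estimate P(|T| >= |t_obs|).
extreme = sum(
    abs(t_stat([random.gauss(0, 1) for _ in range(n1)],
               [random.gauss(0, 1) for _ in range(n2)])) >= abs(t_obs)
    for _ in range(reps)
)
print(round(extreme / reps, 3))   # in the neighborhood of the slide's 0.0454
```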
HOW TO USE P-VALUE
is “big" then the P-value is “small." But how small is small?We compare the P-value to a cutoff value denoted as α.
α is called the significance level of the hypothesis test; it is often set as 0.05.
If P-value < α then we reject H0 at a significance level of α.If P-value > α then we fail to reject H0 at a significance level of α.
NE LEVEL EXAMPLE
P-value = 0.0454 < 0.05. Reject H0 at the 5% significance level. Therefore, there is statistically significant evidence that the mean NE levels are different in toluene-exposed vs. non-exposed rats.
INTERPRETATION OF α
We reject H0 when P-value < α.
When the null is true, we wrongly reject H0 with probability α.
α is called the Type I error rate.
Wrongly rejecting the null is a Type I error.
TYPE II ERROR RATE
When the alternative is true, we wrongly fail to reject H0 with probability β. β depends on many parameters (e.g., µ1, µ2, σ1, σ2, n1, and n2 when comparing two samples). We never actually know β, but we can guess it. β is called the Type II error rate.
Wrongly failing to reject the null is a Type II error. The power of the test is 1 − β.
ERRORS IN HYPOTHESIS TESTING
True situation                        | Decision: lack of significant evidence for HA (e.g., no difference in drug) | Decision: significant evidence for HA (e.g., drug is better)
H0 true (e.g., no difference in drug) | Correct | Type I error (e.g., manufacturer wastes money developing an ineffective drug)
HA true (e.g., drug is better)        | Type II error (e.g., manufacturer misses opportunity for profit; public denied access to effective treatment) | Correct
There is a trade-off between the Type I error rate, α, and the Type II error rate, β.
TYPE I ERROR AND POWER
H0: μ = 4 vs. H1: μ ≠ 4, assuming μ = 7
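The power in this example can be estimated by simulation. This is a sketch assuming a known σ = 5 and n = 20, both illustrative choices, since the slide only specifies the two means.

```python
import random
import statistics

random.seed(3)

# Two-sided z test of H0: mu = 4 at alpha = 0.05, when in fact mu = 7.
mu0, mu_true = 4.0, 7.0
sigma, n = 5.0, 20        # assumed known SD and sample size (illustrative)
reps = 2000

rejections = 0
for _ in range(reps):
    xbar = statistics.fmean(random.gauss(mu_true, sigma) for _ in range(n))
    z = (xbar - mu0) / (sigma / n ** 0.5)
    if abs(z) > 1.96:     # reject H0 at alpha = 0.05
        rejections += 1

# Estimated power = 1 - beta: the fraction of samples where H0 is rejected.
print(rejections / reps)
```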
1. State null (H0) and alternative (H1) hypotheses
2. Choose a significance level, α (usually 0.05)
3. Based on the sample, calculate the test statistic
and calculate p-value based on a theoretical
distribution of the test statistic
4. Compare p-value with the significance level α
5. Make a decision, and state the conclusion
BASIC STEPS IN HYPOTHESIS TESTING
If we wish to conduct a test of a hypothesis regarding a population parameter with significance level α, we can do this by constructing a 100(1-α)% confidence interval and checking to see if the hypothesized value is in the interval.
In this manner, a CI can be used to conduct a Hypothesis Test or vice versa.
RELATIONSHIP BETWEEN HT AND CI
Variable: absolute difference
Group                     | Estimate | 95% Confidence Interval | P-value
Control vs. No surgery    | 0.421    | 0.239 to 0.741          | 0.0036
Control vs. Surgery       | 0.276    | 0.158 to 0.482          | <0.0001
No surgery vs. Surgery    | 0.655    | -0.372 to 1.156         | 0.1400
Gamma-knife vs. Resection | 0.420    | 0.190 to 0.925          | 0.0322
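The duality between a hypothesis test and a confidence interval can be sketched directly in the known-σ z case (all numeric values below are illustrative): H0: μ = μ0 is rejected at α = 0.05 exactly when μ0 falls outside the 95% CI.

```python
import math
import random
import statistics

random.seed(11)
mu0, sigma, n = 4.0, 5.0, 20          # hypothesized mean, known SD, sample size
sample = [random.gauss(7.0, sigma) for _ in range(n)]

xbar = statistics.fmean(sample)
margin = 1.96 * sigma / math.sqrt(n)
in_ci = (xbar - margin) < mu0 < (xbar + margin)   # is mu0 inside the 95% CI?

z = abs(xbar - mu0) / (sigma / math.sqrt(n))
reject = z > 1.96                                 # does the z test reject H0?

# The two conclusions always agree: reject exactly when mu0 is outside the CI.
print(in_ci, reject)
```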
COMMON MISINTERPRETATION
Misinterpretation: If two 95% confidence intervals overlap, the difference between the two estimates or studies is not significant at α = 0.05.
Note: If the two 95% CIs don't overlap, then, under the same assumptions used to compute them, P < 0.05 for the difference. If one of the 95% CIs contains the point estimate from the other group or study, then P > 0.05 for the difference. Overlapping CIs by themselves do not imply P > 0.05.
SUMMARY
Confidence intervals can be used both to evaluate and report the precision of estimates and the significance of hypothesis tests.
The center of the interval is no more likely than any other value inside it. We recommend always reporting the CI:
Cummings P, Rivara FP. Reporting Statistical Information in Medical Journal Articles. Arch Pediatr Adolesc
Med. 2003;157(4):321–324. doi:10.1001/archpedi.157.4.321
SUMMARY
We have a null hypothesis and the alternative .The P-value gives evidence against .We reject if P-value <α , where α is the significance levelof the test, usually α = 0.05. α is the probability of making a Type I error.α = Pr{reject | true}.β is the probability of making a Type II error. Can be “guessed”. β = Pr{fail to reject | true}.The power of a test is .. This depends on many factors.
Please check BCC’s website for future lectures
https://osa.stonybrookmedicine.edu/research-core-facilities/bcc/education
Upcoming lectures:
March 20, p-value & FDR
April 4, performing basic statistical tests using different software
THANK YOU!