Introduction to Inferential Statistics · 2019. 12. 30.
In collaboration with Clinical Translational Science Center (CTSC) and
the Biostatistics and Bioinformatics Shared Resource (BB-SR), Stony Brook Cancer Center (SBCC).
Introduction to Inferential Statistics
Jie Yang, Ph.D.
Associate Professor
Department of Family, Population and Preventive Medicine
Director
Biostatistical Consulting Core
OUTLINE
Confidence Interval - Why and How?
Hypothesis Testing - What and How?
GOAL OF STATISTICS
[Diagram: Sampling takes us from the POPULATION to a SAMPLE; probability theory and inference take us from the SAMPLE back to the POPULATION. Descriptive statistics summarize each.]
Inferential statistics: sample statistics (𝑿, 𝒔, 𝒑, …) are used to estimate population parameters (𝝁, 𝝈, 𝝅, …).
NORMAL DISTRIBUTION
Carl Friedrich Gauss (1777–1855)
• If X ~ N(μ, σ²), let Z = (X − μ)/σ; then Z ~ N(0, 1).
• Central limit theorem: x̄ ~ N(μ_x̄, σ²_x̄), where μ_x̄ = μ and σ_x̄ = σ/√n.
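As an illustrative sketch (not from the slides), the central limit theorem can be checked by simulation: repeated sample means from a skewed population are approximately normal with mean μ and standard deviation σ/√n. The population (exponential with mean 1) and the sample size are assumptions chosen for the demo.

```python
import random
import statistics

random.seed(42)

# Central limit theorem demo: sample means from a skewed population
# (exponential with mean 1, so mu = sigma = 1) are approximately
# N(mu, sigma^2 / n) for moderately large n.
n = 50        # size of each sample
reps = 2000   # number of repeated samples

means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# The simulated mean of x-bar should be near mu = 1, and its
# standard deviation near sigma / sqrt(n) = 1 / sqrt(50) ≈ 0.141.
print(round(statistics.fmean(means), 2))
print(round(statistics.stdev(means), 2))
```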
POINT ESTIMATE
Population parameters → sample statistics:
Population mean 𝝁 → sample mean 𝑿; population sd 𝝈 → sample sd 𝒔; population % 𝝅 → sample % 𝒑; …
A point estimate of a population characteristic is a single number that is based on sample data and represents a plausible value of the characteristic.
A point estimate is obtained by first selecting an appropriate statistic. The estimate is then the value of the statistic for the given sample.
CHOOSING A POINT ESTIMATOR
More than one statistic may be reasonable to use to obtain a point estimate of a specified population characteristic.
A statistic whose mean value equals the value of the population characteristic being estimated is said to be an unbiased statistic. A statistic that is not unbiased is said to be biased.
[Figure: sampling distributions of an unbiased statistic (centered at the true value) and a biased statistic (shifted away from it), compared with the original distribution.]
Given a choice between several unbiased statistics that could be used for estimating a population characteristic, the best statistic to use is the one with the smallest variation.
[Figure: the unbiased sampling distribution with the smallest variation is the best choice.]
CHOOSING A POINT ESTIMATOR
For example: When the population distribution is symmetric, 𝑥 is not the only choice of statistic to estimate the population mean μ.
– If the population distribution is normal, then 𝑥 has a smaller variance than any other unbiased statistic for estimating μ. However, a trimmed mean with a small trimming percentage performs almost as well as 𝑥.
– When the population distribution is symmetric with heavy tails compared to the normal curve, a trimmed mean is a better statistic than 𝑥 for estimating μ.
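To make the trimmed-mean idea concrete, here is a minimal sketch (not from the slides) comparing the ordinary mean with a 10% trimmed mean on a heavy-tailed symmetric sample; the contaminated-normal population used here is an illustrative assumption.

```python
import random
import statistics

def trimmed_mean(data, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    xs = sorted(data)
    k = int(len(xs) * trim)
    return statistics.fmean(xs[k:len(xs) - k])

# Sanity check on a small list: trimming 20% drops 1 and 100,
# leaving the mean of [2, 3, 4].
print(trimmed_mean([1, 2, 3, 4, 100], trim=0.2))  # 3.0

random.seed(1)
# Heavy-tailed symmetric population: standard normal with occasional
# large-variance contamination (the center is 0 either way).
sample = [random.gauss(0, 10) if random.random() < 0.05 else random.gauss(0, 1)
          for _ in range(1000)]

print(round(statistics.fmean(sample), 3))   # ordinary mean
print(round(trimmed_mean(sample, 0.1), 3))  # 10% trimmed mean
```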
POINT ESTIMATOR EXAMPLE
A MOTIVATING EXAMPLE
Two researchers are working independently to estimate the mean lung capacity of former smokers.
o Researcher A randomly samples 100 former smokers and calculates a sample mean of 1.76 liters. A knows that the sample mean is an unbiased estimator of the population mean, so he reports μ̂ = 1.76 as his point estimate for μ.
o Researcher B randomly samples 36 former smokers and calculates a sample mean of 1.85 liters. B also knows the sample mean is the best estimator of the population mean, so he reports μ̂ = 1.85.
Whose estimate is better?
Researcher A’s estimate. Why?
Assume the population standard deviation is known to be σ_pop = .27. (In general, we don’t know the true standard deviation.)
If σ_pop = .27, then SE_A = .27/√100 = .027 and SE_B = .27/√36 = .045.
• By the empirical rule, 95 percent of the possible sample means that A could have observed lie within two SEs of the true mean; that is, within 2(.027) = .054 of μ.
• Hence, we are 95 percent confident that A’s estimate, 1.76, is within .054 of the true mean.
• In contrast, we are 95 percent confident that B’s estimate is within 2(.045) = .090 of the true mean, which is much less precise.
We conclude that A’s estimate is more likely closer to μ, but note that this does not mean we are sure 1.76 is closer to the truth than 1.85. It is possible that |1.85 − μ| < |1.76 − μ| (i.e., B was lucky and got a “good” sample).
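The standard-error arithmetic in this example can be reproduced directly; this is a quick sketch of the slide's calculation with σ = .27 taken as known.

```python
import math

# SE of the sample mean: SE = sigma / sqrt(n), with sigma = .27 assumed known.
sigma = 0.27

se_a = sigma / math.sqrt(100)   # Researcher A, n = 100
se_b = sigma / math.sqrt(36)    # Researcher B, n = 36

print(round(se_a, 3), round(se_b, 3))          # 0.027 0.045
print(round(2 * se_a, 3), round(2 * se_b, 3))  # 95% "empirical rule" margins
```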
A MOTIVATING EXAMPLE
A point estimate alone is not enough: it gives us no way to judge how precise it is as an estimator.
CONFIDENCE INTERVAL
A confidence interval provides a better estimate by combining the point estimate with its standard error to define a range of values that are likely to cover the true value of the parameter.
A confidence interval starts with the point estimate and adds a “margin of error.” A confidence interval is defined as: point estimate ± margin of error.
The margin of error depends on our desired level of confidence. We choose how “confident” we want to be in our estimate, and we construct an interval to reflect our chosen level of confidence. In general, the higher our desired level of confidence, the wider (less precise) our interval will be.
95% CI for μ: find lower and upper limits L and U such that P(L < µ < U) = 0.95.
CI FOR POPULATION MEAN
Based on the central limit theorem, x̄ ~ N(μ_x̄, σ²_x̄), so
P(−1.96 < (x̄ − μ)/(σ/√n) < 1.96) = 0.95
P(x̄ − 1.96·σ/√n < μ < x̄ + 1.96·σ/√n) = 0.95
95% Confidence Interval (CI) for population mean µ:
x̄ ± 1.96·σ/√n
Such a CI is random until we get a sample (mean). Then the CI either covers μ or not, and we don't know which! After we compute the observed CI, we talk about “confidence,” not “probability.” If we did a meta-experiment and collected samples of size n repeatedly and formed 95% CIs, approximately 95 in 100 would cover µ. Increasing n only makes the intervals smaller; still 95% of the CIs would cover µ.
A good link for simulation of CI: http://wise.cgu.edu/portfolio/demo-confidence-interval-creation/
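In the same spirit as the simulation linked above, here is a minimal sketch of the meta-experiment (the population values μ = 50, σ = 10 and the sample size n = 25 are illustrative assumptions): repeatedly form the known-σ 95% CI and count how often it covers µ.

```python
import random
import statistics

random.seed(0)
mu, sigma, n = 50.0, 10.0, 25   # illustrative population and sample size
reps = 1000

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    margin = 1.96 * sigma / n ** 0.5   # known-sigma 95% margin of error
    if xbar - margin < mu < xbar + margin:
        covered += 1

# Roughly 95 in 100 intervals cover mu.
print(covered / reps)
```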
INTERPRETATION OF CI
The specified confidence level is called the coverage probability.
This is a property of a long sequence of confidence intervals computed from valid models, rather than a property of any single confidence interval.
- “95%” refers only to how often 95% confidence intervals computed from many studies would contain the true effect size if all the assumptions used to compute the intervals were correct.
NOTES ABOUT CI
A confidence interval is usually symmetric about the point estimate.
Exceptions:
1. When a data transformation is used to meet some model assumptions, the resulting back-transformed CI is not symmetric about the point estimate (e.g., the CI of an estimated odds ratio from a logistic regression model).
2. When the theoretical distribution of the point estimate is not symmetric, or an adjustment is made to maintain the coverage probability (e.g., the modified Wilson score method for the CI of a difference of two proportions).
MISINTERPRETATION OF CI
Misinterpretation: An effect size outside the 95% CI has been excluded by the data.
Note: A CI is computed from many assumptions, any of which may be wrong.
Misinterpretation: An observed 95% CI predicts that 95% of the estimates from future studies will fall inside the observed interval.
Note: 95% is the frequency with which other unobserved intervals would contain the true effect, not how frequently the one interval being presented will contain future estimates. In fact, that frequency is lower than 95%. For example, the chance that one 95% CI about μ from study 1 contains the point estimate from study 2 is 83% under ideal conditions.
Misinterpretation: If one 95% CI includes the null value and another excludes that value, the interval excluding the null is the more precise one.
Note: When the model is correct, the precision of a CI is measured by its width.
From The Cartoon Guide to Statistics by Gonick and Smith.
• Using samples to test specific hypotheses
• Making decisions based on probability (instead of subjective impressions)
• A distribution is usually assumed
• Methods that require no distributional assumptions are called non-parametric or distribution-free
HYPOTHESIS TESTING
• A well-formulated hypothesis will be both quantifiable and testable, that is, involve measurable quantities or refer to items that may be assigned to mutually exclusive categories.
• It takes one of two forms:
“Some measurable characteristic of a population takes one of a specific set of values.”
“Some measurable characteristic takes different values in different populations, and the difference has a specific pattern or a specific set of values.”
WHAT IS A HYPOTHESIS
The Null hypothesis describes some aspect of the statistical behavior of a set of data and is denoted H0.
This description is treated as valid unless the actual behavior of the data contradicts this assumption.
The Alternative Hypothesis is generally the “opposite” of the null hypothesis and is denoted HA or H1 .
BASIC DEFINITIONS AND NOTATION
AN EXAMPLE
Scientists wish to test the hypothesis that norepinephrine (NE) levels are different in rats exposed to toluene (glue) and those that aren't. The scientist designs a controlled experiment in which n1 = 6 rats are exposed to toluene and n2 = 5 rats are not. NE levels are measured in the rats' brains. The scientists wish to show that the population NE levels are different among rats exposed and non-exposed to toluene. This is encapsulated in the mathematical statement
HA: μ1 ≠ μ2, i.e., the mean NE levels differ across exposed and non-exposed.
HYPOTHESIS TESTING
A hypothesis test is a proof by contradiction. We assume the null is true; then the data show us something that is absurd, casting doubt on what we assumed, namely H0. So we have to conclude the opposite, HA. The null hypothesis is what we are trying to disprove: H0: μ1 = μ2.
The alternative hypothesis is what we're trying to show is true: HA: μ1 ≠ μ2.
Note: If the alternative hypothesis is not proved, it doesn’t mean that the null hypothesis is true.
HYPOTHESIS TESTING
Assuming the data are normal in both populations, then
t = [ (ȳ1 − ȳ2) − (μ1 − μ2) ] / SE(ȳ1 − ȳ2), where SE(ȳ1 − ȳ2) = √(s1²/n1 + s2²/n2),
has a t distribution with df given by the Satterthwaite-Welch formula. In the hypothesis test, we assume H0: μ1 = μ2, so
t_s = (ȳ1 − ȳ2) / SE(ȳ1 − ȳ2)
has a t distribution, which is centered at zero. If t_s is really far away from zero in either direction, we have evidence that H0 is not true. t_s measures how far apart ȳ1 and ȳ2 are, i.e., how many SEs apart.
(a) Data compatible with H0 (so no evidence toward HA), (b) data not compatible with H0 (in favor of HA).
HYPOTHESIS TESTING
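The Welch statistic and its Satterthwaite-Welch df can be computed from summary statistics alone. This is a minimal sketch; the numeric inputs below are hypothetical, not the slide's NE data, though the sample sizes match the example (n1 = 6, n2 = 5).

```python
import math

def welch_t(ybar1, s1, n1, ybar2, s2, n2):
    """Welch two-sample t statistic and Satterthwaite-Welch degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)                 # SE of (ybar1 - ybar2)
    t = (ybar1 - ybar2) / se                # assumes H0: mu1 = mu2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics (means, SDs) with n1 = 6 and n2 = 5.
t, df = welch_t(10.5, 2.0, 6, 8.2, 1.5, 5)
print(round(t, 2), round(df, 1))
```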
NE LEVEL EXAMPLE
t_s is 2.34 SEs away from 0.
How big is big?
Data from Statistics for the Life Sciences, Fifth Edition by Samuels, M.L., Witmer, J.A., and Schaffner, A. Addison Wesley, 2016
P-VALUE
The P-value for a hypothesis test is the probability of the test statistic being at least as extreme as the observed test statistic, assuming H0 is true.
The P-value answers the question “how big is big?” for t_s.
P-VALUE FOR TWO-SAMPLE PROBLEM
For the two-sample problem, the P-value is the probability of seeing sample means ȳ1 and ȳ2 even further apart than what we saw, if H0 is true. This is a standard probability calculation due to W.S. Gosset:
P-value = P(|T| ≥ |t_s|),
where T is a t random variable with df given by the Satterthwaite-Welch formula.
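Without a t-distribution table, this two-sided P-value can be approximated by Monte Carlo: simulate the null distribution of t_s and count how often it is at least as extreme as 2.34, the observed value from the NE example. The normal populations with equal SDs used here are simplifying assumptions for the sketch.

```python
import math
import random
import statistics

random.seed(7)

def t_stat(a, b):
    """Welch two-sample t statistic."""
    v1 = statistics.variance(a) / len(a)
    v2 = statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(v1 + v2)

n1, n2, t_obs = 6, 5, 2.34   # sample sizes and observed t from the example
reps = 20000

# Simulate t_s under H0 (both samples from the same normal population)
# and estimate P(|T| >= |t_obs|).
extreme = sum(
    abs(t_stat([random.gauss(0, 1) for _ in range(n1)],
               [random.gauss(0, 1) for _ in range(n2)])) >= abs(t_obs)
    for _ in range(reps)
)
print(round(extreme / reps, 3))   # in the neighborhood of the slide's 0.0454
```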
HOW TO USE P-VALUE
is “big" then the P-value is “small." But how small is small?We compare the P-value to a cutoff value denoted as α.
α is called the significance level of the hypothesis test; it is often set as 0.05.
If P-value < α then we reject H0 at a significance level of α.If P-value > α then we fail to reject H0 at a significance level of α.
NE LEVEL EXAMPLE
P-value = 0.0454 < 0.05. Reject H0 at the 5% significance level. Therefore, there is statistically significant evidence that the mean NE levels are different in toluene-exposed vs. non-exposed rats.
INTERPRETATION OF α
We reject H0 when P-value < α.
When the null is true, we wrongly reject H0 with probability α.
α is called the Type I error rate.
Wrongly rejecting the null is a Type I error.
TYPE II ERROR RATE
When the alternative is true, we wrongly fail to reject H0 with probability β. β depends on many parameters (e.g., µ1, µ2, σ1, σ2, n1, and n2 when comparing two samples). We never actually know β, but we can guess it. β is called the Type II error rate.
Wrongly failing to reject the null is a Type II error. The power of the test is 1 − β.
ERRORS IN HYPOTHESIS TESTING
True situation                        | Decision: lack of significant evidence for HA (e.g., no difference in drug) | Decision: significant evidence for HA (e.g., drug is better)
H0 true (e.g., no difference in drug) | Correct | Type I error (e.g., manufacturer wastes money developing an ineffective drug)
HA true (e.g., drug is better)        | Type II error (e.g., manufacturer misses opportunity for profit; public denied access to effective treatment) | Correct
There is a trade-off between the Type I error rate, α, and the Type II error rate, β.
TYPE I ERROR AND POWER
H0: μ = 4 vs. H1: μ ≠ 4, assuming μ = 7
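The power in this example can be estimated by simulation. This is a sketch assuming a known σ = 5 and n = 20, both illustrative choices, since the slide only specifies the two means.

```python
import random
import statistics

random.seed(3)

# Two-sided z test of H0: mu = 4 at alpha = 0.05, when in fact mu = 7.
mu0, mu_true = 4.0, 7.0
sigma, n = 5.0, 20        # assumed known SD and sample size (illustrative)
reps = 2000

rejections = 0
for _ in range(reps):
    xbar = statistics.fmean(random.gauss(mu_true, sigma) for _ in range(n))
    z = (xbar - mu0) / (sigma / n ** 0.5)
    if abs(z) > 1.96:     # reject H0 at alpha = 0.05
        rejections += 1

# Estimated power = 1 - beta: the fraction of samples where H0 is rejected.
print(rejections / reps)
```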
1. State null (H0) and alternative (H1) hypotheses
2. Choose a significance level, α (usually 0.05)
3. Based on the sample, calculate the test statistic
and calculate p-value based on a theoretical
distribution of the test statistic
4. Compare p-value with the significance level α
5. Make a decision, and state the conclusion
BASIC STEPS IN HYPOTHESIS TESTING
If we wish to conduct a test of a hypothesis regarding a population parameter with significance level α, we can do this by constructing a 100(1-α)% confidence interval and checking to see if the hypothesized value is in the interval.
In this manner, a CI can be used to conduct a Hypothesis Test or vice versa.
RELATIONSHIP BETWEEN HT AND CI
Variable: absolute difference
Group                     | Estimate | 95% Confidence Interval | P-value
Control vs. No surgery    | 0.421    | 0.239 to 0.741          | 0.0036
Control vs. Surgery       | 0.276    | 0.158 to 0.482          | <0.0001
No surgery vs. Surgery    | 0.655    | -0.372 to 1.156         | 0.1400
Gamma-knife vs. Resection | 0.420    | 0.190 to 0.925          | 0.0322
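The duality between a hypothesis test and a confidence interval can be sketched directly in the known-σ z case (all numeric values below are illustrative): H0: μ = μ0 is rejected at α = 0.05 exactly when μ0 falls outside the 95% CI.

```python
import math
import random
import statistics

random.seed(11)
mu0, sigma, n = 4.0, 5.0, 20          # hypothesized mean, known SD, sample size
sample = [random.gauss(7.0, sigma) for _ in range(n)]

xbar = statistics.fmean(sample)
margin = 1.96 * sigma / math.sqrt(n)
in_ci = (xbar - margin) < mu0 < (xbar + margin)   # is mu0 inside the 95% CI?

z = abs(xbar - mu0) / (sigma / math.sqrt(n))
reject = z > 1.96                                 # does the z test reject H0?

# The two conclusions always agree: reject exactly when mu0 is outside the CI.
print(in_ci, reject)
```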
COMMON MISINTERPRETATION
Misinterpretation: If two 95% confidence intervals overlap, the difference between the two estimates or studies is not significant at α = 0.05.
Note: If the two 95% CIs don't overlap, then, under the same assumptions used to compute them, P < 0.05 for the difference. If one of the 95% CIs contains the point estimate from the other group or study, then P > 0.05 for the difference. Overlapping CIs by themselves do not imply P > 0.05.
SUMMARY
Confidence intervals can be used both to evaluate and report the precision of estimates and the significance of hypothesis tests.
The center of the interval is no more likely than any other value inside it. We recommend always reporting the CI:
Cummings P, Rivara FP. Reporting Statistical Information in Medical Journal Articles. Arch Pediatr Adolesc
Med. 2003;157(4):321–324. doi:10.1001/archpedi.157.4.321
SUMMARY
We have a null hypothesis and the alternative .The P-value gives evidence against .We reject if P-value <α , where α is the significance levelof the test, usually α = 0.05. α is the probability of making a Type I error.α = Pr{reject | true}.β is the probability of making a Type II error. Can be “guessed”. β = Pr{fail to reject | true}.The power of a test is .. This depends on many factors.
Please check BCC’s website for future lectures
https://osa.stonybrookmedicine.edu/research-core-facilities/bcc/education
Upcoming lectures:
March 20, p-value & FDR
April 4, performing basic statistical tests using different software
THANK YOU!