STAT 113 Inferential Statistics II


Transcript of STAT 113 Inferential Statistics II

Page 1: STAT 113 Inferential Statistics II

Reminder: Inference Goals Standard Error Confidence Intervals: Justification

STAT 113 Inferential Statistics II

Standard Error

Oberlin College

November 8-10, 2021

Page 2: STAT 113 Inferential Statistics II

Outline

Reminder: Inference Goals

Standard Error

Confidence Intervals: Justification

Page 3: STAT 113 Inferential Statistics II

Two Main Goals of Inference

1. Estimating unknown quantities in a population using a dataset (by reporting confidence intervals)

2. Assessing strength of evidence about “yes/no” questions (by carrying out hypothesis tests)

Page 4: STAT 113 Inferential Statistics II

Variability due to Sampling

• Each potential dataset (sample) is an imperfect/incomplete snapshot of the underlying population/process/phenomenon
• Therefore, statistics are imperfect reflections of the underlying parameters
• However, if samples are representative, statistics are usually close to the corresponding parameter
• So, we can estimate (with some, but not full certainty) that the unknown underlying parameter is probably close to the corresponding statistic

Page 5: STAT 113 Inferential Statistics II

Self Check: Statistics and Parameters

Is each of the following a statistic or a parameter?

1. The mean temperature in a set of 1000 measurements taken throughout 2020 at the Cleveland airport
2. The mean temperature in Cleveland in 2020
3. The structural association between household income and standardized test scores in the U.S.
4. The correlation between household incomes and standardized test scores in a dataset about college admissions
5. The proportion of the time the home team won in NBA basketball games in 2019
6. The size of the structural advantage associated with playing at home in the NBA

Page 6: STAT 113 Inferential Statistics II

Definition: Sampling Distribution

• Consider all possible datasets of a certain sample size, n, produced by taking a representative snapshot (sample) from a process/phenomenon/population.
• Each one has its own value for a particular statistic (like the mean of a certain variable).
• A sampling distribution is the collection of values of all of these statistics (such as sample means).
• Note that this is a hypothetical/theoretical construction; we almost never actually have more than one dataset/sample/statistic.

Page 7: STAT 113 Inferential Statistics II

Sample Distribution ≠ Sampling Distribution

• The cases in a sample are individual observations
• The cases in a sampling distribution are statistics (such as means), each from a different potential dataset

Page 8: STAT 113 Inferential Statistics II

If the process produces a flavor-life distribution like this:

[Figure: distribution of individual flavor lives; x-axis: Flavor Life (minutes), 55 to 80. Process Mean = 66.8]

which could yield any of the following datasets:

[Figure: three sample plots; x-axis: Flavor Life (minutes), 55 to 80. Sample Mean = 65.7, Sample Mean = 65.9, Sample Mean = 66.9]

then each potential set of 10 gumballs has a mean flavor life. The sampling distribution of all such potential means might look like this:

[Figure: sampling distribution of sample means; x-axis: Mean Flavor Life (minutes), 55 to 80]
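The construction on this slide can be approximated by simulation. The sketch below is only an illustration, not the course's own demo: it assumes individual flavor lives are independent draws from a normal distribution, borrows the 2.8-minute process SD from a later slide, and uses made-up helper names (`draw_sample`, `sampling_distribution`).

```python
import random
import statistics

PROCESS_MEAN = 66.8   # process mean from the slide (minutes)
PROCESS_SD = 2.8      # assumption: process SD taken from a later slide
SAMPLE_SIZE = 10      # gumballs per potential dataset


def draw_sample(rng, n=SAMPLE_SIZE):
    """One potential dataset: n individual flavor lives (assumed normal)."""
    return [rng.gauss(PROCESS_MEAN, PROCESS_SD) for _ in range(n)]


def sampling_distribution(rng, num_samples=5000):
    """Means of many potential datasets; each case is one sample mean."""
    return [statistics.mean(draw_sample(rng)) for _ in range(num_samples)]


rng = random.Random(113)  # fixed seed so the run is repeatable
one_sample = draw_sample(rng)
means = sampling_distribution(rng)

print("cases in one sample:", len(one_sample), "individual gumballs")
print("cases in the sampling distribution:", len(means), "sample means")
print("mean of sample means:", round(statistics.mean(means), 1))
```

In a real study only `one_sample` would be available; the list `means` exists here only because the process is simulated.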

Page 9: STAT 113 Inferential Statistics II

Demo: StatKey

http://lock5stat.com/statkey

Page 10: STAT 113 Inferential Statistics II

Self Check: Sampling Distributions

7. Which of the following best characterizes what a single case is in a sampling distribution where the parameter of interest is the long-run winning percentage of the home team in NBA games?
(a) An NBA game
(b) Whether the home team won or lost in a given game
(c) A dataset consisting of several NBA games
(d) The long-run winning percentage of the home team in NBA games

Page 11: STAT 113 Inferential Statistics II

Self Check: Sampling Distributions

8. Which of the following best characterizes what the variable is in the above sampling distribution?
(a) Whether the home team won or lost in a given game
(b) The percentage of the time the home team won in a particular dataset of NBA games
(c) The long-run winning percentage of the home team in NBA games
(d) Whether or not the home team has a structural advantage in the NBA

Page 12: STAT 113 Inferential Statistics II

Outline

Reminder: Inference Goals

Standard Error

Confidence Intervals: Justification

Page 13: STAT 113 Inferential Statistics II

Definition: Standard Error

• The distribution of a quantitative variable has a standard deviation
• In the context of a sampling distribution, the cases are possible datasets
• The sample statistic is then a variable, since it characterizes each possible dataset
• In other words, a statistic that summarizes an individual dataset becomes a variable when we consider all possible datasets
• The variability in the statistic across all possible datasets can be summarized by its standard deviation.
• In this particular context, the standard deviation has a special name: the standard error (i.e., the standard deviation of the statistic).

Page 19: STAT 113 Inferential Statistics II

The flavor lives of a set of individual gumballs have a standard deviation:

[Figure: distribution of individual flavor lives; x-axis: Flavor Life (minutes), 55 to 80. Process Mean = 66.8, Process SD = 2.8]

The mean flavor lives of the potential datasets of 10 gumballs also have a standard deviation:

[Figure: sampling distribution of sample means; x-axis: Mean Flavor Life (minutes), 55 to 80. Mean of Means = 66.8, SD of Means = 0.9]

The latter standard deviation is called the standard error of the mean flavor life.
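As a rough check on the two numbers above, and assuming the 10 gumballs in each dataset are independent draws from the process (an assumption the slide does not state), the standard deviation of a sample mean equals the process standard deviation divided by the square root of the sample size:

```latex
\[
SE_{\bar{x}} \;=\; \frac{\sigma}{\sqrt{n}} \;=\; \frac{2.8}{\sqrt{10}} \;\approx\; 0.89 \;\approx\; 0.9
\]
```

This agrees with the reported SD of Means, and it shows why means of 10 gumballs vary much less than individual gumballs do.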

Page 22: STAT 113 Inferential Statistics II

Self Check: Standard Error

7. A series of 100 measurements records temperatures at the Cleveland airport. Which of the following best describes what the standard error of the mean temperature at the Cleveland airport can tell us?
(a) How variable the temperatures are in Cleveland
(b) Whether our dataset consists of days and times that are warmer than average or cooler than average
(c) How much sampling bias exists in our data-collection procedure
(d) How much measurement bias exists in our data-collection procedure
(e) How precise we can expect an estimate of the mean temperature to be using our data-collection procedure

Page 23: STAT 113 Inferential Statistics II

Outline

Reminder: Inference Goals

Standard Error

Confidence Intervals: Justification

Page 24: STAT 113 Inferential Statistics II

Estimation Using the 95% Rule

In a bell-shaped distribution, most (about 95%) individual values are within 2 Standard Deviations of the mean.

Page 27: STAT 113 Inferential Statistics II

Estimation Using the 95% Rule

In a bell-shaped sampling distribution of sample means, most (about 95%) individual sample means are within 2 Standard Errors of the mean of sample means.

If the samples are representative, then the mean of sample means is the population/process mean; i.e., the parameter of interest.

Distance works in both directions: if a sample mean is within 2 Standard Errors of the parameter, then the parameter is within 2 Standard Errors of that sample mean. So, about 95% of the time that I obtain one sample mean (from my study/snapshot/dataset), the parameter of interest (the population/process mean) is within 2 Standard Errors of it.
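The argument can be checked by simulating many hypothetical studies and counting how often the parameter lands within 2 Standard Errors of the sample mean. This is a sketch under stated assumptions rather than a definitive demonstration: the gumball process is taken to be normal with mean 66.8 and SD 2.8, samples of 10 are independent draws, the Standard Error is computed as SD divided by the square root of n, and the function name `coverage` is hypothetical.

```python
import math
import random
import statistics

PROCESS_MEAN = 66.8   # parameter of interest (known only because we simulate)
PROCESS_SD = 2.8      # process SD from the gumball example
SAMPLE_SIZE = 10
# Assumption: independent draws, so SE of the mean is SD / sqrt(n)
STANDARD_ERROR = PROCESS_SD / math.sqrt(SAMPLE_SIZE)


def coverage(num_studies=10000, seed=113):
    """Share of simulated studies whose sample mean is within 2 SE of the parameter."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        sample = [rng.gauss(PROCESS_MEAN, PROCESS_SD) for _ in range(SAMPLE_SIZE)]
        x_bar = statistics.mean(sample)
        # "x_bar within 2 SE of the parameter" is the same event as
        # "the parameter within 2 SE of x_bar"
        if abs(x_bar - PROCESS_MEAN) <= 2 * STANDARD_ERROR:
            hits += 1
    return hits / num_studies


print("share of studies with the parameter within 2 SE:", coverage())
```

The printed share should come out near 0.95, which is the sense in which the interval is "correct about 95% of the time".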

Page 28: STAT 113 Inferential Statistics II

Confidence Intervals: Construction

• So, if I have a sample statistic, and if I can find a standard error, I can estimate that the population mean, µ, is between

x̄ − 2 SE and x̄ + 2 SE

• This statement should be correct about 95% of the time.
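The construction is a one-line calculation. The helper below is a minimal sketch (the name `confidence_interval_95` is made up for this example), applied to one of the gumball sample means from earlier (65.7) and the standard error of 0.9.

```python
def confidence_interval_95(statistic, standard_error):
    """Return (low, high) = statistic -/+ 2 * standard_error."""
    margin = 2 * standard_error
    return statistic - margin, statistic + margin


# Example values from the gumball slides: sample mean 65.7, SE 0.9
low, high = confidence_interval_95(65.7, 0.9)
print(f"approximate 95% interval: {low:.1f} to {high:.1f} minutes")
```

For these inputs the interval runs from about 63.9 to 67.5 minutes, which happens to contain the process mean of 66.8.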

Page 29: STAT 113 Inferential Statistics II

Self Check: Confidence Intervals

A poll asked 500 registered voters to describe their disposition toward President Trump using a scale that goes from -5 to +5, where negative values indicate a net unfavorable disposition, and positive values indicate a net favorable disposition. Suppose the mean rating is -1.1, with a reported standard error of 0.4.

8. What is the statistic here?
9. What parameter is this statistic potentially well suited to estimate?
10. Give a range of values that the parameter is likely to fall within
