Introduction to Statistical Inference

EDUC 502

November 28, 2005

Statistical Inference in Education

Illuminating article:

Daniel, L.G. (1998). Statistical significance testing: A historical overview of misuse and misinterpretation with implications for editorial policies of educational journals. Research in the Schools, 5(2), 23-32. Available online: http://www.personal.psu.edu/users/d/m/dmr/sigtest/3mspdf.pdf


“Probably few methodological issues have generated as much controversy among sociobehavioral scientists as the use of [statistical significance] tests” (Pedhazur & Schmelkin, 1991, p. 198).

“The test of significance does not provide the information concerning psychological phenomena characteristically attributed to it…a great deal of mischief has been associated with its use” (Bakan, 1966, p. 423).


Huberty (1987) asserted, “There is nothing wrong with statistical tests themselves! When used as guides and indicators, as opposed to means of arriving at definitive answers, they are Okay” (p. 7).

Main problem: “The ingenuous assumption that a statistically significant result is necessarily a noteworthy result” (Daniel, 1997, p. 106).


Another problem: It is “common practice to drop the word ‘statistical’ and instead speak of ‘significant differences,’ ‘significant correlations,’ and the like” (Pedhazur & Schmelkin, 1991, p. 202).

Schafer (1993) noted, “I hope that most researchers understand that significant (statistically) and important are two different things. Surely the term significant was ill-chosen” (p. 387).


In order to better understand this controversy, we will explore some of the mathematics behind statistical inference.

We will follow the outline provided by:

Moore, D.S. (1997). Statistics: Concepts and controversies (4th ed.). New York: W.H. Freeman.


Inference simply means drawing conclusions from data, as we have discussed up to this point.

The phrase “statistical inference” is reserved for occasions when probability concepts are used to help in drawing conclusions.

Probability can account for chance variation, which allows us to correct our judgment of what is happening in certain situations.


Scenario: Suppose a multiple choice test is used to compare the performance of students receiving teaching method A to teaching method B. 20 students were assigned at random to teaching method A and another 20 to teaching method B. At the end of the experiment, 12 of the students in group A received Fs on the test while only 8 in group B received Fs.

Question: Can we conclude that teaching method B better prevents students from receiving Fs?


Answer: Not necessarily. A difference this size could likely be due to chance variation alone. We could do a probability calculation to compute the probability of avoiding an F just by guessing and then compare.

While there is a numerical difference between the number of Fs in the two groups, that difference might vanish if the experiment were repeated a number of times.
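
One way to make such a probability calculation concrete is sketched below. It rests on an assumption that is not spelled out in the slides: the 20 students who received Fs would have received them under either method, so only the random assignment decides how many of them land in group A. The function name and its parameters are illustrative.

```python
from math import comb

def prob_at_least(k_min, fails=20, group_size=20, total=40):
    """Chance that group A contains at least k_min of the Fs
    when group membership is decided by random assignment alone."""
    ways_total = comb(total, group_size)  # all possible ways to form group A
    favorable = sum(
        comb(fails, k) * comb(total - fails, group_size - k)
        for k in range(k_min, min(fails, group_size) + 1)
    )
    return favorable / ways_total

p = prob_at_least(12)
print(f"P(12 or more Fs in group A by chance alone) = {p:.3f}")
```

Under this assumption the probability comes out to about 0.17, so a 12-versus-8 split is not a rare outcome of chance.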


Drawing conclusions in mathematics: Start with a hypothesis and then use a logical argument to prove that the conclusion follows.

Example: If a quadrilateral is a rectangle, then its diagonals are congruent. (This can be proven through an a priori logical argument – not by just examining a bunch of rectangles and measuring their diagonals to see if they are congruent).


Drawing conclusions in social science is almost the opposite of mathematics: You need to start with a number of observations and draw conclusions from them (inductive reasoning).

Important implication: Social science research studies do NOT produce proofs. They only produce evidence that something may or may not be the case (i.e., you can never prove that teaching method A is better than method B, but you can systematically gather evidence to help you make decisions about how to teach).


“Statistical inference uses probability to say how strong an inductive argument is” (Moore, 1997, p. 459).

In the teaching method A vs. teaching method B scenario, a probability calculation could help us see that the argument in favor of teaching method B is not very strong. We could likely get different results if the experiment were replicated a number of times.
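
The idea of replicating the experiment can also be explored by simulation. The sketch below assumes, purely for illustration, that the teaching method has no effect at all: the same 20 Fs occur in every replication and only the random assignment changes. The function name, the number of trials, and the seed are arbitrary choices.

```python
import random

def fraction_with_gap(trials=10_000, seed=0):
    """Fraction of simulated replications in which group A ends up
    with at least 4 more Fs than group B (12 vs. 8 or more extreme)."""
    rng = random.Random(seed)
    outcomes = [1] * 20 + [0] * 20  # 1 = student receives an F, 0 = no F
    count = 0
    for _ in range(trials):
        rng.shuffle(outcomes)          # a fresh random assignment to groups
        fs_in_a = sum(outcomes[:20])   # first 20 students form group A
        fs_in_b = 20 - fs_in_a
        if fs_in_a - fs_in_b >= 4:
            count += 1
    return count / trials

print(f"Share of replications with a gap this large: {fraction_with_gap():.3f}")
```

If a gap of this size shows up regularly even when the methods are identical, the observed result is weak evidence for method B.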

Note: The probability calculations required for statistical inference depend upon probability samples or randomized comparative experiments.

Very few educational research studies have this sort of luxury, with a few notable exceptions. For example: the National Assessment of Educational Progress (NAEP) and the Trends in International Mathematics and Science Study (TIMSS).

Some Essential Terminology

“A parameter is a number that describes the population. For example, the proportion of the population having some characteristic of interest is a parameter we call p. In a statistical inference problem, population parameters are fixed numbers, but we do not know their values” (Moore, 1997, p. 460).

Example: The actual proportion of 3rd graders in the U.S. who can read is a population parameter. We can only estimate it by drawing random samples from the population. We will probably never know it exactly.

“A statistic is a number that describes the sample data. For example, the proportion of the sample having some characteristic of interest is a statistic that we call p-hat. Statistics change from sample to sample. We use the observed statistics to get information about the unknown parameters” (Moore, 1997, p. 460).

Example: We could draw a random sample out of all the 3rd graders in the U.S. and administer a literacy test. The proportion that could read would be a statistic to estimate the population parameter.
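
The difference between a fixed parameter and a statistic that changes from sample to sample can be seen in a short simulation. The population proportion of 0.7, the sample size of 500, and the function name are all hypothetical values chosen for illustration; in a real study the parameter would be unknown.

```python
import random

def sample_proportions(p=0.7, n=500, num_samples=5, seed=1):
    """Draw several random samples from a population in which a proportion p
    has the characteristic, and return the p-hat observed in each sample."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n  # p-hat for one sample
        for _ in range(num_samples)
    ]

for i, p_hat in enumerate(sample_proportions(), start=1):
    print(f"Sample {i}: p-hat = {p_hat:.3f}")
```

Each p-hat lands near the parameter but differs from sample to sample, which is the behavior the definition of a statistic describes.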

Confidence Intervals

Scenario: “The NAEP survey includes a short test of quantitative skills, covering mainly basic arithmetic and the ability to apply it to realistic problems. Scores on the test range from 0 to 500. For example, a person who scores 233 can add the amounts of two checks appearing on a bank deposit slip; someone scoring 325 can determine the price of a meal from a menu, a person scoring 375 can transform a price in cents per ounce into dollars per pound” (Moore, 1997b, p. 207).

Scenario (contd.): “In a recent year, 840 men 21 to 25 years of age were in the NAEP sample. Their mean quantitative score was 272 (statistic). These 840 men are a simple random sample from the population of all young men. On the basis of this sample, what can we say about the mean score in the population of all 9.5 million young men of these ages (parameter)?” (Moore, 1997b, p. 207).


Because the statistic was 272, you might guess the actual population parameter is around 272.

Statistical Inference question related to confidence intervals: “How would the sample mean (statistic) vary if we took many samples of 840 young men from this same population?” (Moore, 1997b, p. 207).

This seems like an impossible question to answer on the face of it, but some statistical facts help us out.

Useful fact #1: The sampling distribution of the sample mean is approximately normal for large samples such as this one.

Useful fact #2: The mean of the sampling distribution is equal to the mean of the population.

Useful fact #3: The 68-95-99.7 rule for normal distributions.

Useful fact #4: From long experience, we calculate the standard deviation of the sampling distribution to be 2.1.
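
A figure like the 2.1 in fact #4 typically comes from dividing the population standard deviation by the square root of the sample size. The slides give only the result, so the population standard deviation of 60 used below is an assumption made for illustration; the function name is also illustrative.

```python
from math import sqrt

def sd_of_sample_mean(population_sd, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return population_sd / sqrt(n)

# ASSUMPTION: population_sd = 60 is not stated in the slides.
se = sd_of_sample_mean(population_sd=60, n=840)
print(f"SD of the sampling distribution = {se:.2f}")
```

With these inputs the value rounds to 2.1, matching the figure used in the slides.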

Putting the facts together: The 68-95-99.7 rule says that about 95% of the sample means will be within two standard deviations of the population mean. In our case, 95% of the sample means will be within 4.2 points of the population mean.

In 95% of all samples taken, the actual population mean is within 4.2 points of the sample mean.

This means that in 95% of all samples the actual population mean lies between (sample mean) - 4.2 and (sample mean) + 4.2.


Bottom line: If we choose very many samples, 95% of the intervals defined by (sample mean) plus or minus (4.2) will capture the actual population mean.

Back to the NAEP scenario: Recall that our sample mean was 272. This means we can say that we are 95% confident that the actual population mean for the NAEP lies between 272 - 4.2 = 267.8 and 272 + 4.2 = 276.2.
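
The arithmetic above can be written as a small helper. This is a minimal sketch that uses the two-standard-deviation approximation from the 68-95-99.7 rule, as the slides do; the function name is illustrative.

```python
def confidence_interval_95(sample_mean, sd_of_sample_mean):
    """Approximate 95% confidence interval: sample mean plus or minus
    two standard deviations of the sampling distribution."""
    margin = 2 * sd_of_sample_mean
    return sample_mean - margin, sample_mean + margin

low, high = confidence_interval_95(sample_mean=272, sd_of_sample_mean=2.1)
print(f"95% confidence interval: {low:.1f} to {high:.1f}")
```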


“Be sure you understand the grounds for our confidence. There are only two possibilities:

1. The interval between 267.8 and 276.2 contains the true population mean.

2. Our simple random sample was one of the few samples for which the sample mean is not within 4.2 points of the true population mean. Only 5% of all samples give such inaccurate results” (Moore, 1997, p. 210).


“We cannot know whether our sample is one of the 95% for which the interval catches the actual population mean, or one of the unlucky 5%.

The statement that we are 95% confident that the actual population mean lies between 267.8 and 276.2 is shorthand for saying, ‘We got these numbers by a method that gives correct results 95% of the time’” (Moore, 1997b, p. 210).
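
The claim that the method "gives correct results 95% of the time" can be checked by simulation. The sketch below pretends the true population mean is known (272 is a hypothetical value; in practice it is unknown) and draws many sample means from a normal sampling distribution with standard deviation 2.1. The function name, number of trials, and seed are arbitrary.

```python
import random

def coverage(true_mean=272.0, sd_of_sample_mean=2.1, trials=10_000, seed=0):
    """Fraction of simulated samples whose interval
    (sample mean +/- 2 standard deviations) contains the true mean."""
    rng = random.Random(seed)
    margin = 2 * sd_of_sample_mean
    hits = 0
    for _ in range(trials):
        sample_mean = rng.gauss(true_mean, sd_of_sample_mean)  # one sample's mean
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials

print(f"Share of intervals that capture the true mean: {coverage():.3f}")
```

The share should come out close to 95%. Any single interval either contains the true mean or does not; the 95% describes the method across many samples.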

Homework Exercise 1

“The report of a sample survey of 1500 adults says, ‘With 95% confidence, between 27% and 33% of American adults believe that drugs are the most serious problem facing our nation’s public schools.’ Explain to someone who knows no statistics what the phrase ‘ninety-five percent confidence’ means in this report” (Moore, 1997, p. 468).


“A student reads that a 95% confidence interval for the mean NAEP quantitative score for men of ages 21 to 25 is 267.8 to 276.2. Asked to explain the meaning of this interval, the student says, ‘ninety-five percent of all young men have scores between 267.8 and 276.2.’ Is this student right? Justify your answer” (Moore, 1997b, p. 217).

Hypothesis Tests

“The other major type of formal inference is the test of significance. The purpose of a statistical test is to assess the evidence provided by the data against some claim about a parameter. A test says, ‘If we took many samples and the claim were true, we would rarely get a result like this.’ Observing a result that would rarely occur if a claim were true is evidence that the claim is not true. Replace the word ‘rarely’ by a probability and you have a numerical measure of our confidence in the evidence that the data give us” (Moore, 1997, p. 483).

Generic Example: Suppose we want to compare a new teaching method (A) against another one (B). We might start by guessing that teaching method A will work better.

We would then state a null and alternative hypothesis. Null: Mean posttest scores for the two groups will be identical. Alternative: Mean posttest scores for group A will be greater than group B.

If we believe in teaching method A, we hope to gather evidence against the null hypothesis and in support of the alternative.

If we gather enough evidence (enough and significance being defined in probabilistic terms), we can reject the null hypothesis. Note, however, that this does not prove the alternative hypothesis. All that any sociological study can do is to gather evidence.
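
One way to put a probability on "enough evidence" is a permutation test, sketched below. This is a swapped-in illustration, not a procedure named in the slides, and the posttest scores are invented solely to make the example runnable. The function asks: if the null hypothesis were true and group labels were arbitrary, how often would random relabeling give group A a total at least as high as the one observed?

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test: share of all possible regroupings
    in which 'group A' scores at least as high in total as observed.
    With fixed group sizes, comparing totals is equivalent to comparing
    the difference in group means."""
    pooled = group_a + group_b
    observed = sum(group_a)
    count = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        total += 1
        if sum(pooled[i] for i in idx) >= observed:
            count += 1
    return count / total

# HYPOTHETICAL posttest scores, invented for illustration only.
scores_a = [78, 85, 90, 72, 88]
scores_b = [70, 75, 80, 68, 82]
print(f"p-value = {permutation_p_value(scores_a, scores_b):.3f}")
```

A small p-value would count as evidence against the null hypothesis; as the slide notes, it would still not prove the alternative.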


Suppose you read in an educational research report that students’ posttest scores after receiving teaching method A were significantly higher than those of students who received teaching method B. Does this prove that teaching method A is more effective than teaching method B? Why or why not?
