Statistics for Everyone Workshop Fall 2010

Statistics for Everyone Workshop Fall 2010 Part 3 Comparing 2 Conditions With a T Test Workshop by Linda Henkel and Laura McSweeney of Fairfield University Funded by the Core Integration Initiative and the Center for Academic Excellence at Fairfield University


Page 1: Statistics for Everyone Workshop  Fall 2010

Statistics for Everyone Workshop Fall 2010

Part 3

Comparing 2 Conditions With a T Test

Workshop by Linda Henkel and Laura McSweeney of Fairfield University

Funded by the Core Integration Initiative and the Center for Academic Excellence at Fairfield University

Page 2: Statistics for Everyone Workshop  Fall 2010

Statistics as a Tool in Scientific Research: Comparing 2 Conditions With a T Test

Teaching Tips

The research will excite them, but the statistics part might frighten them (“Oh, I’m not good at that…”) so ALWAYS emphasize statistics as a tool for researchers to use to answer research questions

Ask exciting and interesting research questions so that they want to find out the answer

Be sure to cover the science basics (e.g., terminology); If they don’t hear it from you, where do they hear it? Don’t assume they already know

Page 3: Statistics for Everyone Workshop  Fall 2010

Statistics as a Tool in Scientific Research: Comparing 2 Conditions With a T Test

Types of Research Questions

• Descriptive (What does X look like?)

• Correlational (Is there an association between X and Y? As X increases, what does Y do?)

• Experimental (Do changes in X cause changes in Y?)

Page 4: Statistics for Everyone Workshop  Fall 2010

Statistics as a Tool in Scientific Research

Start with the science and use statistics as a tool to answer the research question

Get your students to formulate a research question:

• How often does this happen?

• Did all plants/people/chemicals act the same?

• What happens when I add more sunlight, give more praise, pour in more water?

Page 5: Statistics for Everyone Workshop  Fall 2010

Types of Statistical Procedures

Descriptive: Organize and summarize data

Inferential: Draw inferences about the relations between variables;

use samples to generalize to the population

Page 6: Statistics for Everyone Workshop  Fall 2010

Hypothesis Testing in Experiments

Independent variable (IV) (manipulated): Has different levels or conditions

• Presence vs. absence (Drug, Placebo)

• Amount (5 mg, 10 mg, 20 mg)

• Type (Drug A, Drug B, Drug C)

Quasi-Independent variable (not experimentally controlled) e.g., Gender

Dependent variable (DV) (measured variable):

• Number of white blood cells, temperature, heart rate

Do changes in the levels/conditions of the IV cause changes in the DV?

If graphing, put IV on x-axis and DV on y-axis

Page 7: Statistics for Everyone Workshop  Fall 2010

Hypothesis Testing in Experiments

Statistical test: the t test

Used for: Comparing average score in one condition to average score in another condition

Use when: The IV is categorical (with 2 levels) and the DV is numerical (interval or ratio scale), e.g., weight as a function of gender, # of white blood cells as a function of organism, mpg as a function of foreign/domestic

Page 8: Statistics for Everyone Workshop  Fall 2010

Types of T Tests: One Sample T

Use when you have one sample and you want to compare it to a known population

• Participants get Drug B. Clinical trials with Drug A have already shown how much it reduced pain. Does Drug B provide even better pain relief, on average?

• Plants are exposed to 3 hrs artificial light. You already know how many blooms, on average, plants will make when they have no artificial light. Does the artificial light cause more blooms on average than in the population at large?

• Hybrid cars are driven on a test course to determine their mpg. You already know the average mpg gas-powered cars get on this course. Do the hybrids get better or worse mpg on average than the population at large?

Page 9: Statistics for Everyone Workshop  Fall 2010

Types of T Tests: Independent Samples T

Use when you have a between-subjects design -- comparing if there is a difference between two separate (independent) groups

• Some people get the drug, and other people get the placebo. Which group on average has less pain?

• Some plants are exposed to 0 hrs of artificial light, and some are exposed to 3 hours. Which group has more blooms on average?

• Some cars are hybrid and some are gas-powered. Which type of car gets better fuel mileage on average?

Page 10: Statistics for Everyone Workshop  Fall 2010

Types of T Tests: Paired Samples T

Use when you have a within-subjects design: each subject experiences all levels/conditions of the IV; observations are paired/dependent/matched

• Each person gets both the drug and the placebo at different times. Which relieves pain better on average?

• Each plant is exposed to 0 hrs of artificial light at one time and 3 hours at another time. Which exposure time causes more blooms on average?

• Cars are filled with Fuel A at one time and Fuel B at another time. Which fuel gets better mpg on average?

Page 11: Statistics for Everyone Workshop  Fall 2010

Hypothesis Testing Using T Tests

The t test allows a scientist to determine whether their research hypothesis is supported

Null hypothesis H0:

• The IV does not influence the DV

• Any differences in average scores between the different conditions are probably just due to chance (measurement error, random sampling error)

Page 12: Statistics for Everyone Workshop  Fall 2010

Hypothesis Testing Using T Tests

Research hypothesis HA:

• The IV does influence the DV

• The differences in average scores between the different conditions are probably not due to chance but show a real effect of the IV on the DV

Teaching tip: Very important for students to be able to understand and state the research question so that they see that statistics is a tool to answer that question

Page 13: Statistics for Everyone Workshop  Fall 2010

Hypothesis Testing Using T Tests

Null hypothesis: Average pain relief is the same whether people have Drug A or a placebo

Research hypothesis: Drug A brings better pain relief on average than the placebo (Upper tail test)

Null hypothesis: Plants exposed to 3 hours of artificial light have the same number of blooms on average as plants not exposed to artificial light

Research hypothesis: Plants exposed to 3 hours of artificial light have a different number of blooms on average than plants not exposed to artificial light (Two tailed test)

Null hypothesis: Cars filled with Fuel A get the same mpg as Fuel B on average

Research hypothesis: Cars filled with Fuel A get lower mpg than Fuel B on average (Lower tail test)

Page 14: Statistics for Everyone Workshop  Fall 2010

Hypothesis Testing Using T Tests

Do the data/evidence support the research hypothesis or not?

Did the IV really influence the DV, or are the obtained differences in averages between conditions just due to chance?

Teaching tip: To increase your students’ understanding, you should be explicit about what researchers mean by “just by chance”…

Page 15: Statistics for Everyone Workshop  Fall 2010

Hypothesis Testing Using T Tests

p value = probability of results being due to chance

When the p value is high (p > .05), the obtained difference is probably due to chance

.99 .75 .55 .25 .15 .10 .07

When the p value is low (p < .05), the obtained difference is probably NOT due to chance and more likely reflects a real influence of the IV on DV

.04 .03 .02 .01 .001

Page 16: Statistics for Everyone Workshop  Fall 2010

Hypothesis Testing Using T Tests

p value = probability of results being due to chance

[Probability of observing your data (or more extreme data) if H0 were true]

When the p value is high (p > .05), the obtained difference is probably due to chance

[Data likely if H0 were true]

.99 .75 .55 .25 .15 .10 .07

When the p value is low (p < .05), the obtained difference is probably NOT due to chance and more likely reflects a real influence of the IV on DV

[Data unlikely if H0 were true, so data support HA]

.04 .03 .02 .01 .001
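The bracketed idea, "how likely are data like these if H0 were true," can be made concrete with a small simulation. The sketch below is an illustration only and is not part of the workshop materials: it repeatedly draws two groups from the same population (so H0 is true by construction) and counts how often |t| still exceeds 2.101, the standard two-tailed .05 critical value for df = 18. The population mean of 50, SD of 10, group size of 10, and the function names are all assumptions chosen for the example.

```python
import math
import random
import statistics


def pooled_t(x, y):
    """Pooled-variance t score for two independent groups of raw scores."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se


def false_alarm_rate(n_sims=2000, n=10, critical_t=2.101, seed=1):
    """Share of simulated experiments with |t| > critical_t when H0 is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Both groups come from the SAME population: the IV has no effect.
        group_a = [rng.gauss(50, 10) for _ in range(n)]
        group_b = [rng.gauss(50, 10) for _ in range(n)]
        if abs(pooled_t(group_a, group_b)) > critical_t:
            hits += 1
    return hits / n_sims


print(false_alarm_rate())  # should land near 0.05
```

Even with no real effect, roughly 1 simulated experiment in 20 produces a difference this large, which is what the .05 cutoff is calibrated against.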

Page 17: Statistics for Everyone Workshop  Fall 2010

Hypothesis Testing Using T Tests

In science, a p value of .05 is a conventionally accepted cutoff point for saying when a result is more likely due to chance or more likely due to a real effect

Not significant = the obtained difference is probably due to chance; the IV does not appear to have a real influence on the DV; p > .05

Statistically significant = the obtained difference is probably NOT due to chance and is likely due to a real influence of the IV on DV; p < .05

Page 18: Statistics for Everyone Workshop  Fall 2010

Types of T Tests

All 3 t tests in essence answer the same question:

• Is the research hypothesis supported?

• In other words, did the IV really influence the DV, or are the obtained differences between conditions just due to chance?

One sample t test

Independent samples t test

Paired samples t test

Page 19: Statistics for Everyone Workshop  Fall 2010

The Essence of a T Test

They answer this question by calculating a t score

The t score basically examines how large the difference between the average score in each condition is, relative to how far spread out you would expect scores to be just based on chance (i.e., if there really was no effect of the IV on the DV)

It is of course more sophisticated than that but students will have to wait for statistics courses to really get the low down on how it all works

Page 20: Statistics for Everyone Workshop  Fall 2010

The Essence of a T Test

This can be taught without formulas but for the sake of thoroughness…

All formulas take the basic format:

\[ t = \frac{\text{Difference between means}}{\text{Standard Error (SE)}} \]

Remember: When you sample from a population, you won’t get the same exact mean every time. SE = how far you would expect sample means from the same population to vary due to chance

Page 21: Statistics for Everyone Workshop  Fall 2010

Formulas for T Tests

One-sample t test: difference between sample mean and population mean divided by estimated standard error of the mean

\[ t = \frac{M - \mu}{S_M} \]

Independent samples t test: difference between sample means divided by standard error of the difference between means

\[ t = \frac{(M_X - M_Y) - (\mu_X - \mu_Y)}{SD_{M_X - M_Y}} \]

Related samples t test: mean of the difference scores minus expected mean difference, divided by standard error of the mean difference scores

\[ t = \frac{M_D - \mu_D}{SD_{M_D}} \]
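As a rough sketch of how these three formulas turn into numbers, the functions below compute each t score from summary statistics. This is not the workshop's Excel file; the function names are made up for illustration, the expected difference under H0 is assumed to be 0 for the two-sample cases, and the independent-samples version assumes the pooled standard error. The sample sizes in the usage lines are inferred from the df values reported in the workshop's pollen and pulse-rate examples (N = 13 from df = 12; N = 24 and 23 from df = 45).

```python
import math


def one_sample_t(sample_mean, pop_mean, sd, n):
    """t = (M - mu) / S_M, where S_M = SD / sqrt(N)."""
    se = sd / math.sqrt(n)
    return (sample_mean - pop_mean) / se


def independent_t_pooled(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t for two separate groups (expected difference 0)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se


def paired_t(mean_diff, sd_diff, n):
    """t = M_D / SD_{M_D}, where SD_{M_D} = SD of differences / sqrt(N)."""
    se = sd_diff / math.sqrt(n)
    return mean_diff / se


# Pollen example: M = 8.35, SD = 2.25, mu = 9.85, N = 13 (inferred from df).
print(round(one_sample_t(8.35, 9.85, 2.25, 13), 2))
# Pulse-rate example: obese vs. underweight groups.
print(round(independent_t_pooled(95, 21, 24, 91, 18, 23), 2))
```

Both values agree with the t scores reported for those examples once rounded to two decimals.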

Page 22: Statistics for Everyone Workshop  Fall 2010

The Essence of a T Test

Each t test gives you a t score, which can be positive or negative; It’s the absolute value that matters

The bigger the |t| score, the less likely the difference between conditions is just due to chance

The bigger the |t| score, the more likely the difference between conditions is due to a real effect of the IV on the DV

So big values of |t| will be associated with small p values that indicate the differences are significant (p < .05)

Little values of |t| (i.e., close to 0) will be associated with larger p values that indicate the differences are not significant (p > .05)

Page 23: Statistics for Everyone Workshop  Fall 2010

Each t test will also have a particular value for degrees of freedom (df), which is based on the sample size (N)

Excel will calculate this for you, based on the sample size. The test statistic and df are both needed to calculate the p-value.

Test | df formula | What it measures
One sample t | N – 1 | total number of subjects minus 1
Independent samples t (Pooled) | (N1 – 1) + (N2 – 1) | (number of subjects in Cond. 1) minus 1 + (number of subjects in Cond. 2) minus 1
Paired samples t | N – 1 | total number of subjects minus 1 *

* Remember, they are the same subjects in both conditions so do not count them twice
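To show why both the t score and df are needed, here is a minimal standard-library sketch that turns a t score and df into a p value by numerically integrating the Student's t density with Simpson's rule. This is an approximation written for illustration, not how Excel or SPSS compute it, and every function name is an assumption. It relies on `math.gamma`, so it is only suitable for the modest df values typical of classroom data.

```python
import math


def df_one_sample(n):
    return n - 1


def df_independent_pooled(n1, n2):
    return (n1 - 1) + (n2 - 1)


def df_paired(n_pairs):
    return n_pairs - 1  # same subjects in both conditions, counted once


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, steps=20000):
    """Approximate P(T >= t) using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3  # area beyond |t| on one side
    return tail if t >= 0 else 1 - tail


def lower_tail_p(t, df):
    return 1 - upper_tail_p(t, df)


def two_tailed_p(t, df):
    return 2 * upper_tail_p(abs(t), df)


print(round(two_tailed_p(0.70, df_independent_pooled(24, 23)), 2))
print(round(lower_tail_p(-2.40, df_one_sample(13)), 2))
```

The same t score gives a different p value at a different df, which is why the df has to be reported alongside t.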

Page 24: Statistics for Everyone Workshop  Fall 2010

What Software to Use to Analyze Your Data?

If you have raw data, you can use Excel or SPSS

If the data has already been summarized (e.g., you do not have the raw data but just have group means and SDs), you need to use Excel

Page 25: Statistics for Everyone Workshop  Fall 2010

Running the One-Sample T Test Using Excel

One Sample t Test: Use when you have one sample and you want to compare it to a known population

Need:

• Descriptive statistics: Sample M and SD

• Known population mean (to compare sample to)

To run: Open Excel file “SFE Statistical Tests” and go to page called One-sample t test

Enter population mean (mu or μ), sample size (N), sample mean (M), and standard deviation (SD)

Output: Computer calculates t value, df, and p value

Page 26: Statistics for Everyone Workshop  Fall 2010

Running the Independent Samples T Test Using Excel

Independent samples t Test: Use when you have a between-subjects design

Need: • Descriptive statistics: M, SD, and N for each condition

To run: Open Excel file “SFE Statistical Tests” and go to page called Independent samples t test

Pooled: If population SDs are equal

Not Pooled: If population SDs are not equal

Enter for each condition the sample size (N), Mean (M), and standard deviation (SD)

Output: Computer calculates t value, df, and p value
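For readers curious about what changes between the two options, the sketch below computes an unpooled t score and its df. It assumes that "Not Pooled" corresponds to Welch's t test with the Welch-Satterthwaite df approximation; the transcript does not say which formula the Excel sheet actually uses, and the function name is hypothetical.

```python
import math


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Unpooled (Welch) t score and approximate df from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Pulse-rate summary statistics from the workshop's reporting example.
t, df = welch_t(95, 21, 24, 91, 18, 23)
print(round(t, 2), round(df, 1))
```

When the two groups have similar SDs and sizes, as here, the unpooled result stays close to the pooled one (df slightly below 45 instead of exactly 45).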

Page 27: Statistics for Everyone Workshop  Fall 2010

Running the Paired Samples T Test Using Excel

Paired samples t Test: Use when you have a within-subjects design

Need: Descriptive statistics: Difference scores for the paired data (X-Y); M and SD for the difference scores

To run: Open Excel file “SFE Statistical Tests” and go to page called Paired samples t test

Enter the sample size (N), Mean (M) and standard deviation (SD) of the difference scores

Output: Computer calculates t value, df, and p value
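The difference scores, and their M and SD, can be produced from raw paired data in a few lines. The sketch below is illustrative only: the mpg values are invented for the example (they are not workshop data), and the function name is hypothetical.

```python
import math
import statistics


def paired_summary(cond_x, cond_y):
    """Return N, mean and SD of the difference scores (X - Y), t, and df."""
    diffs = [x - y for x, y in zip(cond_x, cond_y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (divides by N - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return n, mean_diff, sd_diff, t, n - 1


# Hypothetical mpg for the same five cars on Fuel A and then on Fuel B.
fuel_a = [30, 32, 28, 35, 31]
fuel_b = [28, 31, 27, 33, 31]

n, mean_diff, sd_diff, t, df = paired_summary(fuel_a, fuel_b)
print(n, mean_diff, round(sd_diff, 2), round(t, 2), df)
```

The N, M, and SD printed here are the three values the worksheet asks for.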

Page 28: Statistics for Everyone Workshop  Fall 2010

Running the One-Sample T Test Using SPSS

One Sample t Test: Use when you have one sample and you want to compare it to a known population

Need:

• Raw numerical data in a column

• Known population mean (to compare sample to)

To run: Open SPSS; Analyze → Compare means → One sample t test; send the column with your raw data into the “Test variable” box; Under “Test value” enter the known population mean, hit Ok

Output: Computer calculates t value, df, and p value

Page 29: Statistics for Everyone Workshop  Fall 2010

Running the Independent Samples T Test Using SPSS

Independent samples t Test: Use when you have a between-subjects design

Need: Raw numerical data set up in two columns (IV; DV); make sure you use value labels under IV to indicate what the #s stand for (e.g., 1=male, 2=female)

To run:

Analyze → Compare means → Independent samples t test

Test var: DV Grouping var: IV (condition)

Define groups Group 1: 1 Group 2: 2, ok

Output: Computer calculates t value, df, and p value

Page 30: Statistics for Everyone Workshop  Fall 2010

Running the Paired Samples T Test Using SPSS

Paired samples t Test: Use when you have a within-subjects design

Need: Set up data file as two columns listing raw numerical data: Cond 1, Cond 2 [but give them meaningful names]

To run:

Analyze → Compare means → Paired samples t test; send pair over

Output: Computer calculates t value, df, and p value

Page 31: Statistics for Everyone Workshop  Fall 2010

Interpreting T Tests

Cardinal rule: Scientists do not say “prove”! Conclusions are based on probability (likely due to chance, likely a real effect…). Be explicit about this to your students.

Based on p value, determine whether you have evidence to conclude the difference was probably real or was probably due to chance: Is the research hypothesis supported?

p < .05: Significant

• Reject null hypothesis and support research hypothesis (the difference was probably real; the IV likely influences the DV)

p > .05: Not significant

• Retain null hypothesis and reject research hypothesis (any difference was probably due to chance; the IV did not influence the DV)

Page 32: Statistics for Everyone Workshop  Fall 2010

Teaching Tips

Students have trouble understanding what is less than .05 and what is greater, so a little redundancy will go a long way!

Whenever you say “p is less than point oh-five” also say, “so the probability that this is due to chance is less than 5%, so it’s probably a real effect.”

Whenever you say “p is greater than point oh-five” also say, “so the probability that this is due to chance is greater than 5%, so there’s just not enough evidence to conclude that it’s a real effect – these 2 conditions are not really different”

In other words, read the p value as a percentage, as odds, “the odds that this difference is due to chance are 1%, so it’s probably not chance…”

Relate your phrasing back to the IV and DV: “So the IV likely caused changes in the DV…”; “The IV worked – these 2 conditions are different”

Page 33: Statistics for Everyone Workshop  Fall 2010

Deciding to Use One-tailed or Two-tailed T Tests

Different disciplines have different conventions

The conservative “rule” in some disciplines is to use a two-tailed test, regardless of whether you had a directional or nondirectional hypothesis

• Directional: You expect scores in one condition to be higher than the other

• Nondirectional: You expect a difference between conditions but do not specify ahead of time whether A should be greater than B or less than B

If using two-tailed, instruct your students to look for the p value on the line that says “two tailed” in the Excel worksheet

Page 34: Statistics for Everyone Workshop  Fall 2010

Deciding to Use One-tailed or Two-tailed T Tests

Example: For the two sample tests (both paired and independent tests)

Null Hypothesis: μ1 = μ2

Research Hypothesis: μ1 ≠ μ2 (Two tail test) or μ1 > μ2 (Upper tail test) or μ1 < μ2 (Lower tail test)

Where μ1 is the mean of the first population and μ2 is the mean of the second population

Page 35: Statistics for Everyone Workshop  Fall 2010

Reporting T Tests Results

State key findings in understandable sentences

Use descriptive and inferential statistics to supplement verbal description by putting them in parentheses and at the end of the sentence

Use a table and/or figure to illustrate findings

Page 36: Statistics for Everyone Workshop  Fall 2010

Reporting T Tests Results

Step 1: Write a sentence that clearly indicates what statistical analysis you used

A [type of t test] t test was conducted to determine whether [name of DV] varied as a function of [name of IV or name of conditions]

A one-sample t test was conducted to determine whether the pollen count varied as a function of location (on campus or in the state of CT in general)

An independent samples t test was conducted to determine whether people’s pulse rates varied as a function of their weight (obese people or underweight).

A paired samples t test was conducted to determine whether calories consumed by rats varied as a function of group size (rats tested alone, tested in groups).

Page 37: Statistics for Everyone Workshop  Fall 2010

Reporting T Tests Results

Step 2: Write a sentence that clearly indicates what pattern you saw in your data analysis – Did the conditions differ, and if so, how (i.e., which condition scored higher or lower?)

Average [Name of DV] was significantly higher/lower for [name of Condition 1] than the average for [name of Condition 2]

[Name of DV] was significantly higher/lower on average for [name of Condition 1] than for [name of Condition 2]

[Name of DV] did not significantly differ between [name of Condition 1] and [name of Condition 2] on average

The pollen count was significantly lower on average on college campuses than in the state of CT in general. (Lower tailed test)

Pulse rates did not significantly differ on average whether people were obese or underweight. (Two tailed test)

The number of calories on average consumed by rats was significantly higher when the rats ate alone than when they ate in groups. (Upper tailed test)

Page 38: Statistics for Everyone Workshop  Fall 2010

Teaching Tip

The key is to emphasize that these should be nice, easy-to-understand grammatical sentences that do not sound like “Me Tarzan, you Jane!”

You may want your students to explicitly note whether the research hypothesis was supported or not.

“Results supported the hypothesis that increased dosages of the drug would reduce the average number of white blood cells…”

Page 39: Statistics for Everyone Workshop  Fall 2010

Reporting T Tests Results

Step 3: Tack the descriptive and inferential statistics onto the sentence

• Put Ms and SDs in parentheses after the name of the condition. (Possibly include Ns for two sample tests.)

• Put the t test results at the end of the sentence using this format: t(df) = x.xx, p = .xx

The pollen count was significantly lower on average on college campuses (M=8.35, SD=2.25) than in the state of CT in general (μ=9.85), t(12)= -2.40, p = .02.

Pulse rates did not significantly differ on average whether people were obese (M=95 beats per minute, SD=21, N = 24) or underweight (M=91 beats per minute, SD=18, N = 23), t(45)= 0.70, p = .49.

The number of calories on average consumed by rats was significantly higher when the rats ate alone (M=100.34, SD = 12.64) than when they ate in groups (M=87.65, SD = 13.43), t(22) = 2.38, p = .01.
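The `t(df) = x.xx, p = .xx` pattern is easy to generate consistently with a small helper. The sketch below is one possible way to do it; the function name is hypothetical, and the choice to print "p < .01" for very small p values (so the output never reads "p = .00") is an assumption, not something the workshop specifies.

```python
def format_t_result(df, t, p):
    """Return a string such as 't(12) = -2.40, p = .02'."""
    if p < 0.005:
        # Assumption: avoid printing 'p = .00' for very small p values.
        p_text = "p < .01"
    else:
        # Two decimals, dropping the leading zero as in the slide format.
        p_text = "p = " + f"{p:.2f}".lstrip("0")
    return f"t({df}) = {t:.2f}, {p_text}"


print(format_t_result(12, -2.40, 0.02))  # t(12) = -2.40, p = .02
print(format_t_result(45, 0.70, 0.49))   # t(45) = 0.70, p = .49
```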

Page 40: Statistics for Everyone Workshop  Fall 2010

Reporting T Tests Results for Nonsignificant Difference

Teaching tip: Make sure your students understand that when the difference is not significant, they should NOT word their sentences to imply that there is a difference

Not significant = no difference

One mean will no doubt be higher than the other, but if it’s not a significant difference, then the difference is probably not real, so do not interpret a direction (Saying “this is higher but no it’s really not” is silly)

Option 1: Simply say there was no significant difference

Pulse rates did not significantly differ on average whether people were obese (M=95 beats per minute, SD=21, N = 24) or underweight (M=91 beats per minute, SD=18, N = 23), t(45)= 0.70, p = .49.

Option 2: Word the sentence so that it is clear that the hypothesized difference was not obtained

Counter to hypotheses, pulse rates were not significantly higher on average for people who were obese (M=95 beats per minute, SD=21, N = 24) than for people who were underweight (M=91 beats per minute, SD=18, N = 23), t(45)= 0.70, p = .49.

Page 41: Statistics for Everyone Workshop  Fall 2010

Reporting T Tests Results

• Be sure that you note the unit of measure for the DV (miles per gallon, volts, seconds, #, %). Be very specific

• If using a Table or Figure showing M & SDs or SEs, you do not necessarily have to include those descriptive statistics in your sentences

• If you have more than 2 conditions, run t tests for each pair of conditions you want to compare (e.g., Cond. 1 vs. 2, Cond. 1 vs. 3, Cond. 2 vs. 3)
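Listing the pairs by hand gets error-prone as the number of conditions grows. As a small illustration, `itertools.combinations` from the Python standard library enumerates every pair once; the condition labels are placeholders.

```python
from itertools import combinations

conditions = ["Cond. 1", "Cond. 2", "Cond. 3"]  # placeholder labels

# Every unordered pair of conditions, each listed exactly once.
pairs = list(combinations(conditions, 2))

for first, second in pairs:
    print(f"Run a t test comparing {first} vs. {second}")
```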

Page 42: Statistics for Everyone Workshop  Fall 2010

More Teaching Tips

You can ask your students to report either:

• the exact p value (p = .03, p = .45)

• the cutoff: say either p < .05 (significant) or p > .05 (not significant)

You should specify which style you expect. Ambiguity confuses them!

Tell students they can use the word “significant” only when they mean it (i.e., the probability the results are due to chance is less than 5%) and to not use it with adjectives (i.e., they often mistakenly think one test can be “more significant” or “less significant” than another). Emphasize that “significant” is a cutoff that is either met or not met, just like you are either found guilty or not guilty, pregnant or not pregnant. There are no gradients. Lower p values = less likelihood result is due to chance, not “more significant”

Page 43: Statistics for Everyone Workshop  Fall 2010

Effect Size for T Test

When a result is found to be significant (p < .05), many researchers report the effect size as well

Effect size = How large the difference in scores was

Significant = Was there a real difference or not?

Page 44: Statistics for Everyone Workshop  Fall 2010

Effect Size for T Test

Effect size: How much did the IV influence the DV?

\[ \text{Cohen's } d = \frac{|\text{mean difference}|}{SD} \]

Small 0 - .20

Medium .21 - .80

Large > .80
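A short sketch of the calculation and the small/medium/large labels, using the cutoffs listed on this slide. The function names are made up for the example, and the usage line reuses the summary statistics from the workshop's pollen example (M = 8.35, μ = 9.85, SD = 2.25).

```python
def cohens_d(mean_1, mean_2, sd):
    """Cohen's d = |mean difference| / SD."""
    return abs(mean_1 - mean_2) / sd


def effect_size_label(d):
    """Label d using the slide's cutoffs: 0-.20, .21-.80, above .80."""
    if d <= 0.20:
        return "small"
    if d <= 0.80:
        return "medium"
    return "large"


d = cohens_d(8.35, 9.85, 2.25)
print(round(d, 2), effect_size_label(d))
```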

Page 45: Statistics for Everyone Workshop  Fall 2010

Reporting T Tests Results

****THIS STEP IS OPTIONAL****

Step 4: Report the effect size if the t test was significant

After the t test results are reported, say whether it was a small, medium, or large effect size, and report Cohen’s d

The pollen count was significantly lower on average on college campuses (M=8.35, SD=2.25) than in the state of CT in general (μ=9.85), t(12)= -2.40, p = .02, and the effect size was medium, Cohen’s d = .67.

Pulse rates did not significantly differ on average whether people were obese (M=95 beats per minute, SD=21, N = 24) or underweight (M=91 beats per minute, SD=18, N = 23), t(45)=.70, p = .49. No reason to report effect size because t test was not significant.

The number of calories on average consumed by rats was significantly higher when the rats ate alone (M=100.34, SD = 12.64) than when they ate in groups (M=87.65, SD = 13.43), t(22) = 2.38, p = .01, and the effect size was large, Cohen’s d = .97.

Page 46: Statistics for Everyone Workshop  Fall 2010

Check Assumptions for T Test

• Numerical scale (interval or ratio) for DV

• The distribution of scores in each sample is approximately symmetrical (normal) thus the mean is an appropriate measure of central tendency

• If a distribution is somewhat skewed (not symmetric) it is still acceptable to run the test as long as the sample size is not too small (say, N > 30)

Page 47: Statistics for Everyone Workshop  Fall 2010

Time to Practice

• Running and reporting t tests