Experimental Statistics - week 2
Review: 2-sample t-tests, paired t-tests
Thursday: Meet in 15 Clements!! Bring Cody and Smith book
p-Value
$H_0: \mu = \mu_0$ vs. $H_a: \mu < \mu_0$
Reject $H_0$ if $t < -t_\alpha$
Suppose t = -2.39 is observed from the data for the test above.
Note: “Large negative values” of t make us believe the alternative is true.
p-value: the probability of an observation as extreme or more extreme than the one observed when the null is true
[Figure: t density curve; the shaded area to the left of the observed value of t (-2.39) is the p-value]
Note:
-- if the p-value is less than or equal to $\alpha$, then we reject the null at the $\alpha$ significance level
-- the p-value is the smallest level of significance at which the null hypothesis would be rejected
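The p-value rule above can be sketched in a short SAS DATA step. This is only an illustration: the slides give the observed t = -2.39 but not the degrees of freedom or the significance level, so `df = 15` and `alpha = 0.05` are assumed values. `PROBT` is the SAS function for the cumulative t distribution.

```sas
* Sketch: one-sided (lower-tail) p-value for an observed t statistic;
DATA _NULL_;
  t     = -2.39;          * observed value of t from the slide;
  df    = 15;             * ASSUMED degrees of freedom (not given in the slides);
  alpha = 0.05;           * ASSUMED significance level;
  p     = PROBT(t, df);   * lower-tail area P(T <= t) when the null is true;
  PUT 'p-value = ' p 6.4;
  IF p <= alpha THEN PUT 'Reject H0 at level ' alpha;
  ELSE PUT 'Do not reject H0 at level ' alpha;
RUN;
```

The result is written to the SAS log. For an upper-tail alternative the p-value would be `1 - PROBT(t, df)` instead.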
Find the p-values for Examples 1 and 2
Two Independent Samples
• Assumptions: Measurements from each population are
– Mutually independent: independent within each sample and independent between samples
– Normally distributed (or the Central Limit Theorem can be invoked)
• Analysis differs based on whether the 2 populations have the same standard deviation
Two Cases
• Population standard deviations equal
– Can obtain a better estimate of the common standard deviation by combining or “pooling” individual estimates
• Population standard deviations unequal
– Must estimate each standard deviation
– Very good approximate tests are available
If Unsure, Do Not Assume Equal Standard Deviations
Equal Population Standard Deviations
Test Statistic
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad df = n_1 + n_2 - 2$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad s_p = \sqrt{s_p^2}$$
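As a rough sketch of how the pooled statistic on this slide is computed from summary statistics, the DATA step below works through the arithmetic. The sample sizes, means and standard deviations are hypothetical values chosen only for illustration; they do not come from the slides. The test is of $H_0: \mu_1 = \mu_2$, so the $(\mu_1 - \mu_2)$ term is zero.

```sas
* Sketch: pooled 2-sample t statistic from summary statistics;
DATA _NULL_;
  * HYPOTHETICAL summary statistics, for illustration only;
  n1 = 10;  ybar1 = 20.0;  s1 = 2.0;
  n2 = 12;  ybar2 = 18.0;  s2 = 3.0;
  df  = n1 + n2 - 2;                                * degrees of freedom;
  sp2 = ((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / df;     * pooled variance;
  sp  = SQRT(sp2);                                  * pooled standard deviation;
  t   = (ybar1 - ybar2) / (sp * SQRT(1/n1 + 1/n2)); * test statistic under H0;
  p   = 2 * (1 - PROBT(ABS(t), df));                * two-sided p-value;
  PUT sp= t= df= p=;
RUN;
```

With raw data, PROC TTEST reports the same quantities on its "Pooled" line, so this hand calculation is mainly useful when only summary statistics are available.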
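Behrens-Fisher Problem
If $\sigma_1 \neq \sigma_2$,
$$\frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \not\sim t$$
(i.e. the statistic does not follow an exact t distribution)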
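Satterthwaite’s Approximate t Statistic
If $\sigma_1 \neq \sigma_2$,
$$t' = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \approx t \quad \text{(i.e. approximate t)}$$
$$df = \frac{(a + b)^2}{\frac{a^2}{n_1 - 1} + \frac{b^2}{n_2 - 1}}, \qquad a = \frac{s_1^2}{n_1}, \quad b = \frac{s_2^2}{n_2} \quad \text{(Approximate t df)}$$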
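The Satterthwaite statistic and its approximate degrees of freedom can be sketched the same way. The summary statistics below are hypothetical and are not taken from the slides; the names `a` and `b` follow the slide's notation ($a = s_1^2/n_1$, $b = s_2^2/n_2$), and the test is of $H_0: \mu_1 = \mu_2$.

```sas
* Sketch: Satterthwaite approximate t statistic and df;
DATA _NULL_;
  * HYPOTHETICAL summary statistics, for illustration only;
  n1 = 10;  ybar1 = 20.0;  s1 = 2.0;
  n2 = 12;  ybar2 = 18.0;  s2 = 3.0;
  a  = s1**2 / n1;                                    * estimated variance of ybar1;
  b  = s2**2 / n2;                                    * estimated variance of ybar2;
  t  = (ybar1 - ybar2) / SQRT(a + b);                 * approximate t statistic;
  df = (a + b)**2 / (a**2/(n1 - 1) + b**2/(n2 - 1));  * approximate df;
  p  = 2 * (1 - PROBT(ABS(t), df));                   * two-sided p-value;
  PUT t= df= p=;
RUN;
```

The approximate df is usually not an integer; `PROBT` accepts non-integer degrees of freedom.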
Often-Recommended Strategy for Tests on Means
Test whether $\sigma_1 = \sigma_2$ (F-test)
– If equality is not rejected, use the 2-sample t statistic, assuming equal standard deviations
– If equality is rejected, use Satterthwaite’s approximate t statistic
NOTE: This is not a good strategy
– the F-test is highly susceptible to non-normality
Recommended Strategy:
– If uncertain about whether the standard deviations are equal, use Satterthwaite’s approximate t statistic
Example 3: Comparing the Mean Breaking Strengths of 2 Plastics
Plastic A:
Plastic B:
.= , s.=y , = n AAA 3332835
Assumptions:
Mutually independent measurements
Normal distributions for measurements from each type of plastic
.= , s.=y , = n AAA 9472640
Question: Is there a difference between the 2 plastics in terms of mean breaking strength?
Example 3 - solution
New diet – Is it effective?
Design:
50 people: randomly assign 25 to go on the diet and 25 to eat normally for the next month.
Assess results by comparing weights at the end of 1 month.
Diet: $\bar{X}_D$, $S_D$        No Diet: $\bar{X}_{ND}$, $S_{ND}$
Run a 2-sample t-test using the guidelines we have discussed.
Is this a good design?
Better Design:
Randomly select subjects and measure them before and after 1 month on the diet.
Subject   Before   After   Difference
1         150      147      3
2         210      195     15
:         :        :        :
n         187      190     -3
Procedure: Calculate the differences, and analyze the differences using a 1-sample test
“Paired t-Test”
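Written out, the procedure above amounts to a 1-sample t statistic on the differences. The formulas below are the standard form of the paired t-test. The symbols $d_i$, $\bar{d}$ and $s_d$ are notation introduced here, not taken from the slides, and the null hypothesis is assumed to be a mean difference of zero.

```latex
d_i = \text{Before}_i - \text{After}_i, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2}

t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad df = n - 1
```

Each subject serves as their own control, so subject-to-subject variation in weight drops out of $s_d$.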
Example 4: International Gymnastics Judging
Question: Do judges from a contestant’s country rate their own contestant higher than do foreign judges?
i.e. test $H_0: \mu_N = \mu_F$ vs. $H_a: \mu_N > \mu_F$
Data:
Contestant        1    2    3    4    5    6    7    8    9   10   11   12
Native Judge     6.8  4.5  8.0  7.2  8.7  4.5  6.6  5.8  6.0  8.8  8.7  4.4
Foreign Judges   6.7  4.3  8.1  7.2  8.3  4.6  5.4  5.9  6.1  9.1  8.7  4.3
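One possible way to run this paired comparison in SAS is sketched below, using the twelve score pairs from the slide. The data set name `gym` and the variable names `contestant`, `native` and `foreign` are chosen here for illustration. The `PAIRED` statement and the `SIDES=U` option (upper-tailed alternative) are PROC TTEST features, but `SIDES=` is assumed to be available in the SAS release being used.

```sas
* Sketch: paired t-test for the gymnastics judging data;
DATA gym;
  INPUT contestant native foreign;
  DATALINES;
1 6.8 6.7
2 4.5 4.3
3 8.0 8.1
4 7.2 7.2
5 8.7 8.3
6 4.5 4.6
7 6.6 5.4
8 5.8 5.9
9 6.0 6.1
10 8.8 9.1
11 8.7 8.7
12 4.4 4.3
;
PROC TTEST DATA=gym SIDES=U;
  PAIRED native*foreign;   * analyzes the differences native - foreign;
  TITLE 'Gymnastics judging - paired t-test';
RUN;
```

Without `SIDES=U`, PROC TTEST reports a two-sided p-value, which would have to be converted by hand for the one-sided alternative.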
Example 4 solution
Introduction to SAS Programming Language
Fertilizer Data
A researcher studies the effect of two fertilizer brands on the growth of plants. Thirteen plants are grown under identical conditions, except that 7 plants are randomly selected to receive Brand 1 and the remaining 6 are fertilized using Brand 2. The data for this experiment are as follows, where the outcome measurement is the height of the plant after 3 weeks of growth (you may assume the heights to be normally distributed):
Brand 1    Brand 2
51.0 cm    54.0 cm
53.3       56.1
55.6       52.1
51.0       56.4
55.5       54.0
53.0       52.9
52.1
The Fertilizer data set as SAS needs to see it
A 51.0
A 53.3
A 55.6
A 51.0
A 55.5
A 53.0
A 52.1
B 54.0
B 56.1
B 52.1
B 56.4
B 54.0
B 52.9
SAS file for FERTILIZER data
Case 1: Data within SAS FILE:
DATA one;
INPUT brand $ height;
DATALINES;
A 51.0
A 53.3
 . . .
B 54.0
B 52.9
;
PROC TTEST;
  CLASS brand;
  VAR height;
  TITLE 'Fertilizer Data – 2-sample t-test';
RUN;
Brief Discussion of Components of the SAS File:
DATA Step
DATA STATEMENT - the first DATA statement names the data set whose variables are defined in the INPUT statement -- in the above, we create data set 'one'
INPUT STATEMENT - 2 forms
1. Freefield - can be used when data values are separated by 1 or more blanks
INPUT NAME $ AGE SEX $ SCORE; ($ indicates character variable)
2. Formatted - data occur in fixed columns
INPUT NAME $ 1-20 AGE 22-24 SEX $ 26 SCORE 28-30;
DATALINES STATEMENT - indicates that the next records in the file contain the actual data; the semicolon after the data marks the end of the data
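A small sketch of the formatted (fixed-column) form of INPUT may help, since the column positions have to match the data lines exactly. The two records below are hypothetical and are laid out so that NAME occupies columns 1-20, AGE columns 22-24, SEX column 26 and SCORE columns 28-30, as in the INPUT statement on this slide. The data set name `people` is also made up.

```sas
* Sketch: formatted (column) input with HYPOTHETICAL records;
DATA people;
  INPUT NAME $ 1-20 AGE 22-24 SEX $ 26 SCORE 28-30;
  DATALINES;
John Smith            34 M  88
Mary Jones            29 F  95
;
PROC PRINT DATA=people;
RUN;
```

Because the columns are fixed, NAME can contain an embedded blank here, which freefield input would split into two values.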
SPECIFYING THE ANALYSIS -- PROC STATEMENTS
GENERAL FORM
PROC xxxxx; (implies the procedure is to be run on the most recently created data set)
PROC xxxxx DATA = data set name;
Note: I did not have to specify DATA=one in the above example
Example PROCs:
PROC REG - regression analysis
PROC ANOVA - analysis of variance
PROC GLM - general linear model
PROC MEANS - basic statistics, t-test for $H_0: \mu = 0$
PROC PLOT - plotting
PROC TTEST - t-tests
PROC UNIVARIATE - descriptive stats, box-plots, etc.
PROC BOXPLOT - boxplots
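As an illustration of the `PROC xxxxx DATA = data set name;` form, the sketch below names the data set explicitly and requests basic statistics by brand with PROC MEANS. It reuses the fertilizer data; the choice of PROC MEANS and of the statistic keywords `N MEAN STD` is an example, not something prescribed by the slides.

```sas
* Sketch: naming the data set explicitly with DATA=;
DATA one;
  INPUT brand $ height;
  DATALINES;
A 51.0
A 53.3
A 55.6
A 51.0
A 55.5
A 53.0
A 52.1
B 54.0
B 56.1
B 52.1
B 56.4
B 54.0
B 52.9
;
PROC MEANS DATA=one N MEAN STD;
  CLASS brand;    * separate statistics for each brand;
  VAR height;
RUN;
```

Naming the data set explicitly is optional here, but it avoids surprises once a program creates more than one data set.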
PROC TTEST
• Proc TTEST data = fn ;
Class … ; (specify the classification variable)
Var … / options; (specify the variable for which the means are compared)
Run;
SAS Syntax
• Every command MUST end with a semicolon
– Commands can continue over two or more lines
• Variable names are 1-8 characters (letters and numerals, beginning with a letter or underscore), but no blanks or special characters
– Note: values for character variables can exceed 8 characters
• Comments – Begin with *, end with ;
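The syntax rules above can be seen together in a short sketch. It uses only two rows of the fertilizer data to keep the listing brief, and the layout of the statements is just one possible style.

```sas
* This is a comment. It begins with an asterisk and ends with a semicolon;
DATA one;
  INPUT brand $
        height;          * one statement continued over two lines;
  DATALINES;
A 51.0
B 54.0
;
PROC PRINT DATA=one;
RUN;
```

The semicolon on its own line after the data closes the DATALINES block; forgetting it, or any other semicolon, is a very common source of SAS errors.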
Titles and Labels
• TITLE '…' ;
– Up to 10 title lines: TITLE 'include your title here';
– Can be placed in Data Steps or Procs
• LABEL name = '…' ;
– Can be in a DATA STEP or PROC PRINT
– Include ALL labels, then a single ;
Note: For class assignments, place descriptive titles and labels on the output.
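A minimal sketch of titles and labels on the fertilizer data follows. The title and label text is made up for illustration. `TITLE2` (a second title line) and the `LABEL` option on PROC PRINT, which tells PROC PRINT to show labels instead of variable names, are standard SAS features that the slides do not spell out.

```sas
* Sketch: descriptive titles and labels on printed output;
DATA one;
  INPUT brand $ height;
  LABEL brand  = 'Fertilizer brand'
        height = 'Plant height after 3 weeks (cm)';   * all labels, then one semicolon;
  DATALINES;
A 51.0
A 53.3
A 55.6
A 51.0
A 55.5
A 53.0
A 52.1
B 54.0
B 56.1
B 52.1
B 56.4
B 54.0
B 52.9
;
TITLE  'Fertilizer Data';
TITLE2 'Listing of plant heights by brand';
PROC PRINT DATA=one LABEL;
RUN;
```

Labels assigned in the DATA step stay with the data set, so later procedures can use them without repeating the LABEL statement.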
Case 2: Data in External File:
FILENAME f1 'complete directory/file specification';
Example:
FILENAME f1 'fertilizer.data';
DATA one;
INFILE f1;
INPUT brand $ height;
PROC TTEST;
  CLASS brand;
  VAR height;
  TITLE 'Fertilizer Data – 2-sample t-test';
RUN;
PC SAS on Campus
Library
BIC
Student Center
http://support.sas.com/rnd/le/index.html
SAS Learning Edition $125