Stat 31, Section 1, Last Time Inference for Proportions –Hypothesis Tests 2 Sample Proportions...
Stat 31, Section 1, Last Time
• Inference for Proportions
– Hypothesis Tests
• 2 Sample Proportions Inference
– Skipped
• 2-way Tables
– Sliced populations in 2 different ways
– Look for independence of factors
– Chi Square Hypothesis test
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 582-611, 634-667
Approximate Reading for Next Class:
Pages 634-667
Midterm I - Results
Preliminary comments:
• Circled numbers are points taken off
• Total for each problem in brackets
• Points evenly divided among parts
• Page total in lower right corner
• Check those sum to total on front
• Overall score out of 100 points
Midterm I - Results
Interpretation of Scores:
• Too early for letter grades
• These will change a lot:
– Some with good grades will relax
– Some with bad grades will wake up
• Don’t believe “A & C” average to “B”
Midterm I - Results
Interpretation of Scores:
• Recall large variation over 2 midterms
– No exception this semester
Midterm I - Results
Compare Midterm Scores
[Scatterplot: Midterm I score (horizontal axis) vs. Midterm II score (vertical axis), both axes from 40 to 100, with the Line of Equal Scores drawn in]
Some have dramatically improved
Others have been distracted by other things
Midterm I - Results
Interpretation of Scores:
• Recall large variation over 2 midterms
– No exception this semester
• Get better info from 2 test Total
– So will report answers in those terms
Midterm I - Results
Histogram of Results:
[Histogram: Midterm I + II, Total Score; horizontal axis Total Score, vertical axis Frequency (0 to 14)]
Midterm I - Results
Interpretation of Scores (2 Test total):
170 - 200 A
155 – 168 B
131 – 154 C
120 – 129 D
-- 119 F
Midterm I - Results
Where do we go from here?
• I see 2 rather different groups…
• Which are you in?
• What can you do?
• Most important:
It is still early days……
Chapter 9: Two-Way Tables
Main idea:
Divide up populations in two ways
– E.g. 1: Age & Sex
– E.g. 2: Education & Income
• Typical Major Question:
How do divisions relate?
• Are the divisions independent?
– Similar idea to independence in prob. theory
– Statistical Inference?
Two-Way Tables
Big Question: Is there a relationship?
Note: tallest bars
– French Wine with French Music
– Italian Wine with Italian Music
– Other Wine with No Music
Suggests there is a relationship
[Bar chart: Class Example 31 - Counts; # Bottles purchased (0 to 45) by Music (None, French, Italian) and wine type (French Wine, Italian Wine, Other Wine)]
Two-Way Tables
General Directions:
• Can we make this precise?
• Could it happen just by chance?
– Really: how likely to be a chance effect?
• Or is it statistically significant?
– I.e. music and wine purchase are related?
Two-Way Tables
An alternate view:
Replace counts by proportions (or %-ages)
Class Example 31 (Wine & Music), Part 2
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Advantage:
May be more interpretable
Drawback:
No real difference (just rescaled)
Two-Way Tables
Testing for independence:
What is it?
From probability theory:
P{A | B} = P{A}
i.e. Chances of A, when B is known, are same as when B is unknown
Table version of this idea?
Independence in 2-Way Tables
Counts analog of P{A|B}???
Equivalent condition for independence is:
$P\{A \,\&\, B\} = P\{A\} \times P\{B\}$
So for counts, look for:
Table Prop’n = Row Marg’l Prop’n x Col’n Marg’l Prop’n
i.e. Entry = Product of Marginals
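As a rough illustration of "Entry = Product of Marginals", the sketch below builds the table of counts expected under independence. In count form the rule becomes (row total x column total) / grand total. The wine and music counts are as recalled from the textbook example; they are not listed on these slides, so treat them, and the helper name `expected_under_independence`, as assumptions.

```python
# Wine-by-music counts as recalled from the textbook example (an assumption;
# the slides show only the bar chart, not the numbers).
# Rows: French, Italian, Other wine. Columns: None, French, Italian music.
observed = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]


def expected_under_independence(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_under_independence(observed)
for row in expected:
    print([round(value, 2) for value in row])
```

The expected table keeps the same row and column totals as the observed one; only the pattern inside the table changes.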
Independence in 2-Way Tables
Visualize Product of Marginals for:
Class Example 31 (Wine & Music), Part 4
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Shows same structure as marginals
But no match between music & wine
Good null hypothesis
[Bar chart: Class Example 31 - Independent Model; product-of-marginals proportions (0 to 0.18) by Music (None, French, Italian) and wine type (French Wine, Italian Wine, Other Wine)]
Independence in 2-Way Tables
Approach:
• Measure “distance between tables”
– Use Chi Square Statistic
– Has known probability distribution when table is independent
• Assess significance using P-value
– Set up as: H0: Indep. HA: Dependent
– P-value = P{what saw or m.c. | Indep.}
Independence in 2-Way Tables
Chi-square statistic: Based on:
• Observed Counts (raw data), $Obs_i$
• Expected Counts (under indep.), $Exp_i$
$X^2 = \sum_{\text{cells } i} \frac{(Obs_i - Exp_i)^2}{Exp_i}$
Notes:
– Small for only random variation
– Large for significant departure from indep.
Independence in 2-Way Tables
Chi-square statistic calculation:
$X^2 = \sum_{\text{cells } i} \frac{(Obs_i - Exp_i)^2}{Exp_i}$
Class example 31, Part 5:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
– Calculate term by term
– Then sum
– Is X2 = 18.3 “big” or “small”?
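The "term by term, then sum" calculation can be sketched in a few lines of Python. The counts are again the wine and music purchases as recalled from the textbook example (an assumption, since the slides do not list them), and `chi_square_statistic` is just an illustrative name.

```python
# Wine-by-music counts as recalled from the textbook example (an assumption).
# Rows: French, Italian, Other wine. Columns: None, French, Italian music.
observed = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]


def chi_square_statistic(table):
    """Sum over cells of (Obs - Exp)^2 / Exp, with Exp from the marginals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            x2 += (obs - exp) ** 2 / exp  # one cell's contribution
    return x2


x2 = chi_square_statistic(observed)
print(round(x2, 1))  # 18.3
```

With these counts the sum comes out at the X2 = 18.3 quoted on the slide, which is some reassurance that the recalled table matches the class example.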
Independence in 2-Way Tables
H0 distribution of the X2 statistic:
“Chi Squared” (another Greek letter, $\chi^2$)
Parameter: “degrees of freedom”
(similar to T distribution)
Excel Computation:
– CHIDIST (given cutoff, find area = prob.)
– CHIINV (given prob = area, find cutoff)
Independence in 2-Way Tables
For test of independence, use:
degrees of freedom =
= (#rows – 1) x (#cols – 1)
E.g. Wine and Music:
d.f. = (3 – 1) x (3 – 1) = 4
Independence in 2-Way Tables
E.g. Wine and Music:
P-value = P{Observed X2 or m.c. | Indep.} =
= P{X2 = 18.3 or m.c. | Indep.} =
= P{X2 >= 18.3 | d.f. = 4} =
= 0.0011
Also see Class Example 31, Part 5
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
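For readers without Excel, the tail area that CHIDIST returns can be checked by hand when the degrees of freedom are even. The sketch below uses the closed form $P\{X^2 \ge x\} = e^{-x/2} \sum_{k=0}^{df/2-1} (x/2)^k / k!$, which holds only for even d.f.; the function name is illustrative and this is not how Excel itself computes the value.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P{X2 >= x} for a chi-square variable with a positive even d.f.

    Closed form: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    Not valid for odd d.f.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)


rows, cols = 3, 3                      # wine types x music types
df = (rows - 1) * (cols - 1)           # degrees of freedom = 4
p_value = chi2_upper_tail_even_df(18.3, df)
print(df, round(p_value, 4))           # 4 0.0011
```

This uses only the two numbers given on the slides (X2 = 18.3 and d.f. = 4) and agrees with the reported P-value of 0.0011.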
Independence in 2-Way Tables
E.g. Wine and Music:
P-value = 0.001
Yes-No: Very strong evidence against
independence, conclude music has a
statistically significant effect
Gray-Level: Also very strong
evidence
Independence in 2-Way Tables
Excel shortcut:
CHITEST
• Avoids the (obs-exp)^2 / exp calculat’n
• Automatically computes d.f.
• Returns P-value
Independence in 2-Way Tables
HW:
9.27
9.29
And Now for Something Completely Different
A statistics joke, from:
GARY C. RAMSEYER'S INTERNET GALLERY
OF STATISTICS JOKES
http://www.ilstu.edu/~gcramsey/Gallery.html
A somewhat advanced society has figured
how to package basic knowledge in pill
form.
A student, needing some learning, goes to
the pharmacy and asks what kind of
knowledge pills are available.
The pharmacist says "Here's a pill for
English literature."
The student takes the pill and swallows it
and has new knowledge about English
literature!
"What else do you have?" asks the student.
"Well, I have pills for art history, biology,
and world history," replies the
pharmacist.
The student asks for these, and swallows
them and has new knowledge about
those subjects!
Then the student asks, "Do you have a pill for statistics?"
The pharmacist says "Wait just a moment", and goes back into the storeroom and brings back a whopper of a pill that is about twice the size of a jawbreaker and plunks it on the counter.
"I have to take that huge pill for statistics?" inquires the student.
The pharmacist understandingly nods his
head and replies:
"Well, you know statistics always was a little
hard to swallow."
Caution about 2-Way Tables
Simpson’s Paradox:
Aggregation into tables can be dangerous
E.g. from:
http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/node50.html
Study Admission rates to professional programs, look for sex bias….
Simpson’s Paradox
Admissions to Business School:
% Males ad’ted = 480 / (480 + 120) * 100%
= 80%
% Females ad’ted = 180 / (180 + 20)* 100%
= 90%
Better for females???
Admit Deny
Male 480 120
Female 180 20
Simpson’s Paradox
Admissions to Law School:
% Males ad’ted = 10 / (10 + 90) * 100%
= 10%
% Females ad’ted = 100 / (100+200)*100%
= 33.3%
Better for females???
Admit Deny
Male 10 90
Female 100 200
Simpson’s Paradox
Combined Admissions:
% Males ad’ted = 490 / (490 + 210) * 100%
= 70%
% Females ad’ted = 280 / (280+220)*100%
= 56%
Better for males???
Admit Deny
Male 490 210
Female 280 220
Simpson’s Paradox
How can the rate be higher for females in each school, yet higher for males overall?
Reason: depends on relative proportions
Notes:
• In Business (male applicants dominant), easier to get in (660 / 800)
• In Law (female applicants dominant), much harder to get in (110 / 400)
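The reversal is easy to reproduce from the three tables above. The following sketch uses the (Admit, Deny) counts from the slides; the helper names `admit_rate` and `combine` are illustrative only.

```python
# (Admit, Deny) counts from the Business and Law tables on the slides.
business = {"Male": (480, 120), "Female": (180, 20)}
law = {"Male": (10, 90), "Female": (100, 200)}


def admit_rate(admit, deny):
    """Fraction of applicants admitted."""
    return admit / (admit + deny)


def combine(*schools):
    """Aggregate several admission tables by adding their counts."""
    combined = {}
    for school in schools:
        for sex, (admit, deny) in school.items():
            prev_admit, prev_deny = combined.get(sex, (0, 0))
            combined[sex] = (prev_admit + admit, prev_deny + deny)
    return combined


combined = combine(business, law)
for name, table in [("Business", business), ("Law", law), ("Combined", combined)]:
    rates = {sex: round(100 * admit_rate(*counts), 1) for sex, counts in table.items()}
    print(name, rates)
```

Females have the higher rate within each school, but most female applicants applied to Law, where few are admitted, so the aggregated table favors males.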
Simpson’s Paradox
Lesson:
Must be very careful about aggregation
Worse: may not be aware that aggregation has been done….
Recall terminology: Lurking Variable
Can hide in aggregation…
Could be used for cheating…
Simpson’s Paradox
HW:
9.15
9.17
Inference for Regression
Chapter 10
Recall:
• Scatterplots
• Fitting Lines to Data
Now study statistical inference associated with fit lines
E.g. When is slope statistically significant?
Recall Scatterplot
For data (x,y)
View by plot:
(1,2)
(3,1)
(-1,0)
(2,-1)
[Scatterplot: Toy Scatterplot, Separate Points; x axis from -2 to 4, y axis from -1.5 to 2.5]
Recall Linear Regression
Idea:
Fit a line to data in a scatterplot
• To learn about “basic structure”
• To “model data”
• To provide “prediction of new values”
Recall Linear Regression
Recall some basic geometry:
A line is described by an equation:
y = mx + b
m = slope
b = y intercept
Varying m & b gives a “family of lines”,
Indexed by “parameters” m & b (or a & b)
Recall Linear Regression
Approach:
Given a scatterplot of data: $(x_1, y_1), \ldots, (x_n, y_n)$
Find a & b (i.e. choose a line)
to “best fit the data”
Recall Linear Regression
Given a line, $y = bx + a$, “indexed” by $b$ & $a$
Define “residuals” = “data Y” – “Y on line”
= $y_i - (b x_i + a)$
Now choose $b$ & $a$ to make these “small”
[Figure: data points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ with a candidate line]
Recall Linear Regression
Excellent Demo, by Charles Stanton, CSUSB
http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
More JAVA Demos, by David Lane at Rice U.
http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html
http://www.ruf.rice.edu/~lane/stat_sim/comp_r/index.html
Recall Linear Regression
Make Residuals > 0, by squaring
Least Squares: adjust $b$ & $a$ to
Minimize the “Sum of Squared Errors”
$SSE = \sum_{i=1}^{n} \left( y_i - (b x_i + a) \right)^2$
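To make the least squares idea concrete, here is a minimal sketch that fits the four toy points from the earlier scatterplot slide, (1,2), (3,1), (-1,0), (2,-1). It uses the standard closed-form solution (slope = Sxy / Sxx, intercept = ȳ - slope·x̄) and follows this slide's labels, with a as the intercept and b as the slope. The function names are illustrative.

```python
# Toy data from the "Recall Scatterplot" slide.
xs = [1, 3, -1, 2]
ys = [2, 1, 0, -1]


def least_squares(xs, ys):
    """Return (a, b): intercept a and slope b of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared residuals y_i - (b * x_i + a)."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


a, b = least_squares(xs, ys)
print(round(a, 4), round(b, 4), round(sse(a, b, xs, ys), 4))
```

Nudging a or b away from the fitted values and calling `sse` again gives a larger sum, which is the sense in which this line is "best".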
Least Squares in Excel
Computation:
1. INTERCEPT (computes y-intercept a)
2. SLOPE (computes slope b)
Revisit Class Example 14
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg14.xls
HW: 10.17a
Inference for Regression
Goal: develop
• Hypothesis Tests and Confidence Int’s
• For slope & intercept parameters, a & b
• Also study prediction
Inference for Regression
Idea: do statistical inference on:
– Slope a
– Intercept b
Model: $Y_i = a X_i + b + e_i$
Assume: $e_i$ are random, independent
and $N(0, \sigma_e)$
Inference for Regression
Viewpoint: Data generated as:
[Figure: the line y = ax + b; above each Xi, Yi is chosen from a distribution centered on the line]
Note: a and b are “parameters”
Inference for Regression
Parameters $a$ and $b$ determine the
underlying model (distribution)
Estimate with the Least Squares Estimates:
$\hat{a}$ and $\hat{b}$
(Using SLOPE and INTERCEPT in Excel,
based on data)
Inference for Regression
Distributions of $\hat{a}$ and $\hat{b}$?
Under the above assumptions, the sampling
distributions are:
$\hat{a} \sim N(a, \sigma_a)$
$\hat{b} \sim N(b, \sigma_b)$
• Centerpoints are right (unbiased)
• Spreads are more complicated
Inference for Regression
Formula for SD of $\hat{a}$:
$\sigma_a = SD(\hat{a}) = \frac{\sigma_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
• Big (small) for $\sigma_e$ big (small, resp.)
– Accurate data ⇒ Accurate est. of slope
• Small for x’s more spread out
– Data more spread ⇒ More accurate
• Small for more data
– More data ⇒ More accuracy
Inference for Regression
Formula for SD of $\hat{b}$:
$\sigma_b = SD(\hat{b}) = \sigma_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
• Big (small) for $\sigma_e$ big (small, resp.)
– Accurate data ⇒ Accur’te est. of intercept
• Smaller for $\bar{x} \approx 0$
– Centered data ⇒ More accurate intercept
• Smaller for more data
– More data ⇒ More accuracy
Inference for Regression
One more detail:
Need to estimate $\sigma_e$ using data
For this use:
$s_e = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \left( y_i - (\hat{a} x_i + \hat{b}) \right)^2}$
• Similar to earlier sd estimate, $s$
• Except variation is about fit line
• $n - 2$ is similar to $n - 1$ from before
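A small sketch of how these pieces fit together, assuming the toy data from the earlier scatterplot slide and plugging the estimate s_e in place of the unknown σ_e in the two SD formulas. Here a is the slope and b the intercept, as in this part of the slides; the function name is illustrative.

```python
import math

# Toy data from the "Recall Scatterplot" slide.
xs = [1, 3, -1, 2]
ys = [2, 1, 0, -1]


def regression_standard_errors(xs, ys):
    """Return (a_hat, b_hat, s_e, sd_a, sd_b) for the line y = a x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx                    # least squares slope
    b_hat = y_bar - a_hat * x_bar        # least squares intercept
    resid_ss = sum((y - (a_hat * x + b_hat)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(resid_ss / (n - 2))  # variation about the fit line
    sd_a = s_e / math.sqrt(sxx)          # estimated SD of the slope
    sd_b = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)  # estimated SD of the intercept
    return a_hat, b_hat, s_e, sd_a, sd_b


a_hat, b_hat, s_e, sd_a, sd_b = regression_standard_errors(xs, ys)
print(round(s_e, 3), round(sd_a, 3), round(sd_b, 3))
```

With only four points and almost no linear trend, both standard errors are large relative to the estimates, as the bullet points on spread and sample size would suggest.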
Inference for Regression
Now for Probability Distributions,
Since we are estimating $\sigma_e$ by $s_e$
Use TDIST and TINV
With degrees of freedom = $n - 2$
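As a hedged illustration of a slope test, the sketch below computes t = â / SD(â) for the four toy points from the earlier scatterplot slide, so d.f. = n - 2 = 2. For exactly 2 degrees of freedom the t distribution has a simple closed form, giving a two-sided P-value of $1 - |t| / \sqrt{2 + t^2}$; for other d.f. a table or Excel's TDIST would be needed. The function name is illustrative.

```python
import math

# Toy data from the "Recall Scatterplot" slide; n = 4, so d.f. = 2.
xs = [1, 3, -1, 2]
ys = [2, 1, 0, -1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
a_hat = sxy / sxx                       # slope estimate
b_hat = y_bar - a_hat * x_bar           # intercept estimate
resid_ss = sum((y - (a_hat * x + b_hat)) ** 2 for x, y in zip(xs, ys))
s_e = math.sqrt(resid_ss / (n - 2))
sd_a = s_e / math.sqrt(sxx)             # estimated SD of the slope

t_stat = a_hat / sd_a                   # test of H0: slope = 0
df = n - 2


def t_two_sided_p_df2(t):
    """Two-sided P-value for a t statistic with exactly 2 d.f.

    Uses the closed-form CDF F(t) = 1/2 + t / (2 * sqrt(2 + t^2)),
    which is specific to 2 degrees of freedom.
    """
    return 1.0 - abs(t) / math.sqrt(2.0 + t * t)


if df != 2:
    raise ValueError("closed form below is only valid for 2 d.f.")
p_value = t_two_sided_p_df2(t_stat)
print(round(t_stat, 3), round(p_value, 3))
```

The P-value is far from small, so for these toy points the slope is not statistically significant.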
Inference for Regression
Convenient Packaged Analysis in Excel:
Tools → Data Analysis → Regression
Illustrate application using:
Class Example 27,
Old Text Problem 8.6 (now 10.12)