Class 4: Tues., Sept. 21
• External/Internal Reliability Clarification
• Regression Analysis Examples:
  – Appropriate Dating Ages
  – Father's and son's heights
• Variability of Y given X in the Simple Linear Regression Model
Reliability
• In general, a measurement is reliable if it gives consistent results.
• My distinction between internal/external reliability of a measurement (e.g., a test) was not very precise. Here's a better categorization.
• Four types of reliability for a measurement (the degree of reliability can be measured by a correlation):
1. Inter-observer: Different observers measuring the same object/information give consistent results (e.g., two psychiatrists rate the behavior of a patient similarly; two Olympic judges score a gymnastics contestant similarly).
Types of Reliability Continued
2. Test-retest: Measurements taken at two different times are similar (e.g., a person's pulse is similar for two different readings).
3. Parallel form: Two tests of different forms that supposedly test the same material give similar results (e.g., a person’s SAT scores are similar for two forms of the test).
4. Split-half: If the items on a test are divided in half (e.g., odd vs. even), the scores on the two halves are similar.
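The split-half idea, together with the use of a correlation to measure reliability, can be sketched in Python. This is a minimal sketch, assuming a test scored item by item; the function names `pearson_r` and `split_half_reliability` and the item scores are hypothetical, made up for illustration rather than taken from any class data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(item_scores):
    """item_scores[p][k] is person p's score on item k (items numbered from 1).
    Odd-numbered items form one half, even-numbered items the other half."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)

# Hypothetical 0/1 item scores for four test-takers on a six-item test
# (made-up numbers, for illustration only).
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
print(round(split_half_reliability(scores), 3))
```

With these made-up scores the odd-item totals are (3, 2, 1, 3) and the even-item totals are (2, 1, 1, 3), which correlate at 9/11 ≈ 0.82. Test-retest or parallel-form reliability would use the same `pearson_r`, just applied to the two readings or the two test forms.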
Examples of Reliability
Example              Type                                      Correlation
Pulse                Test-Retest                               0.90
Bedtime on a Wed.    Test-Retest                               0.52
SAT scores           Parallel Form or Split Half (not clear)   0.91
Regression Analysis
• Provides a model for the mean of Y given X = X0, $E(Y \mid X = X_0)$, and for the variability of Y given X = X0. It is useful for understanding the association between Y and X and for predicting Y based on X.
• Simple linear regression model:
  – $E(Y \mid X = X_0) = \beta_0 + \beta_1 X_0$
  – $Y_i = \beta_0 + \beta_1 X_i + e_i$
  – $e_i$ has a normal distribution with mean 0 and standard deviation $\sigma$
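To make the pieces of the model concrete, here is a minimal simulation sketch in Python. The parameter values (beta0 = 10, beta1 = 0.5, sigma = 2) are arbitrary assumptions chosen for illustration, and `mean_y_given_x` and `simulate_slr` are hypothetical helper names.

```python
import random

def mean_y_given_x(x0, beta0, beta1):
    """E(Y | X = x0) = beta0 + beta1 * x0 under the simple linear regression model."""
    return beta0 + beta1 * x0

def simulate_slr(xs, beta0, beta1, sigma, seed=0):
    """Draw one Y_i for each X_i from Y_i = beta0 + beta1 * X_i + e_i,
    with independent errors e_i ~ Normal(mean 0, SD sigma)."""
    rng = random.Random(seed)
    return [mean_y_given_x(x, beta0, beta1) + rng.gauss(0.0, sigma) for x in xs]

# Illustrative (assumed) parameters, not estimates from any data set
ys = simulate_slr([20, 30, 40, 50, 60], beta0=10.0, beta1=0.5, sigma=2.0)
print(ys)
```

Each simulated Y scatters around the line $\beta_0 + \beta_1 X$; averaging many draws at the same X recovers $E(Y \mid X)$, because the errors have mean 0.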
Example: What age is too young?
• In U.S. culture, an older man dating a younger woman is not uncommon, but when the age difference becomes too large, it may seem unacceptable to some.
• A survey was taken of ten people, who were each asked, for a range of men's ages, the minimum acceptable age for a woman dating a man of that age.
• Y = minimum acceptable age of a woman dating a man of X years of age; X = age of the man.
• What is the mean of people's minimum acceptable age for a woman dating a man of X years of age, i.e., what is $E(Y \mid X = X_0)$?
[Figure: Bivariate Fit of Minimum Woman's Age By Man's Age (scatterplot with linear fit; x-axis: Man's Age, y-axis: Minimum Woman's Age)]
Linear Fit: Minimum Woman's Age = 5.472037 + 0.5753518 Man's Age
• Estimated mean (among the survey population) minimum acceptable age for a woman dating a man who is:
  – 20 years old: 5.47 + 0.58*20 = 17.07
  – 30 years old: 5.47 + 0.58*30 = 22.87
  – 40 years old: 5.47 + 0.58*40 = 28.67
  – 50 years old: 5.47 + 0.58*50 = 34.47
  – 60 years old: 5.47 + 0.58*60 = 40.27
  – 70 years old: 5.47 + 0.58*70 = 46.07
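The estimated means are just the fitted line evaluated at each age. A minimal Python sketch (the helper name `predict` is hypothetical) reproduces them and also shows the effect of rounding the coefficients:

```python
def predict(x, intercept, slope):
    """Estimated mean of Y at X = x from a fitted line: intercept + slope * x."""
    return intercept + slope * x

ages = range(20, 71, 10)  # 20, 30, ..., 70

# Rounded coefficients, as used in the bullets for the dating-age example
rounded = {age: round(predict(age, 5.472037 and 5.47, 0.58), 2) for age in ages}

# Full-precision coefficients from the linear fit
full = {age: round(predict(age, 5.472037, 0.5753518), 2) for age in ages}

for age in ages:
    print(age, rounded[age], full[age])
```

Rounding the slope from 0.5753518 up to 0.58 makes the estimates slightly larger, and the gap grows with age: about 16.98 versus 17.07 at age 20, and about 45.75 versus 46.07 at age 70.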
Father and Son’s Height
• Y=Son’s Height, X=Father’s Height (Galton’s Data from 19th century England)
[Figure: scatterplot of son's height (y-axis) versus father's height (x-axis), in inches]
Simple Linear Regression Model for Height Data
[Figure: Bivariate Fit of son By father (scatterplot of son's height versus father's height, in inches, with the least squares line)]
Linear Fit: son = 33.886604 + 0.514093 father
Summary of Fit
  RSquare                       0.25134
  RSquare Adj                   0.250644
  Root Mean Square Error        2.436556
  Mean of Response              68.68407
  Observations (or Sum Wgts)    1078
• Estimated regression model: E(Son's height | Father's height) = 33.89 + 0.51 * Father's height
• Estimated slope = 0.51. For each additional inch of father’s height, the mean son’s height increases by 0.51 inches.
• Predicted son's heights:
  – Father's height = 60 inches. Predicted son's height = 33.89 + 0.51 * 60 = 64.5 inches
  – Father's height = 72 inches. Predicted son's height = 33.89 + 0.51 * 72 = 70.6 inches
Variability of Y given X
• The simple linear regression model tells us more than the mean of Y given X = X0; it also tells us about the variability and distribution of Y given X = X0.
• Simple linear regression model:
  – $Y_i = \beta_0 + \beta_1 X_i + e_i$
  – $e_i$ has a normal distribution with mean 0 and standard deviation (SD) $\sigma$
  – The subpopulation of Y with corresponding X = X0 has a normal distribution with mean $\beta_0 + \beta_1 X_0$ and SD $\sigma$
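A quick way to see the last bullet is to simulate the subpopulation of sons whose fathers are 72 inches tall. This minimal Python sketch assumes the father-son estimates (33.89, 0.51, and an SD of 2.4) are the true parameter values, which is an assumption made only for illustration; `draw_subpopulation` is a hypothetical helper name.

```python
import random
import statistics

# Assumption: the estimates from the father-son fit are treated as if they were the true parameters.
BETA0, BETA1, SIGMA = 33.89, 0.51, 2.4

def draw_subpopulation(x0, n, seed=1):
    """Draw n heights of sons whose fathers are x0 inches tall:
    Y = BETA0 + BETA1 * x0 + e, with e ~ Normal(mean 0, SD SIGMA)."""
    rng = random.Random(seed)
    return [BETA0 + BETA1 * x0 + rng.gauss(0.0, SIGMA) for _ in range(n)]

sons = draw_subpopulation(72, 20_000)
print(round(statistics.mean(sons), 2), round(statistics.stdev(sons), 2))
```

The sample mean lands near $\beta_0 + \beta_1 \cdot 72 = 70.61$ and the sample SD near $\sigma = 2.4$, matching the normal subpopulation described above.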
Residuals and Estimating σ
• Estimating $\sigma$:
  – Use least squares to estimate the slope and intercept of the simple linear regression model. Denote the slope estimate by $\hat\beta_1$ and the intercept estimate by $\hat\beta_0$.
  – Predicted value of $Y_i$ for observation i, based on $X_i$ and the estimated regression model: $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$
  – Residual for observation i (the prediction error from using the least squares line to predict $Y_i$): $\text{Res}_i = Y_i - \hat{Y}_i$
  – Root mean square error = (approximately) standard deviation of the residuals. The root mean square error is an estimate of $\sigma$.
• For the father-son height data, the root mean square error is about 2.4 (2.44). This means that, according to the simple linear regression model, sons whose fathers are 72 inches tall have a mean height of 33.89 + 0.51*72 = 70.6 inches with a standard deviation of 2.4 inches.
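The estimation steps above can be sketched in Python. This is a minimal sketch using the usual textbook formulas: least squares slope $S_{xy}/S_{xx}$, intercept $\bar{Y} - \hat\beta_1 \bar{X}$, and root mean square error $\sqrt{\sum \text{Res}_i^2 / (n-2)}$. The $n-2$ divisor is the standard choice for this model; whether the software that produced the output above uses exactly this divisor is an assumption. The function names and the three-point data set are hypothetical.

```python
import math

def least_squares(xs, ys):
    """Return (b0, b1): least squares intercept and slope estimates."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Res_i = Y_i - Yhat_i, where Yhat_i = b0 + b1 * X_i."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def root_mean_square_error(res):
    """Estimate of sigma: divides by n - 2 because two parameters were estimated."""
    return math.sqrt(sum(r * r for r in res) / (len(res) - 2))

xs, ys = [1, 2, 3], [1, 3, 2]  # hypothetical toy data
b0, b1 = least_squares(xs, ys)
res = residuals(xs, ys, b0, b1)
print(b0, b1, res, root_mean_square_error(res))
```

For this toy data the fitted line is $\hat{Y} = 1 + 0.5X$, the residuals are (-0.5, 1.0, -0.5), and the root mean square error is $\sqrt{1.5} \approx 1.22$. With a large sample, such as the 1078 father-son pairs, dividing by $n-2$ instead of $n$ barely matters, which is why the root mean square error is approximately the SD of the residuals.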
Normal Distribution
• About 68% of the observations from a normal distribution will fall within one standard deviation ($\sigma$) of the mean ($\mu$).
• About 95% of the observations from a normal distribution will fall within two standard deviations of the mean.
• About 99.7% of the observations will fall within three standard deviations of the mean.
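These percentages come from the standard normal distribution: for any normal Y with mean $\mu$ and SD $\sigma$, $P(\mu - k\sigma < Y < \mu + k\sigma) = P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$. A minimal Python check using `math.erf` (the helper name `prob_within_k_sd` is hypothetical):

```python
import math

def prob_within_k_sd(k):
    """P(|Z| < k) for a standard normal Z, i.e. the chance a normal
    observation falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within_k_sd(k), 4))
```

The three values are about 0.6827, 0.9545 and 0.9973.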
Variability of Y given X
• According to the estimated regression model, the distribution of heights for sons whose fathers are 72 inches tall is a normal distribution with a mean of 70.6 inches and a standard deviation of 2.4 inches.
• If a son's father's height is 72 inches:
  – about 68% of the time, the son's height will be between $70.6 \pm 2.4 = (68.2, 73.0)$ inches;
  – about 95% of the time, the son's height will be between $70.6 \pm 2 \times 2.4 = (65.8, 75.4)$ inches;
  – about 99.7% of the time, the son's height will be between $70.6 \pm 3 \times 2.4 = (63.4, 77.8)$ inches.
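The three intervals are mean ± k × SD for k = 1, 2, 3. A minimal Python sketch (the helper name `intervals` is hypothetical) computes them from the estimated mean and the root mean square error:

```python
def intervals(mean, sd, ks=(1, 2, 3)):
    """Map k to the interval (mean - k*sd, mean + k*sd), rounded to 0.1."""
    return {k: (round(mean - k * sd, 1), round(mean + k * sd, 1)) for k in ks}

# Estimated mean height for sons of 72-inch fathers, and the root mean square error
print(intervals(70.6, 2.4))
```

Using the unrounded root mean square error (2.436556) instead of 2.4 would widen each interval slightly.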
Summary
• The regression model provides information about both the mean of Y given X and the variability of Y given X.
• For the simple linear regression model, the standard deviation of Y given X is estimated by the root mean square error.
• For the simple linear regression model, approximately 68% of the time Y given X will be within one root mean square error of the estimated mean of Y given X, i.e., in $\hat\beta_0 + \hat\beta_1 X \pm \text{RMSE}$; approximately 95% of the time, Y given X will be within two root mean square errors of the estimated mean, i.e., in $\hat\beta_0 + \hat\beta_1 X \pm 2\,\text{RMSE}$.