Class 4: Tues., Sept. 21


Page 1: Class 4: Tues., Sept. 21

Class 4: Tues., Sept. 21

• External/Internal Reliability Clarification

• Regression Analysis Examples:

– Appropriate Dating Ages

– Father’s and son’s heights

• Variability of Y given X in the Simple Linear Regression Model

Page 2: Class 4: Tues., Sept. 21

Reliability

• In general, a measurement is reliable if it gives consistent results.

• My distinction between internal/external reliability of a measurement (e.g., a test) was not very precise. Here’s a better categorization.

• Four types of reliability for a measurement (degree of reliability can be measured by correlation):

1. Inter-observer: Different measurements of the same object/information give consistent results (e.g., two psychiatrists rate the behavior of a patient similarly; two Olympic judges score a gymnastics contestant similarly).

Page 3: Class 4: Tues., Sept. 21

Types of Reliability Continued

2. Test-retest: Measurements taken at two different times are similar (e.g., a person’s pulse is similar for two different readings)

3. Parallel form: Two tests of different forms that supposedly test the same material give similar results (e.g., a person’s SAT scores are similar for two forms of the test).

4. Split-half: If the items on a test are divided in half (e.g., odd vs. even), the scores on the two halves are similar.
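As a rough illustration of how a split-half reliability could be computed, here is a minimal sketch. The item scores are made up for illustration (they are not from the lecture), and `pearson_r` is a hand-written helper rather than a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: five test takers, six items each (1 = correct, 0 = wrong).
item_scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]

# Split the items into odd-numbered (1, 3, 5) and even-numbered (2, 4, 6) halves.
odd_half = [sum(row[0::2]) for row in item_scores]
even_half = [sum(row[1::2]) for row in item_scores]

# The split-half reliability is the correlation between the two half scores.
split_half_r = pearson_r(odd_half, even_half)
print(f"split-half correlation: {split_half_r:.2f}")
```

The same `pearson_r` helper would apply to the other three types: correlate the two observers' ratings, the two readings taken at different times, or the scores on the two test forms.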

Page 4: Class 4: Tues., Sept. 21

Examples of Reliability

Example              Type                                       Correlation
Pulse                Test-retest                                0.90
Bedtime on a Wed.    Test-retest                                0.52
SAT scores           Parallel form or split-half (not clear)    0.91

Page 5: Class 4: Tues., Sept. 21

Regression Analysis

• Provides a model for the mean of Y given X=X0, E(Y|X=X0), and the variability of Y given X=X0. Useful for understanding the association between Y and X and for predicting Y based on X.

• Simple linear regression model:

– $E(Y \mid X = X_0) = \beta_0 + \beta_1 X_0$

– $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

– $\varepsilon_i$ has a normal distribution with mean 0 and standard deviation $\sigma$

Page 6: Class 4: Tues., Sept. 21
Page 7: Class 4: Tues., Sept. 21

Example: What age is too young?

• In U.S. culture, an older man dating a younger woman is not uncommon, but when the age difference becomes too large, it may seem unacceptable to some.

• A survey was taken of ten people, each of whom was asked the minimum acceptable age for a woman to be dating a man of a certain age, for a range of ages.

• Y = minimum acceptable age of a woman dating a man of X years of age. X = age of man.

• What is the mean of people’s minimum acceptable age for a woman to be dating a man of X years of age, i.e., what is E(Y|X=X0)?

Page 8: Class 4: Tues., Sept. 21

[Figure: Bivariate Fit of Minimum Woman's Age By Man's Age. x-axis: Man's Age (ticks from 20 to 60); y-axis: Minimum Woman's Age (ticks from 10 to 55); linear fit shown.]

Linear Fit: Minimum Woman's Age = 5.472037 + 0.5753518 × Man's Age

Page 9: Class 4: Tues., Sept. 21

• Estimated Mean (among survey population) Minimum Acceptable Age for a Woman dating a man who is

– 20 years old: 5.47 + 0.58*20 = 17.07

– 30 years old: 5.47 + 0.58*30 = 22.87

– 40 years old: 5.47 + 0.58*40 = 28.67

– 50 years old: 5.47 + 0.58*50 = 34.47

– 60 years old: 5.47 + 0.58*60 = 40.27

– 70 years old: 5.47 + 0.58*70 = 46.07

Linear Fit: Minimum Woman's Age = 5.472037 + 0.5753518 × Man's Age
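The arithmetic for the estimated means can be checked with a short sketch. The function name `min_acceptable_age` is made up for illustration; the default coefficients are the rounded values used in the hand calculations (5.47 and 0.58), and the full-precision values come from the Linear Fit line.

```python
def min_acceptable_age(man_age, intercept=5.47, slope=0.58):
    """Estimated mean minimum acceptable woman's age for a man of the given age."""
    return intercept + slope * man_age

for age in (20, 30, 40, 50, 60, 70):
    rounded = min_acceptable_age(age)
    # Same line evaluated with the unrounded coefficients from the fit output.
    full = min_acceptable_age(age, intercept=5.472037, slope=0.5753518)
    print(f"{age} years old: {rounded:.2f} (rounded coefficients), {full:.2f} (full precision)")
```

Rounding the slope up to 0.58 inflates the estimates slightly, and the gap grows with the man's age.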

Page 10: Class 4: Tues., Sept. 21

Father and Son’s Height

• Y=Son’s Height, X=Father’s Height (Galton’s Data from 19th century England)

[Figure: plot of son's height (y-axis, ticks at 60 and 70) against father's height (x-axis, ticks at 60 and 70).]

Page 11: Class 4: Tues., Sept. 21

Simple Linear Regression Model for Height Data

[Figure: Bivariate Fit of son By father. x-axis: father (ticks at 60 and 70); y-axis: son (ticks at 60 and 70); linear fit shown.]

Linear Fit: son = 33.886604 + 0.514093 × father

Summary of Fit
RSquare                       0.25134
RSquare Adj                   0.250644
Root Mean Square Error        2.436556
Mean of Response              68.68407
Observations (or Sum Wgts)    1078

Page 12: Class 4: Tues., Sept. 21

• Estimated regression model: E(Son’s height | Father’s height) = 33.89 + 0.51 * Father’s height

• Estimated slope = 0.51. For each additional inch of father’s height, the mean son’s height increases by 0.51 inches.

• Predicted son’s heights:

– Father’s height = 60 inches. Predicted son’s height = 33.89 + 0.51 * 60 = 64.5 inches

– Father’s height = 72 inches. Predicted son’s height = 33.89 + 0.51 * 72 = 70.6 inches

Page 13: Class 4: Tues., Sept. 21

Variability of Y given X

• The simple linear regression model tells us more than the mean of Y given X=X0; it tells us about the variability and distribution of Y given X=X0.

• Simple linear regression model:

– $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

– $\varepsilon_i$ has a normal distribution with mean 0 and standard deviation (SD) $\sigma$

– The subpopulation of Y with corresponding X=X0 has a normal distribution with mean $\beta_0 + \beta_1 X_0$ and SD $\sigma$
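One way to see what the model says about a subpopulation is to simulate from it. The sketch below assumes parameter values borrowed from the father-son fit (intercept 33.89, slope 0.51, SD 2.4); treating those estimates as the true parameters is an assumption made only for illustration.

```python
import random
import statistics

# Assumed "true" parameters, borrowed from the estimated height model.
BETA0, BETA1, SIGMA = 33.89, 0.51, 2.4

def simulate_y(x, rng):
    """Draw one Y from the model: beta0 + beta1 * x + a normal error with SD sigma."""
    return BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA)

rng = random.Random(0)  # fixed seed so the run is reproducible
x0 = 72
ys = [simulate_y(x0, rng) for _ in range(20000)]

sim_mean = statistics.mean(ys)
sim_sd = statistics.stdev(ys)
print(f"model mean at X0={x0}: {BETA0 + BETA1 * x0:.2f}, simulated mean: {sim_mean:.2f}")
print(f"model SD: {SIGMA}, simulated SD: {sim_sd:.2f}")
```

The simulated mean and SD should land close to the model values, and the SD does not depend on which X0 is chosen.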

Page 14: Class 4: Tues., Sept. 21

Residuals and Estimating $\sigma$

• Estimating $\sigma$:

– Use least squares to estimate the slope and intercept of the simple linear regression model. Denote the slope estimate by $\hat{\beta}_1$ and the intercept estimate by $\hat{\beta}_0$.

– Predicted value of $Y_i$ for observation i based on $X_i$ and the regression model estimate: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$

– Residual for observation i: the prediction error of using the least squares line to predict $Y_i$ for observation i, $\mathrm{Res}_i = Y_i - \hat{Y}_i$

– Root mean square error = (approximately) standard deviation of residuals. Root mean square error is an estimate of $\sigma$.

• For father-son height data, root mean square error = 2.4. This means that, according to the simple linear regression model, a son whose father is 72 inches has a mean height of 33.89 + .51*72 = 70.6 inches with a standard deviation of 2.4 inches.
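The steps on this slide can be sketched in a few lines of Python. The six father/son pairs are hypothetical (they are not Galton's data), `fit_simple_linear_regression` is an illustrative name, and the root mean square error is computed with the usual n − 2 divisor, which is an assumption about how the software reports it.

```python
import math

def fit_simple_linear_regression(xs, ys):
    """Least squares fit; returns (intercept, slope, residuals, rmse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # least squares slope estimate
    intercept = mean_y - slope * mean_x     # least squares intercept estimate
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    # Root mean square error: sum of squared residuals divided by n - 2, then square root.
    rmse = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return intercept, slope, residuals, rmse

# Hypothetical heights in inches, for illustration only.
father_heights = [64, 66, 68, 70, 72, 74]
son_heights = [66, 67, 69, 69, 71, 72]

intercept, slope, residuals, rmse = fit_simple_linear_regression(father_heights, son_heights)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, RMSE = {rmse:.3f}")
```

With an intercept in the model, the least squares residuals sum to zero, so the root mean square error is close to their standard deviation.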

Page 15: Class 4: Tues., Sept. 21

Normal Distribution

• About 68% of the observations from a normal distribution will fall within one standard deviation ($\sigma$) of the mean ($\mu$).

• About 95% of the observations from a normal distribution will fall within two standard deviations of the mean.

• About 99.7% of the observations will fall within three standard deviations of the mean.
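These percentages can be checked numerically. The sketch below uses the standard identity that, for a normal distribution, the probability of falling within k standard deviations of the mean is erf(k/√2); `prob_within` is an illustrative helper name.

```python
import math

def prob_within(k):
    """Probability that a normal observation falls within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * prob_within(k):.1f}%")
```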

Page 16: Class 4: Tues., Sept. 21

Variability of Y given X

• According to the estimated regression model, the distribution of heights for sons whose fathers are 72 inches tall is a normal distribution with a mean of 70.6 inches and a standard deviation of 2.4 inches.

• If a son’s father’s height is 72 inches,

– about 68% of the time, the son’s height will be between $70.6 \pm 2.4 = (68.2, 73.0)$ inches

– about 95% of the time, the son’s height will be between $70.6 \pm 2 \times 2.4 = (65.8, 75.4)$ inches

– about 99.7% of the time, the son’s height will be between $70.6 \pm 3 \times 2.4 = (63.4, 77.8)$ inches.
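The three ranges are simply the mean plus or minus one, two, and three standard deviations, which a short sketch can reproduce. It takes the rounded mean (70.6 inches) and SD (2.4 inches) from the text as given; `normal_range` is an illustrative helper name.

```python
def normal_range(mean, sd, k):
    """Interval mean ± k * sd, rounded to one decimal place."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))

mean_height = 70.6  # estimated mean son's height (inches) for a 72-inch father
rmse = 2.4          # root mean square error (inches), used as the SD

ranges = {k: normal_range(mean_height, rmse, k) for k in (1, 2, 3)}
for k, (low, high) in ranges.items():
    print(f"within {k} SD: ({low}, {high}) inches")
```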

Page 17: Class 4: Tues., Sept. 21

Summary

• Regression model provides information about both the mean of Y given X and the variability of Y given X.

• For the simple linear regression model, the standard deviation of Y given X is estimated by the root mean square error.

• For the simple linear regression model, approximately 68% of the time, Y given X will be within one root mean square error of the estimated mean of Y given X ($\hat{\beta}_0 + \hat{\beta}_1 X \pm \mathrm{RMSE}$); approximately 95% of the time, Y given X will be within two root mean square errors of the estimated mean of Y given X.