1 Standardization of variables Maarten Buis 5-12-2005.
1
Standardization of variables
Maarten Buis
5-12-2005
2
Recap
• Central tendency
• Dispersion
• SPSS
3
Standardization
• Is used to improve interpretability of variables.
• Some variables have a natural interpretable metric: e.g. income, age, gender, country.
• Others, primarily ordinal variables, do not: e.g. education, attitude items, intelligence.
• Standardizing these variables makes them more interpretable.
4
Standardization
• Transforming the variable to a comparable metric
– known unit
– known mean
– known standard deviation
– known range
• Three ways of standardizing:
– P-standardization (percentile scores)
– Z-standardization (z-scores)
– D-standardization (dichotomize a variable)
5
When you should always standardize
• When averaging multiple variables, e.g. when creating a socioeconomic status variable out of income and education.
• When comparing the effects of variables with unequal units, e.g. does age or education have a larger effect on income?
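A minimal sketch of the first case, assuming the two variables are Z-standardized (defined on a later slide) before they are averaged. The data, the variable names, and the helper `z_scores` are hypothetical and serve only as an illustration.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Z-standardize a list of numbers: subtract the mean, divide by the SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical data for five respondents (not taken from the slides).
income = [1200, 1800, 2500, 3100, 4100]   # guilders per month
education = [8, 10, 12, 14, 16]           # years of schooling

# Averaging the raw values would let income dominate, because its unit
# produces much larger numbers than years of schooling.
naive = [(i + e) / 2 for i, e in zip(income, education)]

# Averaging the z-scores gives both variables the same unit (standard deviations).
ses = [(zi + ze) / 2 for zi, ze in zip(z_scores(income), z_scores(education))]

for n, s in zip(naive, ses):
    print(f"naive average {n:8.1f}   standardized average {s:6.2f}")
```

The population standard deviation (`pstdev`) is used here; using the sample standard deviation instead would be an equally reasonable choice.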
6
P-Standardization
• Every observation is assigned a number between 0 and 100, indicating the percentage of observations below it.
• Can be read from the cumulative distribution
• In case of ties: assign midpoints
• The median, quartiles, quintiles, and deciles are special cases of P-scores.
7
          rent    cum %     percentile
room 1    175     5,3%      5,3%
room 2    180     10,5%     10,5%
room 3    185     15,8%     15,8%
room 4    190     21,1%     21,1%
room 5    200     26,3%     26,3%
room 6    210     31,6%     36,8%
room 7    210     36,8%     36,8%
room 8    210     42,1%     36,8%
room 9    230     47,4%     47,4%
room 10   240     52,6%     55,3%
room 11   240     57,9%     55,3%
room 12   250     63,2%     65,8%
room 13   250     68,4%     65,8%
room 14   280     73,7%     73,7%
room 15   300     78,9%     81,6%
room 16   300     84,2%     81,6%
room 17   310     89,5%     89,5%
room 18   325     94,7%     94,7%
room 19   620     100,0%    100,0%
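The percentile column of the rent table can be reproduced with a short sketch. It assumes the cumulative percentage of an observation is its rank divided by the number of observations, and that tied rents receive the midpoint (average) of their cumulative percentages. The function name `p_scores` is an illustrative choice, not something from the slides.

```python
from collections import defaultdict

def p_scores(values):
    """Percentile scores (0-100); tied values get the midpoint of their cum %."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    # Collect the cumulative percentage (rank / n * 100) of every observation,
    # grouped by the observed value so ties end up in the same group.
    cum_by_value = defaultdict(list)
    for rank, i in enumerate(order, start=1):
        cum_by_value[values[i]].append(100.0 * rank / n)
    # The midpoint of a group is the average of its cumulative percentages.
    midpoint = {v: sum(c) / len(c) for v, c in cum_by_value.items()}
    return [midpoint[v] for v in values]

# Rents of the 19 rooms in the table.
rents = [175, 180, 185, 190, 200, 210, 210, 210, 230, 240,
         240, 250, 250, 280, 300, 300, 310, 325, 620]
scores = p_scores(rents)

for room, (rent, p) in enumerate(zip(rents, scores), start=1):
    print(f"room {room:2d}  rent {rent:3d}  percentile {p:5.1f}")
```

Rooms 6 to 8 share a rent of 210, so all three receive the midpoint of 31.6%, 36.8%, and 42.1%.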
8
P-standardization
• Turns the variable into a ranking, i.e. it turns the variable into an ordinal variable.
• It is a non-linear transformation: relative distances change
• Results in a fixed mean, range, and standard deviation: M=50, SD=28.6. This can change slightly due to ties.
• A histogram of a P-standardized variable approximates a uniform distribution
9
Linear transformation
• Say you want income in thousands of guilders instead of guilders.
• You divide INCMID by ƒ1000,-
              M         SD
Incmid        ƒ2543,-   ƒ1481,-
Incmid/1000   kƒ2,543   kƒ1,481
10
Linear transformation
• Say you want to know the deviation from the mean
• Subtract the mean (ƒ2543,-) from INCMID
              M         SD
Incmid        ƒ2543,-   ƒ1481,-
Incmid-M      ƒ0,-      ƒ1481,-
11
Recap: multiplication and addition and the number line
12
Linear transformation
• Adding a constant (X’ = X+c)
– M(X’) = M(X)+c
– SD(X’) = SD(X)
• Multiplying by a constant (X’ = X*c)
– M(X’) = M(X)*c
– SD(X’) = SD(X) * |c|
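These four rules can be checked numerically. The sketch below uses hypothetical income values and two arbitrarily chosen constants; a negative multiplier is used on purpose to show why the rule for the standard deviation needs |c|.

```python
from math import isclose
from statistics import mean, pstdev

# Hypothetical incomes in guilders (not the INCMID data from the slides).
x = [1200, 1800, 2500, 3100, 4100]

# Adding a constant shifts the mean and leaves the standard deviation alone.
c_add = -2500
shifted = [v + c_add for v in x]
print(mean(shifted), mean(x) + c_add)
print(pstdev(shifted), pstdev(x))

# Multiplying by a constant rescales both; the SD is multiplied by |c|,
# so it stays positive even when c is negative.
c_mul = -0.001
scaled = [v * c_mul for v in x]
print(mean(scaled), mean(x) * c_mul)
print(pstdev(scaled), pstdev(x) * abs(c_mul))
```

Each pair of printed numbers should agree up to floating-point rounding.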
13
Z-standardization
• Z = (X-M)/SD
• Two steps:
– center the variable (mean becomes zero)
– divide by the standard deviation (the unit becomes standard deviation)
• Results in fixed mean and standard deviation: M=0, SD=1
• Not in a fixed range!
• Z-standardization is a linear transformation: relative distances remain intact.
14
Z-standardization
• Step 1: subtract the mean
• c = -M(X)
• M(X’) = M(X)+c
• M(X’) = M(X)-M(X)=0
• SD(X’)=SD(X)
15
Z-standardization
• Step 2: divide by the standard deviation
• c is 1/SD(X)
• M(Z) = M(X’) * c
• M(Z) = 0 * 1/SD(X) = 0
• SD(Z) = SD(X’) * c
• SD(Z) = SD(X) * 1/SD(X) = 1
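The two steps can be sketched on the room rents from the P-standardization example. This assumes the population standard deviation (`pstdev`); the sample standard deviation would work the same way as long as it is used consistently.

```python
from statistics import mean, pstdev

# Rents of the 19 rooms from the P-standardization table.
rents = [175, 180, 185, 190, 200, 210, 210, 210, 230, 240,
         240, 250, 250, 280, 300, 300, 310, 325, 620]

m = mean(rents)
sd = pstdev(rents)

# Step 1: subtract the mean. The mean becomes 0, the SD is unchanged.
centered = [x - m for x in rents]

# Step 2: divide by the standard deviation. The mean stays 0, the SD becomes 1.
z = [x / sd for x in centered]

print(f"after step 1: M = {mean(centered):.3f}, SD = {pstdev(centered):.3f}")
print(f"after step 2: M = {mean(z):.3f}, SD = {pstdev(z):.3f}")
print(f"range of z:   {min(z):.2f} to {max(z):.2f}")
```

The last line illustrates that the range is not fixed: the room with a rent of 620 ends up more than three standard deviations above the mean, while the cheapest room is less than one standard deviation below it.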
16
Normal distribution
• Normal distribution = Gauss curve = Bell curve
• Formula (McCall p. 120)
– Note the (x-μ)² part
– apart from that all you have to remember is that the formula is complicated
• Normal distribution occurs when a large number of small random events cause the outcome: e.g. measurement error
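For reference, the density of a normal distribution with mean μ and standard deviation σ is usually written as follows. This is the standard textbook form; the notation in McCall may differ slightly.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
```

The squared deviation from the mean appears in the exponent, which makes the curve symmetric around μ.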
17
Normal distribution
• Other examples: the height of individuals, intelligence, attitudes
• But: the variables education, income, and age in Eenzaam98 are not normally distributed
18
Z-scores and the normal distribution
• Z-standardization will not result in a normally distributed variable
• Standardization is NOT the same as normalization
• We will not discuss normalization (but it does exist)
• But: if the original variable is normally distributed, then the z-standardized variable will have a standard normal distribution.
19
Standard normal distribution
• Normal distribution with M=0 and SD=1.
• Table A in Appendix 2 of McCall
• Important numbers (to be remembered):
– 68% of the observations lie between ± 1 SD
– 90% of the observations lie between ± 1.64 SD
– 95% of the observations lie between ± 1.96 SD
– 99% of the observations lie between ± 2.58 SD
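These numbers can be checked without the printed table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

# Standard normal distribution: M = 0, SD = 1.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of observations lying between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.64, 1.96, 2.58):
    print(f"within ±{k} SD: {100 * share_within(k):.1f}%")
```

The results are close to, but not exactly, the rounded percentages to be remembered.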
20
Why bother?
• If you know:
– that a variable is normally distributed
– the mean and standard deviation
• Then you know the percentage of observations above or below an observation
• These numbers are a good approximation, even if the variable is not exactly normally distributed
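As a sketch of how this works in practice, assume a variable that is normally distributed with a hypothetical mean of 180 and standard deviation of 7 (for example, height in centimeters; these values are made up for illustration). The percentage below an observation follows from its z-score.

```python
from statistics import NormalDist

# Hypothetical mean and standard deviation (not from the slides).
m, sd = 180.0, 7.0
observation = 187.0

# The z-score expresses the observation in standard deviations from the mean.
z = (observation - m) / sd

# The cumulative distribution function gives the share of observations below it.
below = NormalDist(mu=0, sigma=1).cdf(z)
above = 1 - below

print(f"z = {z:.2f}")
print(f"{100 * below:.1f}% below, {100 * above:.1f}% above")
```

An observation one standard deviation above the mean has roughly 84% of the observations below it, which is the same information Table A provides.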
21
P & Z standardization
• Both give a distribution with fixed mean, standard deviation, and unit
• P-standardization also gives a fixed range
• Both are relative to the sample: if you take observations out, then you have to re-compute the standardized variables
22
P & Z-standardization
• When interpreting Z-standardized variables one uses percentiles
• With P-standardization one decreases the scale of measurement to ordinal, BUT this improves interpretability.
23
Student recap
24
Do before Wednesday
• Read McCall chapter 5
• Understand Appendix 2, table A
• Do exercises 5.7-5.28