Variance and Standard Deviation (3) Frequency Distributions.
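Standard Deviation

$$ s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} $$

Standard deviation can more conveniently be written

$$ s_x = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n}} \quad \text{or} \quad s_x = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} $$

… this makes manual calculations much simpler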
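The step from the definition to the simpler form is not shown on the slide. A short derivation, which uses only the fact that $\sum x_i = n\bar{x}$, is:

$$ \sum (x_i - \bar{x})^2 = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 = \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2 $$

Dividing by $n$ and taking the square root gives

$$ s_x = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n}} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} $$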
Frequency Distributions

Visits to the doctors: 0 1 5 2 1 5 3 3 2 4 3 6 2 3 1 0 1 0 3 2

Visits to doctors   Frequency   Calculation   Total visits
0                   3           0 x 3         0
1                   4           1 x 4         4
2                   4           2 x 4         8
3                   5           3 x 5         15
4                   1           4 x 1         4
5                   2           5 x 2         10
6                   1           6 x 1         6
TOTAL               20                        47

Mean = 47 / 20 = 2.35
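As an illustration, the frequency table and the mean can be rebuilt from the raw list of visits. This is a minimal sketch using Python's standard library; the variable names are chosen here and are not part of the slides.

```python
from collections import Counter

# Raw data from the slide: visits to the doctor for 20 people.
visits = [0, 1, 5, 2, 1, 5, 3, 3, 2, 4, 3, 6, 2, 3, 1, 0, 1, 0, 3, 2]

# Frequency table: each value mapped to the number of times it occurs.
freq = Counter(visits)

n = sum(freq.values())                               # total frequency
total_visits = sum(x * f for x, f in freq.items())   # sum of x * f
mean = total_visits / n

# Print one row per value: value, frequency, value x frequency.
for x in sorted(freq):
    print(x, freq[x], x * freq[x])
print("TOTAL", n, total_visits)
print("Mean =", mean)
```

The totals agree with the table: 20 people, 47 visits, mean 2.35.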
Boys Girlscm cm cm cm cm cm cm cm cm cm
23 32 1 32 13 10 16 30 246 5 2 23 3 31 16 26 207 2 3 32 20 15 18 33 2
18 12 7 25 6 1 4 27 131 6 8 32 32 11 19 15 218 8 32 15 23 14 18 12 267 10 2 32 16 14 12 14 183 19 32 17 11 20 3 25 48 2 26 16 11 7 11 10 191 6 3 10 13 10 6 17 18
13 17 23 1 25 26 211 10 28 3 16 15 311 8 23 14 17 15 2710 4 24 15 12 12 2510 2 7 19 4 1 1923 23 8 22 29 17 287 13 6 12 5 30 8
21 18 11 14 10 16 1832 19 9 10 18 17 2332 9 12 4 34 20 19
Ruler Experiment
Ruler Experiment - Mean

Boys
Reaction (cm)    Frequency   Mid Point   Calculation   Total
0 <= R < 5       17          2.5         2.5 x 17      42.5
5 <= R < 10      24          7.5         7.5 x 24      180
10 <= R < 15     16          12.5        12.5 x 16     200
15 <= R < 20     10          17.5        17.5 x 10     175
20 <= R < 25     11          22.5        22.5 x 11     247.5
25 <= R < 30     2           27.5        27.5 x 2      55
30 <= R < 35     10          32.5        32.5 x 10     325
Total            90                                    1225
Mean = 1225 / 90 = 13.6

Girls
Reaction (cm)    Frequency   Mid Point   Calculation   Total
0 <= R < 5       10          2.5         2.5 x 10      25
5 <= R < 10      7           7.5         7.5 x 7       52.5
10 <= R < 15     14          12.5        12.5 x 14     175
15 <= R < 20     21          17.5        17.5 x 21     367.5
20 <= R < 25     6           22.5        22.5 x 6      135
25 <= R < 30     9           27.5        27.5 x 9      247.5
30 <= R < 35     3           32.5        32.5 x 3      97.5
Total            70                                    1100
Mean = 1100 / 70 = 15.7
Estimate of mean = total (of mid-point x frequency) / total frequency
Mean within Frequency Distributions
Within a frequency distribution, the mean is defined as

$$ \bar{x} = \frac{\sum x_i f_i}{n} $$

where $f_i$ means the frequency of $x_i$ and $n = \sum f_i$ is the total frequency.
Where data is provided in ranges, the $x_i$ values are the mid-points of the ranges.
The result is an estimate of the mean, since it assumes that values are evenly distributed within each range.
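The mid-point estimate can be sketched in a few lines of Python. The class boundaries and frequencies are those of the ruler experiment tables; the function name `grouped_mean` and the tuple layout are assumptions made for this example.

```python
# Each class is (lower bound, upper bound, frequency), in cm.
boys = [(0, 5, 17), (5, 10, 24), (10, 15, 16), (15, 20, 10),
        (20, 25, 11), (25, 30, 2), (30, 35, 10)]
girls = [(0, 5, 10), (5, 10, 7), (10, 15, 14), (15, 20, 21),
         (20, 25, 6), (25, 30, 9), (30, 35, 3)]


def grouped_mean(classes):
    """Estimate the mean of grouped data, using the mid-point of each class as x_i."""
    n = sum(f for _, _, f in classes)                         # total frequency
    total = sum((lo + hi) / 2 * f for lo, hi, f in classes)   # sum of x_i * f_i
    return total / n


boys_mean = grouped_mean(boys)
girls_mean = grouped_mean(girls)
print(round(boys_mean, 1))   # 13.6
print(round(girls_mean, 1))  # 15.7
```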
Standard Deviation with Frequency Distributions
Previously, we arrived at the formula:

$$ s_x = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} $$

… now $\bar{x} = \frac{\sum x_i f_i}{n}$, where $f_i$ means frequency.
The $\sum x_i^2$ part can also be calculated from the tables
Boys
Reaction (cm)    Frequency   Mid Point   Mid-Point Squared   Mid-Point Squared x Frequency
0 <= R < 5       17          2.5         6.25                106.25
5 <= R < 10      24          7.5         56.25               1350
10 <= R < 15     16          12.5        156.25              2500
15 <= R < 20     10          17.5        306.25              3062.5
20 <= R < 25     11          22.5        506.25              5568.75
25 <= R < 30     2           27.5        756.25              1512.5
30 <= R < 35     10          32.5        1056.25             10562.5
Total            90                      Sum                 24662.5

Girls
Reaction (cm)    Frequency   Mid Point   Mid-Point Squared   Mid-Point Squared x Frequency
0 <= R < 5       10          2.5         6.25                62.5
5 <= R < 10      7           7.5         56.25               393.75
10 <= R < 15     14          12.5        156.25              2187.5
15 <= R < 20     21          17.5        306.25              6431.25
20 <= R < 25     6           22.5        506.25              3037.5
25 <= R < 30     9           27.5        756.25              6806.25
30 <= R < 35     3           32.5        1056.25             3168.75
Total            70                      Sum                 22087.5
… with a frequency distribution, the formula

$$ s_x = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} $$

becomes

$$ s_x = \sqrt{\frac{\sum x_i^2 f_i}{n} - \bar{x}^2} $$

… where $\bar{x} = \frac{\sum x_i f_i}{n}$ and $f_i$ means frequency.
Boys: $\sum x_i^2 f_i = 24662.5$, $\bar{x} = 13.6$, $n = 90$

$$ s_x = \sqrt{\frac{24662.5}{90} - 13.6^2} = 9.44 $$

Girls: $\sum x_i^2 f_i = 22087.5$, $\bar{x} = 15.7$, $n = 70$

$$ s_x = \sqrt{\frac{22087.5}{70} - 15.7^2} = 8.31 $$
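The same calculation can be sketched in Python. The function `grouped_sd` and its optional `mean` argument are assumptions made for this example. Passing the rounded means (13.6 and 15.7) reproduces the figures above; leaving the mean unrounded gives slightly smaller values, which shows how sensitive the result is to rounding.

```python
from math import sqrt

# Class mid-points (cm) and frequencies from the ruler experiment tables.
midpoints = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
boys_freq = [17, 24, 16, 10, 11, 2, 10]
girls_freq = [10, 7, 14, 21, 6, 9, 3]


def grouped_sd(xs, fs, mean=None):
    """Standard deviation of grouped data: sqrt(sum(x^2 f)/n - mean^2).

    If no mean is supplied, it is estimated as sum(x f)/n.
    """
    n = sum(fs)
    if mean is None:
        mean = sum(x * f for x, f in zip(xs, fs)) / n
    sum_x2f = sum(x * x * f for x, f in zip(xs, fs))
    return sqrt(sum_x2f / n - mean ** 2)


# Using the means rounded to 1 d.p., as in the worked calculation.
boys_sd_rounded = grouped_sd(midpoints, boys_freq, mean=13.6)
girls_sd_rounded = grouped_sd(midpoints, girls_freq, mean=15.7)
print(round(boys_sd_rounded, 2), round(girls_sd_rounded, 2))   # 9.44 8.31

# Using the unrounded means.
boys_sd_exact = grouped_sd(midpoints, boys_freq)
girls_sd_exact = grouped_sd(midpoints, girls_freq)
print(round(boys_sd_exact, 2), round(girls_sd_exact, 2))       # 9.42 8.28
```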
Visits to the doctors, using $s_x = \sqrt{\frac{\sum x_i^2 f_i}{n} - \bar{x}^2}$:

x (visits)   Frequency   Calculation   Total visits   x squared   Freq times x squared
0            3           0 x 3         0              0           0
1            4           1 x 4         4              1           4
2            4           2 x 4         8              4           16
3            5           3 x 5         15             9           45
4            1           4 x 1         4              16          16
5            2           5 x 2         10             25          50
6            1           6 x 1         6              36          36
TOTAL        20                        47                         167

Mean = 47 / 20 = 2.35

$$ s_x = \sqrt{\frac{167}{20} - 2.35^2} = \sqrt{8.35 - 5.5225} = \sqrt{2.8275} = 1.68 $$
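As a check, the definition, the shortcut formula and a library routine can be compared on the raw doctor-visit data. This is a minimal sketch; `statistics.pstdev` is the standard library's population standard deviation (dividing by n), which matches the formula used in these slides.

```python
from math import sqrt
from statistics import pstdev

# Raw data from the slides: visits to the doctor for 20 people.
visits = [0, 1, 5, 2, 1, 5, 3, 3, 2, 4, 3, 6, 2, 3, 1, 0, 1, 0, 3, 2]

n = len(visits)
mean = sum(visits) / n

# Definition: root of the mean squared deviation from the mean.
sd_definition = sqrt(sum((x - mean) ** 2 for x in visits) / n)

# Shortcut: root of (mean of the squares minus square of the mean).
sd_shortcut = sqrt(sum(x * x for x in visits) / n - mean ** 2)

# Library value (population standard deviation).
sd_library = pstdev(visits)

print(round(sd_definition, 2), round(sd_shortcut, 2), round(sd_library, 2))
```

All three agree at 1.68 to two decimal places.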
The right average?
In a 5 person office:
The boss makes 50K
The 2 secretaries make 14K each
The sales rep makes 25K
The trainee sales rep gets 16K

The median pay is 16K (the middle value of 14, 14, 16, 25, 50)
The modal pay is 14K
The mean pay is 119K ÷ 5 = 23.8K

… which represents the ‘best average’?
The boss says “on average you earn over 23K in my office”
The sales rep says “on average you only get 16K in my office”
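The three averages for the office can be reproduced with Python's `statistics` module. This is only an illustrative sketch; the list `pay` holds the five salaries in thousands.

```python
from statistics import mean, median, mode

# Salaries in thousands: boss, two secretaries, sales rep, trainee sales rep.
pay = [50, 14, 14, 25, 16]

office_mean = mean(pay)      # 119 / 5
office_median = median(pay)  # middle value once sorted
office_mode = mode(pay)      # most common value

print(office_mean, office_median, office_mode)
```

The boss is quoting the mean (23.8K) and the sales rep the median (16K); both are correct, but the single large salary pulls the mean upward.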
Boys
Reaction (cm)    Frequency
0 <= R < 5       17
5 <= R < 10      24
10 <= R < 15     16
15 <= R < 20     10
20 <= R < 25     11
25 <= R < 30     2
Missed           10
Total            90

Girls
Reaction (cm)    Frequency
0 <= R < 5       9
5 <= R < 10      7
10 <= R < 15     14
15 <= R < 20     18
20 <= R < 25     6
25 <= R < 30     9
Missed           7
Total            70
Boys Girlscm cm cm cm cm cm cm cm cm cm
23 miss 1 miss 13 10 16 30 246 5 2 23 3 miss 16 26 207 2 3 miss 20 15 18 miss 218 12 7 25 6 1 4 27 131 6 8 miss miss 11 19 15 miss8 8 miss 15 23 14 18 miss miss7 10 2 miss 16 14 12 14 miss3 19 miss 17 11 20 3 25 218 2 26 16 11 7 11 10 261 6 3 10 13 10 6 17 1213 17 23 1 25 26 211 10 28 3 16 15 311 8 23 14 17 15 2710 4 24 15 12 12 2510 2 7 19 4 1 1923 23 8 22 29 17 287 13 6 12 5 30 821 18 11 14 10 16 18
miss 19 9 10 18 17 23miss 9 12 4 miss 20 19
Suppose this had been our experiment
A mean and standard deviation cannot be calculated, since not all data values are known
Mean, median, spread?
[Figure: Reaction Distances. Cumulative frequency (0 to 90) against reaction distance in cm (0 to 35), plotted for Boys and Girls.]
You can still estimate the median and inter-quartile ranges
Median boy = 11 cm
Median girl = 17 cm
90 boys tested
70 girls tested
Boy IQR = 20 - 6 = 14 cm
Girl IQR = 25 - 11 = 14 cm
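The slide values are read off the cumulative frequency graph. A rough numerical equivalent is to interpolate linearly within each class, which is what joining the plotted points with straight lines amounts to. The sketch below assumes that approach, and the function name `value_at` is invented for this example. Its results need not match graph readings exactly, since those depend on how the curve was drawn and read.

```python
# Class boundaries (cm) for the measured reactions; the "Missed" class is open-ended.
bounds = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
boys_freq = [17, 24, 16, 10, 11, 2]    # plus 10 missed, so n = 90
girls_freq = [9, 7, 14, 18, 6, 9]      # plus 7 missed, so n = 70


def value_at(position, bounds, freqs):
    """Distance at a given cumulative-frequency position, by linear interpolation."""
    cumulative = 0
    for (lo, hi), f in zip(bounds, freqs):
        if f > 0 and position <= cumulative + f:
            return lo + (position - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("position falls in the open-ended 'Missed' class")


def median_and_iqr(freqs, n):
    """Median and inter-quartile range at positions n/2, n/4 and 3n/4."""
    q1 = value_at(n / 4, bounds, freqs)
    med = value_at(n / 2, bounds, freqs)
    q3 = value_at(3 * n / 4, bounds, freqs)
    return med, q3 - q1


boys_median, boys_iqr = median_and_iqr(boys_freq, 90)
girls_median, girls_iqr = median_and_iqr(girls_freq, 70)
print(round(boys_median, 2), round(boys_iqr, 2))
print(round(girls_median, 2), round(girls_iqr, 2))
```

The missed readings only enter through the totals (90 and 70): the quartile positions all fall inside measured classes, which is why these measures can still be estimated.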
Pros and Cons of different averages (mean and median) and
measures of spread (inter-quartile range and standard deviation)
• Median and inter-quartile range are unaffected by extreme values
  • they are therefore the most suitable measures when extreme values occur
• Median and inter-quartile range can be calculated with some data missing (in the end ranges)
• Mean and standard deviation include all values
• Mean and standard deviation are more ‘sensitive’ measures
  • they provide a better picture of the whole data
• You can therefore choose the values that bias the interpretation in your favour!

“There are three kinds of lies: lies, damned lies and statistics.”
Mark Twain
Activity
Page 29 of your Statistics 1 book. Read and make a memory map.
Time, t           Frequency   Time, t     Cumulative Frequency
0 < t <= 20       6           t <= 20     6
20 < t <= 40      13          t <= 40     19
40 < t <= 60      23          t <= 60     42
60 < t <= 80      18          t <= 80     60
80 < t <= 100     10          t <= 100    70
100 < t <= 120    6           t <= 120    76
120 < t <= 140    4           t <= 140    80
Puzzle Solving - cumulative frequency
[Figure: cumulative frequency (CF, 0 to 90) against time t (0 to 150).]
Median = 58 sec
IQR = 80 - 41 = 39 sec
Puzzle Solving - cumulative frequency
[Figure: cumulative frequency (CF, 0 to 90) against time t (0 to 160).]
Time, t           Frequency   Time, t     Cumulative Frequency
0 < t <= 20       6           t <= 20     6
20 < t <= 40      13          t <= 40     19
40 < t <= 60      23          t <= 60     42
60 < t <= 80      18          t <= 80     60
80 < t <= 100     10          t <= 100    70
100 < t <= 120    6           t <= 120    76
120 < t <= 150    4           t <= 150    80
Median = 58 sec
IQR = 80 - 41 = 39 sec
Median and IQR are unaffected by a change in the upper range
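This can be checked numerically. The sketch below assumes linear interpolation within each class (a stand-in for reading the graph), builds the cumulative frequencies with `itertools.accumulate`, and compares the two versions of the top class. The helper names are invented for this example.

```python
from itertools import accumulate

freqs = [6, 13, 23, 18, 10, 6, 4]                  # puzzle-solving frequencies
uppers_140 = [20, 40, 60, 80, 100, 120, 140]       # upper class boundaries (sec)
uppers_150 = [20, 40, 60, 80, 100, 120, 150]       # last class widened to 150


def quantile(position, uppers, freqs, lower=0):
    """Time at a cumulative-frequency position, interpolating within the class."""
    cum = [0] + list(accumulate(freqs))            # 0, 6, 19, 42, 60, 70, 76, 80
    edges = [lower] + list(uppers)
    for i, f in enumerate(freqs):
        if f > 0 and position <= cum[i + 1]:
            return edges[i] + (position - cum[i]) / f * (edges[i + 1] - edges[i])
    raise ValueError("position exceeds the total frequency")


def median_and_iqr(uppers):
    n = sum(freqs)
    q1, med, q3 = (quantile(n * p, uppers, freqs) for p in (0.25, 0.5, 0.75))
    return med, q3 - q1


result_140 = median_and_iqr(uppers_140)
result_150 = median_and_iqr(uppers_150)
print([round(v, 1) for v in result_140])   # [58.3, 39.1]
print(result_140 == result_150)            # True
```

The quartile positions (20, 40 and 60 out of 80) all fall below the last class, so widening that class changes nothing.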
Time, t Frequency Mid point Freq x Mid-pt
0 < t <= 20 6 10 60
20 < t <= 40 13 30 390
40 < t <= 60 23 50 1150
60 < t <= 80 18 70 1260
80 < t <= 100 10 90 900
100 < t <= 120 6 110 660
120 < t <= 140 4 130 520
TOTAL 80 4940
Estimate of mean = 4940 / 80 = 61.75 sec
$$ s_x = \sqrt{\frac{\sum x_i^2 f_i}{n} - \bar{x}^2} $$

Time, t           Frequency   Mid point   Mid-Point Squared   Mid-Point Squared x Frequency
0 < t <= 20       6           10          100                 600
20 < t <= 40      13          30          900                 11700
40 < t <= 60      23          50          2500                57500
60 < t <= 80      18          70          4900                88200
80 < t <= 100     10          90          8100                81000
100 < t <= 120    6           110         12100               72600
120 < t <= 140    4           130         16900               67600
TOTAL             80                                          379200

$\bar{x} = 61.75$

$$ s_x = \sqrt{\frac{379200}{80} - 61.75^2} = \sqrt{4740 - 3813.0625} = 30.4 \text{ sec} $$

With the upper range changed to 120 < t <= 150:

Time, t           Frequency   Mid point   Freq x Mid-pt   Mid-Point Squared   Mid-Point Squared x Frequency
0 < t <= 20       6           10          60              100                 600
20 < t <= 40      13          30          390             900                 11700
40 < t <= 60      23          50          1150            2500                57500
60 < t <= 80      18          70          1260            4900                88200
80 < t <= 100     10          90          900             8100                81000
100 < t <= 120    6           110         660             12100               72600
120 < t <= 150    4           135         540             18225               72900
TOTAL             80                      4960                                384500

$\bar{x} = 4960 / 80 = 62.00$ (previously 61.75)

$$ s_x = \sqrt{\frac{384500}{80} - 62.00^2} = \sqrt{4806.25 - 3844} = 31.0 \text{ sec (previously 30.4)} $$

Mean 62.00, SD 31.0
Mean and SD are changed slightly by a change in the upper range
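The effect of widening the last class can be checked with a short Python sketch. The function name `grouped_mean_sd` and the tuple layout are assumptions for this example. Each mid-point is squared first and then multiplied by its frequency, which gives a standard deviation of about 30 seconds, a plausible size for times spread between 0 and 150 seconds.

```python
from math import sqrt

# (lower bound, upper bound, frequency) for the puzzle-solving times, in seconds.
classes_140 = [(0, 20, 6), (20, 40, 13), (40, 60, 23), (60, 80, 18),
               (80, 100, 10), (100, 120, 6), (120, 140, 4)]
# Same data, but with the last class widened to 120 < t <= 150.
classes_150 = classes_140[:-1] + [(120, 150, 4)]


def grouped_mean_sd(classes):
    """Estimate the mean and standard deviation from class mid-points."""
    n = sum(f for _, _, f in classes)
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
    mean = sum(x * f for x, f in mids) / n
    sum_x2f = sum(x * x * f for x, f in mids)     # square the mid-point, then weight by f
    return mean, sqrt(sum_x2f / n - mean ** 2)


mean_140, sd_140 = grouped_mean_sd(classes_140)
mean_150, sd_150 = grouped_mean_sd(classes_150)
print(round(mean_140, 2), round(sd_140, 1))   # 61.75 30.4
print(round(mean_150, 2), round(sd_150, 1))   # 62.0 31.0
```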