Chapters 2 and 3: Frequency Distributions, Histograms, Percentiles and Percentile Ranks, and their Graphical Representations
Note: we’ll be skipping book sections:
2.4 (apparent and real limits), 2.8, 2.9 (percentile and percentile ranks for grouped data)
Chapter 2: Frequency Distributions, Histograms, Percentiles and Percentile Ranks
How can we represent or summarize a list of values?
frequency distribution: shows the number of observations for the possible categories or score values in a set of data. Can be done on any scale (nominal, ordinal, interval, or ratio). Often represented as a bar graph (Chapter 3).
Example of a frequency distribution for nominal scale data:
2008 Auto sales by country:
Japan: 11,563,629
China: 9,345,101
US: 8,705,239
Germany: 6,040,582
South Korea: 3,806,682
Brazil: 3,220,475
[Figure: Car sales drawn as a histogram; x-axis: country (Japan, China, US, Germany, South Korea, Brazil), y-axis: Car Sales in 2008 (millions), 0 to 12]
Distribution of all M&M's.
This histogram shows the proportion of members for each category.
We need to bin the raw scores into a set of class intervals. How do we decide these class intervals?
Be sure the intervals don’t overlap, have the same width, and cover the entire range of scores.
Use around 10 to 20 intervals.
Use a ‘sensible’ width (like 5, and not 2.718285)
Make the lower score a multiple of the width (e.g. if the width is 5, a lower score should be 50, not 48)
If a score lands on the border, put it in the lower class interval.
Ice Dancing, compulsory dance scores, Winter Olympics
111.15, 108.55, 106.6, 103.33, 100.06, 97.38, 96.67, 96.12, 92.75, 89.62, 85.36, 84.58, 83.89, 83.12, 80.47, 80.3, 79.31, 76.73, 74.25, 72.01, 68.87, 63.73, 59.64
Making histograms from interval and ratio data
Ice Dancing, compulsory dance scores, Winter Olympics
Let’s use a class interval width of 5 points, with a lowest score of 55.

Class Intervals    Frequency (f)
110-115            1
105-110            2
100-105            2
95-100             3
90-95              1
85-90              2
80-85              5
75-80              2
70-75              2
65-70              1
60-65              1
55-60              1
n=23
Count the number of scores in each bin to get the frequency
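The binning rules can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the names class_interval, lowest, and width are made up for this sketch, and the code assumes the border rule stated earlier (a score on a border goes in the lower class interval).

```python
import math
from collections import Counter

scores = [111.15, 108.55, 106.6, 103.33, 100.06, 97.38, 96.67, 96.12,
          92.75, 89.62, 85.36, 84.58, 83.89, 83.12, 80.47, 80.3,
          79.31, 76.73, 74.25, 72.01, 68.87, 63.73, 59.64]

width = 5
# Lowest boundary: the largest multiple of the width at or below the minimum.
lowest = math.floor(min(scores) / width) * width


def class_interval(score, lowest, width):
    """Return (lower, upper) for the class interval containing score.

    A score exactly on a border goes in the lower interval, so each
    interval includes its upper limit but not its lower limit.
    """
    k = math.ceil((score - lowest) / width)  # 1-based interval index
    k = max(k, 1)  # a score equal to `lowest` goes in the first interval
    return (lowest + (k - 1) * width, lowest + k * width)


# Count how many scores land in each class interval.
freq = Counter(class_interval(s, lowest, width) for s in scores)
for lower, upper in sorted(freq, reverse=True):
    print(f"{lower}-{upper}: {freq[(lower, upper)]}")
```

An interval with no scores would simply be missing from freq; that does not happen with these data.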
[Figure: Histogram of Ice Dancing Scores (frequency); x-axis: Ice Dancing Score (55 to 115), y-axis: Frequency (0 to 5)]
Relative frequency
Divide by the total number of scores to get relative frequency as a proportion. Then multiply by 100 to get relative frequency in percent.

Class Intervals    Frequency (f)    Relative frequency (prop)    Relative frequency (%)
110-115            1                .0435                        4.35
105-110            2                .0870                        8.70
100-105            2                .0870                        8.70
95-100             3                .1304                        13.04
90-95              1                .0435                        4.35
85-90              2                .0870                        8.70
80-85              5                .2174                        21.74
75-80              2                .0870                        8.70
70-75              2                .0870                        8.70
65-70              1                .0435                        4.35
60-65              1                .0435                        4.35
55-60              1                .0435                        4.35
n=23
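As a quick check of the arithmetic, here is a minimal Python sketch that turns the frequencies into proportions and percentages. The dictionary name freqs and the printed layout are choices made for this illustration only.

```python
# Frequencies per class interval, copied from the frequency table.
freqs = {
    "110-115": 1, "105-110": 2, "100-105": 2, "95-100": 3,
    "90-95": 1, "85-90": 2, "80-85": 5, "75-80": 2,
    "70-75": 2, "65-70": 1, "60-65": 1, "55-60": 1,
}

n = sum(freqs.values())  # total number of scores

# Relative frequency as a proportion, then as a percentage.
rel_prop = {interval: f / n for interval, f in freqs.items()}
rel_pct = {interval: 100 * p for interval, p in rel_prop.items()}

for interval, f in freqs.items():
    print(f"{interval:>8}  f={f}  prop={rel_prop[interval]:.4f}  pct={rel_pct[interval]:.2f}")
```

The proportions add up to 1 and the percentages to 100, apart from rounding in the printed values.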
[Figure: Relative frequency histogram of Ice Dancing Scores; x-axis: Ice Dancing Score (55 to 115), y-axis: Relative Frequency (%) (0 to 25)]
[Figure: four histograms of the ice dancing scores, with interval width 10, interval width 5, interval width 3, and interval width 1; x-axis: Ice Dancing Score, y-axis: Frequency]
Your choice of class intervals can change the way your histogram looks.
These three graphs have the same class intervals on the same scores!
[Figure: three histograms of the ice dancing scores; x-axis: Ice Dancing Score, y-axis: Frequency]
When possible, include zero on your y-axis.
Not like this:
[Figure: bar graph with bars "As of March 27" and "March 31 Goal"; y-axis: Enrollment (Millions), running from 6 to 7]
Like this:
[Figure: the same bar graph; y-axis: Enrollment (Millions), running from 0 to 8]
“Fox News Apologizes For Obamacare Graphic, Corrects Its 'Mistake'”
Percentile ranks and percentile points:
Percentile Point: A point on the measurement scale below which a specific percentage of scores fall.
Percentile Rank: The percentage of cases that fall below a given point on the measurement scale.
Percentile ranks are always between zero and 100.
Growth charts convert percentile points to percentile ranks
At 30 months, P95 = 36 lbs
Percentile ranks and percentile points:
Ice Dancing, compulsory dance scores, Winter Olympics

Class interval    f    rel f (%)    Cumulative f    Cumulative %
110-115           1    4.35         23              100
105-110           2    8.7          22              95.65
100-105           2    8.7          20              86.96
95-100            3    13.04        18              78.26
90-95             1    4.35         15              65.22
85-90             2    8.7          14              60.87
80-85             5    21.74        12              52.17
75-80             2    8.7          7               30.43
70-75             2    8.7          5               21.74
65-70             1    4.35         3               13.04
60-65             1    4.35         2               8.7
55-60             1    4.35         1               4.35

What is the percentile rank for a percentile point of 100? In other words, what percentage of scores fall below a score of 100?
78.26% of the scores fall below 100.
The number 78.26 is the percentile rank.
The number 100 is the corresponding percentile point.
We write P78.26 = 100
Ice Dancing, compulsory dance scores, Winter Olympics
From the same table, 21.74% of the scores are below 75, or
P21.74 = 75, or
100 - 21.74 = 78.26% of the scores are above 75.
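The cumulative columns are running totals taken from the lowest class interval upward. A minimal Python sketch, with variable names chosen only for this illustration:

```python
from itertools import accumulate

# Class intervals from lowest to highest, with their frequencies.
intervals = ["55-60", "60-65", "65-70", "70-75", "75-80", "80-85",
             "85-90", "90-95", "95-100", "100-105", "105-110", "110-115"]
f = [1, 1, 1, 2, 2, 5, 2, 1, 3, 2, 2, 1]

# Cumulative frequency: number of scores at or below each interval's upper limit.
cum_f = list(accumulate(f))
n = cum_f[-1]

# Cumulative percent: the percentile rank of each interval's upper limit.
cum_pct = [round(100 * c / n, 2) for c in cum_f]

for interval, c, pct in zip(intervals, cum_f, cum_pct):
    print(f"{interval:>8}  cum f={c:2d}  cum %={pct}")
```

Reading off the row for 95-100 gives 78.26, which is the percentile rank of a score of 100.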
The Cumulative Percentage Curve
[Figure: cumulative percentage curve for the ice dancing scores; x-axis: Ice Dancing Score (60 to 115), y-axis: Cumulative Percentage (0 to 100). Each point plots the cumulative % at the upper limit of a class interval.]

Class interval    Cumulative %
110-115           100
105-110           95.65
100-105           86.96
95-100            78.26
90-95             65.22
85-90             60.87
80-85             52.17
75-80             30.43
70-75             21.74
65-70             13.04
60-65             8.7
55-60             4.35

21.74% of the scores fall below a score of 75.
The number 21.74 is the percentile rank.
The number 75 is the corresponding percentile point.
We write P21.74 = 75

78.26% of the scores fall below a score of 100.
The number 78.26 is the percentile rank.
The number 100 is the corresponding percentile point.
We write P78.26 = 100

50% of the scores fall below a score of about 84.
The number 50 is the percentile rank.
The number 84 is an estimate of the percentile point.
We write P50 ≈ 84
Cumulative frequency distribution
Ice Dancing, compulsory dance scores, Winter Olympics
What is the percentile point for a percentile rank of 21.74%?
Answer: 75 points (21.74% of the scores fall below 75)
Cumulative frequency distribution
Ice Dancing, compulsory dance scores, Winter Olympics

Class Intervals    Frequency (f)    Cumulative frequency    Cumulative proportion    Cumulative percent
110-115            1                23                      1.00                     100
105-110            2                22                      .96                      96
100-105            2                20                      .87                      87
95-100             3                18                      .78                      78
90-95              1                15                      .65                      65
85-90              2                14                      .61                      61
80-85              5                12                      .52                      52
75-80              2                7                       .30                      30
70-75              2                5                       .22                      22
65-70              1                3                       .13                      13
60-65              1                2                       .09                      9
55-60              1                1                       .04                      4
What is the percentile point for a percentile rank of 50? (Or what is P50?)
We know it’s between 80 and 85, since 52% fall below 85 and 30% fall below 80.
Here’s how to calculate the percentile rank for each raw score (note this is different from the book!):
Ice Dancing, compulsory dance scores, Winter Olympics

Score     Rank order    Subtract 1/2    Divide by n (23)    Multiply by 100
111.15    23            22.5            0.98                98
108.55    22            21.5            0.93                93
106.6     21            20.5            0.89                89
103.33    20            19.5            0.85                85
100.06    19            18.5            0.80                80
97.38     18            17.5            0.76                76
96.67     17            16.5            0.72                72
96.12     16            15.5            0.67                67
92.75     15            14.5            0.63                63
89.62     14            13.5            0.59                59
85.36     13            12.5            0.54                54
84.58     12            11.5            0.50                50
83.89     11            10.5            0.46                46
83.12     10            9.5             0.41                41
80.47     9             8.5             0.37                37
80.3      8             7.5             0.33                33
79.31     7             6.5             0.28                28
76.73     6             5.5             0.24                24
74.25     5             4.5             0.20                20
72.01     4             3.5             0.15                15
68.87     3             2.5             0.11                11
63.73     2             1.5             0.07                7
59.64     1             0.5             0.02                2

The percentile point for a percentile rank of 50 is 84.58
(P50 = 84.58)
Reading further from the same table: the percentile point for a percentile rank of 80 is 100.06
(P80 = 100.06)
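The four columns of the table (rank order, subtract 1/2, divide by n, multiply by 100) can be reproduced with a short Python sketch. The variable names are illustrative, and the sketch assumes there are no tied scores, which is true for these data. Rounding to whole percentages matches the last column of the table.

```python
scores = [111.15, 108.55, 106.6, 103.33, 100.06, 97.38, 96.67, 96.12,
          92.75, 89.62, 85.36, 84.58, 83.89, 83.12, 80.47, 80.3,
          79.31, 76.73, 74.25, 72.01, 68.87, 63.73, 59.64]

n = len(scores)
percentile_rank = {}

# Rank order 1 is the lowest score, rank order n is the highest.
for rank, score in enumerate(sorted(scores), start=1):
    proportion = (rank - 0.5) / n          # subtract 1/2, divide by n
    percentile_rank[score] = round(100 * proportion)  # multiply by 100

for score in sorted(scores, reverse=True):
    print(f"{score:7.2f}  {percentile_rank[score]:3d}")
```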
How do we calculate the percentile point for all the other ranks?
Example: What is the percentile point for the percentile rank of 90%?
We know it’s between 106.6 and 108.55, whose percentile ranks are 89 and 93.
In fact, it’s 1/4 of the way between 106.6 and 108.55: (90-89)/(93-89) = 1/4
That means that P90 = 106.6 + 1/4(108.55-106.6) = 107.09
Example: What is the percentile point for the percentile rank of 75?
We know it’s 3/4 of the way between 96.67 and 97.38, whose percentile ranks are 72 and 76: (75-72)/(76-72) = 3/4
P75 = 96.67 + 3/4(97.38-96.67) = 97.2
Example: What is the percentile point for the percentile rank of 25?
We know it’s 1/4 of the way between 76.73 and 79.31, whose percentile ranks are 24 and 28: (25-24)/(28-24) = 1/4
P25 = 76.73 + 1/4(79.31-76.73) = 77.37
Example: What is the percentile point for the percentile rank of 81?
General formula for calculating percentile points:
1) Make a chart like the one above.
2) Find the two rows that fall above and below the percentile rank.
3) Let PH and PL be the high and low cumulative percentiles (85 and 80 in this example).
4) Let SH and SL be the high and low scores (103.33 and 100.06 in this example).
5) If p is the percentile rank (81 in our example), then the percentile point is:
\[ S_L + (S_H - S_L)\,\frac{p - P_L}{P_H - P_L} \]
\[ 100.06 + (103.33 - 100.06)\,\frac{81 - 80}{85 - 80} = 100.71 \]
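The general formula is plain linear interpolation between the two bracketing rows. Here is a minimal Python sketch; the function name percentile_point and its argument names are made up for this illustration, and the bracketing values are read off the chart by hand (using the rounded percentile ranks shown there).

```python
def percentile_point(p, s_low, s_high, p_low, p_high):
    """Interpolate the score for percentile rank p.

    s_low, s_high: the scores in the two bracketing rows (SL, SH)
    p_low, p_high: their percentile ranks (PL, PH)
    """
    return s_low + (s_high - s_low) * (p - p_low) / (p_high - p_low)


# The worked examples, with bracketing rows taken from the chart.
p81 = percentile_point(81, 100.06, 103.33, 80, 85)
p90 = percentile_point(90, 106.6, 108.55, 89, 93)
p75 = percentile_point(75, 96.67, 97.38, 72, 76)

print(f"P81 = {p81:.2f}")
print(f"P90 = {p90:.2f}")
print(f"P75 = {p75:.2f}")
```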
Going the other way: from percentile points to percentile ranks
Example: What is the percentile rank for the percentile point of 103.33?
This is easy, since 103.33 is one of the scores. The percentile rank is 85%.
85% of the scores fall below 103.33
Example: What is the percentile rank for the percentile point of 100?
This is not as easy, since 100 is not one of the scores. A score of 100 falls between 97.38 and 100.06, so we do know that its percentile rank is between 76 and 80. In fact, we know it must be really close to 80, since P80 is 100.06.
Here’s how to do it. After finding the two rows that bracket the percentile point, if S is the percentile point, then the percentile rank is:
\[ P_L + (P_H - P_L)\,\frac{S - S_L}{S_H - S_L} \]
\[ 76 + (80 - 76)\,\frac{100 - 97.38}{100.06 - 97.38} = 79.91 \]
79.91% of the scores fall below 100
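Going from a score to a percentile rank uses the same interpolation with the roles of score and percentile swapped. A minimal Python sketch, again with a made-up function name and with the bracketing rows read from the chart by hand:

```python
def percentile_rank(s, s_low, s_high, p_low, p_high):
    """Interpolate the percentile rank for score s.

    s_low, s_high: the scores in the two bracketing rows (SL, SH)
    p_low, p_high: their percentile ranks (PL, PH)
    """
    return p_low + (p_high - p_low) * (s - s_low) / (s_high - s_low)


# A score of 100 lies between 97.38 (rank 76) and 100.06 (rank 80).
rank_of_100 = percentile_rank(100, 97.38, 100.06, 76, 80)
print(f"Percentile rank of 100: {rank_of_100:.2f}")

# A score that is in the chart returns its own percentile rank.
rank_of_103_33 = percentile_rank(103.33, 100.06, 103.33, 80, 85)
print(f"Percentile rank of 103.33: {rank_of_103_33:.0f}")
```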
Another Example: integer valued data
Scores on Professor Flans’ Midterm (n = 20)
Raw Test Scores: 94, 93, 92, 91, 87, 86, 85, 84, 84, 83, 82, 81, 81, 80, 80, 77, 73, 73, 68, 59
We’ll choose a class interval width of 3. An odd number for width is good for integer data because the middle value will be a whole number.
Remember, scores that land on the border are assigned to the lower class interval.
So 85 lands in the interval 82-85.

Class interval    f
58-61             1
61-64             0
64-67             0
67-70             1
70-73             2
73-76             0
76-79             1
79-82             5
82-85             4
85-88             2
88-91             1
91-94             3
94-97             0
97-100            0
[Figure: frequency histogram of the test scores with bins labeled by the centers of the class intervals (60, 63, ..., 99); x-axis: Test Score, y-axis: Frequency (0 to 5)]
You can also show the whole interval on the x-axis labels.
[Figure: the same histogram with x-axis labels 58-61, 61-64, ..., 97-100; x-axis: Test Score, y-axis: Frequency (0 to 5)]
The Cumulative Percentage Curve

Class Interval    Frequency    Relative frequency (%)    Cumulative frequency    Cumulative frequency (%)
97-100            0            0                         20                      100
94-97             0            0                         20                      100
91-94             3            15                        20                      100
88-91             1            5                         17                      85
85-88             2            10                        16                      80
82-85             4            20                        14                      70
79-82             5            25                        10                      50
76-79             1            5                         5                       25
73-76             0            0                         4                       20
70-73             2            10                        4                       20
67-70             1            5                         2                       10
64-67             0            0                         1                       5
61-64             0            0                         1                       5
58-61             1            5                         1                       5
The Cumulative Percentage Curve for Professor Flans’ Midterm
[Figure: cumulative percentage curve for the midterm scores; x-axis: Test Score (61 to 100), y-axis: Cumulative Frequency (%) (0 to 100)]
Estimating percentile points and percentile ranks from the cumulative percentage curve
Estimate the percentile point for a percentile rank of 50%:
About 50% of the scores fall below 82. (So P50 is about 82.)
Estimate the percentile point for a percentile rank of 90%:
90% of the scores fall below a score of about 92. (P90 is about 92.)
Calculating percentile points from raw data.

Test score    Rank order    Subtract 1/2    Divide by 20    Multiply by 100
94            20            19.5            0.975           97.5
93            19            18.5            0.925           92.5
92            18            17.5            0.875           87.5
91            17            16.5            0.825           82.5
87            16            15.5            0.775           77.5
86            15            14.5            0.725           72.5
85            14            13.5            0.675           67.5
84            13            12.5            0.625           62.5
84            12            11.5            0.575           57.5
83            11            10.5            0.525           52.5
82            10            9.5             0.475           47.5
81            9             8.5             0.425           42.5
81            8             7.5             0.375           37.5
80            7             6.5             0.325           32.5
80            6             5.5             0.275           27.5
77            5             4.5             0.225           22.5
73            4             3.5             0.175           17.5
73            3             2.5             0.125           12.5
68            2             1.5             0.075           7.5
59            1             0.5             0.025           2.5

What is the percentile point for a percentile rank of 50%?
It’s between 82 and 83.
\[ S_L + (S_H - S_L)\,\frac{p - P_L}{P_H - P_L} \]
\[ 82 + (83 - 82)\,\frac{50 - 47.5}{52.5 - 47.5} = 82.5 \]
P50 = 82.5
What is the percentile point for a percentile rank of 90%?
It’s between 92 and 93.
\[ S_L + (S_H - S_L)\,\frac{p - P_L}{P_H - P_L} \]
\[ 92 + (93 - 92)\,\frac{90 - 87.5}{92.5 - 87.5} = 92.5 \]
It’s exactly halfway between 92 and 93.
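The whole procedure (rank the scores, convert ranks to percentile ranks, find the bracketing rows, interpolate) can be put together in one Python sketch. The function name percentile_point_from_scores is made up for this illustration. Note one assumption: the code uses unrounded percentile ranks, which is exact for n = 20 but can differ slightly from hand calculations that use percentile ranks rounded to whole numbers, as in the ice dancing chart.

```python
def percentile_point_from_scores(scores, p):
    """Percentile point for percentile rank p, computed from raw scores.

    Each score's percentile rank is 100 * (rank - 1/2) / n; the result is
    linearly interpolated between the two rows that bracket p.
    """
    ordered = sorted(scores)
    n = len(ordered)
    ranks = [100 * (i - 0.5) / n for i in range(1, n + 1)]
    if not ranks[0] <= p <= ranks[-1]:
        raise ValueError("p is outside the range covered by the table")
    for i in range(n - 1):
        if ranks[i] <= p <= ranks[i + 1]:
            s_low, s_high = ordered[i], ordered[i + 1]
            fraction = (p - ranks[i]) / (ranks[i + 1] - ranks[i])
            return s_low + (s_high - s_low) * fraction
    return ordered[-1]  # only reached when there is a single score


midterm = [94, 93, 92, 91, 87, 86, 85, 84, 84, 83,
           82, 81, 81, 80, 80, 77, 73, 73, 68, 59]

p50 = percentile_point_from_scores(midterm, 50)
p90 = percentile_point_from_scores(midterm, 90)
print(f"P50 = {p50}")
print(f"P90 = {p90}")
```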
Going the other way: from percentile points to percentile ranks
Example: What is the percentile rank for the percentile point of 90?
A score of 90 falls between 87 and 91, so its percentile rank is between 77.5 and 82.5.
\[ P_L + (P_H - P_L)\,\frac{S - S_L}{S_H - S_L} \]
\[ 77.5 + (82.5 - 77.5)\,\frac{90 - 87}{91 - 87} = 81.25 \]
81.25% of the scores fall below 90 points
[Figure: two panels, Frequency histogram and Frequency polygon, of the test scores; x-axis: Test Score (60 to 99), y-axis: Frequency (0 to 5)]
More stuff about frequency distributions:
Properties of frequency distributions
‘normal’ or bell-shaped
positively skewed
negatively skewed
Example of a negatively skewed distribution
[Figure: histogram of GRE quant scores; x-axis: GRE quant scores (300 to 800), y-axis: Frequency (0 to 40)]
Example of positively skewed distribution: Household annual income
Household income distribution as of 2006:
• P0-89 (bottom 90%): income below $104,696 (average income, $30,374*)
• P90-100 (top 10%): income above $104,696 (average income, $269,658*)
• P90-95 (next 5%): income between $104,696 and $148,423 (average income, $122,429*)
• P95-99 (next 4%): income between $148,423 and $382,593 (average income, $210,597*)
• P99-100 (top 1%): income above $382,593 (average income, $1,243,516*)
• P99.5-100 (top 0.5%): income above $597,584 (average income, $2,022,315*)
• P99.9-100 (top 0.1%): income above $1,898,200 (average income, $6,289,800*)
• P99.99-100 (top .01%): income above $10,659,283 (average income, $29,638,027*)
http://www.wealthandwant.com/issues/income/income_distribution.html
So the ‘top 1%’ can be described as:
P99 = $382,593
Two (of many) ways that frequency distributions differ
[Figure: two panels, Shift in central tendency and Shift in variability; x-axis: Scores (0 to 100)]