CHAPTER 2 : DESCRIPTIVE STATISTICS
1. Introduction
2. Organizing and Graphing Qualitative Data
3. Organizing and Graphing Quantitative Data
4. Central Tendency Measurement
5. Dispersion Measurement
6. Mean, Variance and Standard Deviation for Grouped Data
7. Measure of Skewness
OBJECTIVES
After completing this chapter, students should be able to:
Create and interpret graphical displays involving qualitative and quantitative data.
Describe the difference between grouped and ungrouped frequency distributions, frequency and relative frequency, relative frequency and cumulative relative frequency.
Identify and describe the parts of a frequency distribution: class boundaries, class width, and class midpoint.
Identify the shapes of distributions.
Compute, describe, compare and interpret the three measures of central tendency: mean, median, and mode for ungrouped and grouped data.
Compute, describe, compare and interpret the two measures of dispersion: range, and standard deviation (variance) for ungrouped and grouped data.
Compute, describe, and interpret the two measures of position: quartiles and interquartile range for ungrouped and grouped data.
Compute, describe and interpret the measure of skewness: Pearson's coefficient of skewness.
INTRODUCTION
Raw data - Data recorded in the sequence in which they are collected and before they are processed or ranked.
Array data - Raw data that is arranged in ascending or descending order.
Example 1
Here is a list of questions asked in a large statistics class and the “raw data” given by one of the students:
1. What is your sex (m = male, f = female)? Answer (raw data): m
2. How many hours did you sleep last night? Answer: 5 hours
3. Randomly pick a letter – S or Q. Answer: S
4. What is your height in inches? Answer: 67 inches
5. What’s the fastest you’ve ever driven a car (mph)? Answer: 110 mph
Example 2
Quantitative raw data
These data are also called ungrouped data.
Qualitative raw data
ORGANIZING AND GRAPHING QUALITATIVE DATA
1. Frequency Distributions / Table
2. Relative Frequency and Percentage Distribution
3. Graphical Presentation of Qualitative Data
Frequency Distributions / Table
A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.
It exhibits how the frequencies are distributed over the various categories.
Also called a frequency distribution table or simply a frequency table.
The number of elements that belong to a certain category is called the frequency of that category.
Relative Frequency and Percentage Distribution
A relative frequency distribution is a listing of all categories along with their relative frequencies (given as proportions or percentages).
It is commonplace to give the frequency and relative frequency distribution together.
Calculating relative frequency and percentage of a category
Relative frequency of a category = (Frequency of that category) / (Sum of all frequencies)
Percentage = (Relative frequency) * 100
Example 3
A sample of UUM staff-owned vehicles produced by Proton was identified and the make of each noted. The resulting sample follows (W = Wira, Is = Iswara, Wj = Waja, St = Satria, P = Perdana, Sv = Savvy):
W  W  P  Is Is P  Is W  St Wj
Is W  W  Wj Is W  W  Is W  Wj
Wj Is Wj Sv W  W  W  Wj St W
Wj Sv W  Is P  Sv Wj Wj W  W
St W  W  W  W  St St P  Wj Sv
Construct a frequency distribution table for these data with their relative frequency and percentage.
Solution:
Category   Frequency   Relative Frequency   Percentage (%)
Wira       19          19/50 = 0.38         0.38*100 = 38
Iswara     8           0.16                 16
Perdana    4           0.08                 8
Waja       10          0.20                 20
Satria     5           0.10                 10
Savvy      4           0.08                 8
Total      50          1.00                 100
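The table for Example 3 can be checked with a short Python sketch. The variable names are illustrative only, and the sample is typed in as five rows of ten vehicles.

```python
from collections import Counter

# Sample from Example 3, typed in as five rows of ten vehicles.
sample = """
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
""".split()

names = {"W": "Wira", "Is": "Iswara", "P": "Perdana",
         "Wj": "Waja", "St": "Satria", "Sv": "Savvy"}

freq = Counter(sample)        # frequency of each category
total = sum(freq.values())    # sum of all frequencies

table = {}
for code, name in names.items():
    rel = freq[code] / total  # relative frequency
    table[name] = (freq[code], rel, rel * 100)
    print(f"{name:<8} {freq[code]:>3} {rel:>6.2f} {rel * 100:>5.0f}")
```

Each row of `table` holds the frequency, the relative frequency and the percentage, in that order.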
Graphical Presentation of Qualitative Data
1. Bar Graphs
A graph made of bars whose heights represent the frequencies of respective categories.
Such a graph is most helpful when you have many categories to represent.
Notice that a gap is inserted between each of the bars.
The types of bar chart are:
=> simple/vertical bar chart
=> horizontal bar chart
=> component bar chart
=> multiple bar chart
Simple/ Vertical Bar Chart
To construct a vertical bar chart, mark the various categories on the horizontal axis and mark the frequencies on the vertical axis.
Horizontal Bar Chart
To construct a horizontal bar chart, mark the various categories on the vertical axis and mark the frequencies on the horizontal axis.
Example 4: Refer to Example 3.
Figure: horizontal bar chart, "UUM Staff-owned Vehicles Produced By Proton"; vertical axis: Types of Vehicle (Wira, Iswara, Perdana, Waja, Satria, Savvy); horizontal axis: Frequency (0 to 20)
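A rough text version of the horizontal bar chart in Example 4 can be produced as follows. This is only a sketch: the `#` symbol and the layout are arbitrary choices, and the frequencies are those found in Example 3.

```python
# Frequencies from the solution to Example 3.
freq = {"Wira": 19, "Iswara": 8, "Perdana": 4,
        "Waja": 10, "Satria": 5, "Savvy": 4}

# Categories run down the vertical axis; the length of each bar
# along the horizontal axis equals the frequency (one '#' per vehicle).
lines = [f"{name:<8}| {'#' * f} {f}" for name, f in freq.items()]
print("\n".join(lines))
```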
Another example of horizontal bar chart
Figure 2.4: Number of students at Diversity College who are immigrants, by last country of permanent residence
Component Bar Chart
To construct a component bar chart, each bar represents all the categories together and is divided into components, one for each category.
The height of each component should tally with the frequency it represents.
Example 5
Suppose we want to illustrate the information below, representing the number of people participating in the activities offered by an outdoor pursuits centre during June of three consecutive years.
Activity   2004   2005   2006
Climbing   21     34     36
Caving     10     12     21
Walking    75     85     100
Sailing    36     36     40
Total      142    167    197
Figure 2.5: component bar chart, "Activities Breakdown (Jun)"; x-axis: Year (2004, 2005, 2006); y-axis: Number of participants (0 to 200); components: Sailing, Walking, Caving, Climbing
Multiple Bar Chart
To construct a multiple bar chart, the bars representing the categories are gathered in groups.
The height of each bar represents the frequency of its category. This chart is useful for making comparisons (of two or more values).
Example 6: Refer to Example 5.
Figure: multiple bar chart, "Activities Breakdown (Jun)"; x-axis: Year (2004, 2005, 2006); y-axis: Number of participants (0 to 120); bars: Climbing, Caving, Walking, Sailing
Another example of horizontal bar chart:
Preferred snack choices of students at UUM
The bar graphs for relative frequency and percentage distributions can be drawn simply by marking the relative frequencies or percentages, instead of the class frequencies.
2. Pie Chart
A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.
An alternative to the bar chart and useful for summarizing a single categorical variable if there are not too many categories.
The chart makes it easy to compare relative sizes of each class/category.
The whole pie represents the total sample or population. The pie is divided into different portions that represent the different categories.
To construct a pie chart, we multiply 360° by the relative frequency for each category to obtain the degree measure or size of the angle for the corresponding category.
Example 7
Figure 2.8
Example 8
Table 2.7
Movie Genres      Frequency   Relative Frequency   Angle Size
Comedy            54          0.27                 360*0.27 = 97.2
Action            36          0.18                 360*0.18 = 64.8
Romance           28          0.14                 360*0.14 = 50.4
Drama             28          0.14                 360*0.14 = 50.4
Horror            22          0.11                 360*0.11 = 39.6
Foreign           16          0.08                 360*0.08 = 28.8
Science Fiction   16          0.08                 360*0.08 = 28.8
Total             200         1.00                 360
Figure 2.9
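The angle sizes in Table 2.7 follow directly from the rule of multiplying 360 by the relative frequency. A minimal Python sketch, with illustrative variable names:

```python
# Frequencies from Table 2.7.
freq = {"Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
        "Horror": 22, "Foreign": 16, "Science Fiction": 16}

total = sum(freq.values())

angles = {}
for genre, f in freq.items():
    rel = f / total                      # relative frequency
    angles[genre] = round(360 * rel, 1)  # angle size in degrees
    print(f"{genre:<16} {f:>3} {rel:.2f} {angles[genre]:>6.1f}")

angle_sum = round(sum(angles.values()), 1)
print("Sum of angles:", angle_sum)
```

The angles add up to 360 because the relative frequencies add up to 1.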
3. Line Graph/Time Series Graph
Line graphs are very popular because they reveal trends in the data clearly and are easy to create.
When analyzing the graph, look for a trend or pattern that occurs over the time period.
For example, is the line ascending (indicating an increase over time) or descending (indicating a decrease over time)?
Another thing to look for is the slope, or steepness, of the line. A line that is steep over a specific time period indicates a rapid increase or decrease over that period.
Two data sets can be compared on the same graph (called a compound time series graph) if two lines are used.
Data collected on the same element for the same variable at different points in time or for different periods of time are called time series data.
A line graph is a visual comparison of how two variables—shown on the x- and y-axes—are related or vary with each other. It shows related information by drawing a continuous line between all the points on a grid.
Line graphs compare two variables: one is plotted along the x-axis (horizontal) and the other along the y-axis (vertical). The y-axis in a line graph usually indicates quantity or percentage, while the horizontal x-axis often measures units of time. As a result, the line graph is often viewed as a time series graph.
Example 9
A transit manager wishes to use the following data for a presentation showing how Port Authority Transit ridership has changed over the years. Draw a time series graph for the data and summarize the findings.
Year   Ridership (in millions)
1990   88.0
1991   85.0
1992   75.7
1993   76.6
1994   75.4
Figure: time series graph; x-axis: Year (1990 to 1994); y-axis: Ridership (in millions), 75 to 89
The graph shows a decline in ridership through 1992 and then leveling off for the years 1993 and 1994.
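The trend read from the graph can also be checked numerically by looking at the change from one year to the next. The sketch below uses the data of Example 9; the variable names are illustrative.

```python
# Ridership (in millions) from Example 9.
ridership = {1990: 88.0, 1991: 85.0, 1992: 75.7, 1993: 76.6, 1994: 75.4}

years = sorted(ridership)

# Change from each year to the next; a negative value means a decline.
changes = {
    later: round(ridership[later] - ridership[earlier], 1)
    for earlier, later in zip(years, years[1:])
}

for year, change in changes.items():
    print(f"{year - 1} to {year}: {change:+.1f} million")
```

The large negative changes up to 1992 correspond to the steep part of the line, and the small changes afterwards to the part that levels off.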
Exercise 1
1. The following data show the method of payment by 16 customers in a supermarket checkout line. Here, C = cash, CK = check, CC = credit card, D = debit and O = other.
C  CK CK C  CC D  O  C
CK CC D  CC C  CK CK CC
a. Construct a frequency distribution table.
b. Calculate the relative frequencies and percentages for all categories.
c. Draw a pie chart for the percentage distribution.
2. The frequency distribution table shows the sales of certain products of ZeeZee Company. For each product, the frequency of sales in a certain period is given. Find the relative frequency and the percentage for each product. Then, construct a pie chart using the information obtained.
Type of Product   Frequency   Relative Frequency   Percentage   Angle Size
A                 13
B                 12
C                 5
D                 9
E                 11
3. Draw a time series graph to represent the data for the number of worldwide airline fatalities for the given years.
Year                1990   1991   1992   1993   1994   1995   1996
No. of fatalities   440    510    990    801    732    557    1132
4. A questionnaire about how people get news resulted in the following information from 25 respondents (N = newspaper, T = television, R = radio, M = magazine).
N N R T T
R N T M R
M M N R N
T R M N M
T R R N N
a. Construct a frequency distribution for the data.
b. Construct a bar graph for the data.
5. The given information shows the export and import trade, in million RM, for four months of a certain year. Using the information provided, present the data in a component bar graph.
Month       Export   Import
September   28       20
October     30       28
November    32       17
December    24       14
6. The following information represents the maximum rainfall in millimetres (mm) in each state in Malaysia. You are asked to help a meteorologist in your area to analyse it. Based on your knowledge, present this information using the most appropriate chart and comment on it.
State Quantity (mm)
Perlis, Kedah, Pulau Pinang, Perak, Selangor, Wilayah Persekutuan Kuala Lumpur, Negeri Sembilan, Melaka, Johor, Pahang, Terengganu, Kelantan, Sarawak, Sabah
435512163721664
100339022387610501255986878456
2.3 ORGANIZING AND GRAPHING QUANTITATIVE DATA
2.3.1 Stem-and-Leaf Display
In a stem-and-leaf display of quantitative data, each value is divided into two portions – a stem and a leaf. Then the leaves for each stem are shown separately in a display.
It shows the pattern of the data and makes it easy to detect which values are repeated frequently.
Example 10
25 12  9 10  5 12 23  7
36 13 11 12 31 28 37  6
14 41 38 44 13 22 18 19
Solution:
0 | 9 5 7 6
1 | 2 0 2 3 1 2 4 3 8 9
2 | 5 3 8 2
3 | 6 1 7 8
4 | 1 4
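The display in Example 10 can be reproduced with a few lines of Python. This is a sketch that assumes two-digit whole numbers, so that the tens digit is the stem and the units digit is the leaf; the names are illustrative.

```python
from collections import defaultdict

# Data from Example 10, in the order given.
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 13, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]

display = defaultdict(list)
for value in data:
    stem, leaf = divmod(value, 10)  # tens digit is the stem, units digit the leaf
    display[stem].append(leaf)

for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

The leaves appear in the order in which the values occur in the data. Sorting each list of leaves would give an ordered stem-and-leaf display.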
2.3.1 Frequency Distributions
A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.
Data presented in the form of a frequency distribution are called grouped data.
The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. Also called real class limit.
To find the midpoint of the upper limit of the first class and the lower limit of the second class, we divide the sum of these two limits by 2.
e.g.:
Class Width (class size)
Class width = Upper boundary – Lower boundary
e.g. : Width of the first class = 600.5 – 400.5 = 200
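Class Midpoint or Mark
Class midpoint = (Lower limit + Upper limit) / 2
e.g.: for a class with boundaries 400.5 and 600.5, the midpoint (which is also the average of the two boundaries) is (400.5 + 600.5) / 2 = 500.5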
Constructing Frequency Distribution Tables
1. To decide the number of classes, we use Sturges' formula, which is
c = 1 + 3.3 log n
where c is the number of classes and n is the number of observations in the data set.
2. Class width ≈ (Largest value – Smallest value) / Number of classes
This class width is rounded to a convenient number.
3. Lower Limit of the First Class or the Starting Point
Use the smallest value in the data set.
Example 11
The following data give the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2004 season.
i) Number of classes, c = 1 + 3.3 log 30 = 1 + 3.3(1.48) = 5.88 ≈ 6 classes
ii) Class width = 18
iii) Starting point = 135
Table 2.10 Frequency Distribution for Data of Table 2.9
Total Home Runs   Tally          f
135 – 152         ||||| |||||    10
153 – 170         ||             2
171 – 188         |||||          5
189 – 206         ||||| |        6
207 – 224         |||            3
225 – 242         ||||           4
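The steps used to set up the classes of Table 2.10 can be sketched in Python. The class width of 18 is read from the class limits of the table; rounding the result of Sturges' formula up to the next whole number and taking boundaries 0.5 below and above the limits are assumptions that suit whole-number data such as these.

```python
import math

n = 30       # number of observations (30 teams)
start = 135  # lower limit of the first class (the starting point)
width = 18   # class width, as in the classes of Table 2.10

# Sturges' formula, rounded up to a whole number of classes.
c = math.ceil(1 + 3.3 * math.log10(n))

classes = []
for k in range(c):
    lower = start + k * width
    upper = lower + width - 1                # limits are whole numbers
    boundaries = (lower - 0.5, upper + 0.5)  # class boundaries
    midpoint = (lower + upper) / 2           # class midpoint or mark
    classes.append((lower, upper, boundaries, midpoint))
    print(f"{lower} - {upper}  boundaries {boundaries}  midpoint {midpoint}")
```

For every class the difference between the upper and lower boundary is 18, which agrees with the definition of class width.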
2.3.2 Relative Frequency and Percentage Distributions
Example 12 (Refer to Example 11)
Table 2.11: Relative Frequency and Percentage Distributions
Total Home Runs   Class Boundaries            Relative Frequency   %
135 – 152         134.5 to less than 152.5    0.3333               33.33
153 – 170         152.5 to less than 170.5    0.0667               6.67
171 – 188         170.5 to less than 188.5    0.1667               16.67
189 – 206         188.5 to less than 206.5    0.2                  20
207 – 224         206.5 to less than 224.5    0.1                  10
225 – 242         224.5 to less than 242.5    0.1333               13.33
Sum                                           1.0                  100%
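The relative frequencies and percentages of Table 2.11 can be recomputed from the class frequencies of Table 2.10 (10, 2, 5, 6, 3, 4). A minimal sketch with illustrative names:

```python
# Classes and frequencies from Table 2.10.
classes = ["135 - 152", "153 - 170", "171 - 188",
           "189 - 206", "207 - 224", "225 - 242"]
freq = [10, 2, 5, 6, 3, 4]

n = sum(freq)  # sum of all frequencies

rel_freq = [round(f / n, 4) for f in freq]       # relative frequencies
percent = [round(100 * f / n, 2) for f in freq]  # percentages

for label, r, p in zip(classes, rel_freq, percent):
    print(f"{label}  {r:.4f}  {p:.2f}")
```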
2.3.3 Graphing Grouped Data
1. Histograms
A histogram is a graph in which the class boundaries are marked on the horizontal axis and either the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative frequencies or percentages are represented by the heights of the bars.
In a histogram, the bars are drawn adjacent to each other, and there is a space between the y-axis and the first bar.
Example 13 (Refer to Example 11)
Figure: Frequency histogram for Table 2.10; x-axis: Total Home Runs; y-axis: Frequency (0 to 12)
2. Polygon
A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.
Example 13
Figure: Frequency polygon for Table 2.10; x-axis: Total home runs (class boundaries 134.5, 152.5, 170.5, 188.5, 206.5, 224.5, 242.5); y-axis: Frequency (0 to 12)
For a very large data set, as the number of classes is increased (and the width of classes is decreased), the frequency polygon eventually becomes a smooth curve called a frequency distribution curve or simply a frequency curve.
Frequency distribution curve
2.3.5 Shape of Histogram
The shape of a histogram is described in the same way as the shape of a polygon.
The most common shapes are:
(i) Symmetric
(ii) Right skewed
(iii) Left skewed
Symmetric histograms
Right skewed and Left skewed
Describing data using graphs helps us gain insight into the main characteristics of the data.
When interpreting a graph, we should be very cautious. We should observe carefully whether the frequency axis has been truncated or whether any axis has been unnecessarily shortened or stretched.
2.3.6 Cumulative Frequency Distributions
A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.
Example 14: Using the frequency distribution of Table 2.11,
Total Home Runs   Class Boundaries            Cumulative Frequency
135 – 152         134.5 to less than 152.5    10
153 – 170         152.5 to less than 170.5    10+2 = 12
171 – 188         170.5 to less than 188.5    10+2+5 = 17
189 – 206         188.5 to less than 206.5    10+2+5+6 = 23
207 – 224         206.5 to less than 224.5    10+2+5+6+3 = 26
225 – 242         224.5 to less than 242.5    10+2+5+6+3+4 = 30
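The running totals of Example 14 are simply the partial sums of the class frequencies. As a sketch, Python's `itertools.accumulate` produces them directly; the variable names are illustrative.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies for the home run data.
upper_boundaries = [152.5, 170.5, 188.5, 206.5, 224.5, 242.5]
freq = [10, 2, 5, 6, 3, 4]

# Cumulative frequency: number of values below each upper boundary.
cum_freq = list(accumulate(freq))

for boundary, F in zip(upper_boundaries, cum_freq):
    print(f"less than {boundary}: {F}")
```

The last cumulative frequency always equals the total number of observations, here 30.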
Ogive
An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of respective classes.
Two types of ogive:
(i) ogive less than
(ii) ogive greater than
First, build a table of cumulative frequency.
Example 15 (Ogive Less Than)
Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30
Earnings (RM)     Cumulative frequency (F)
Less than 29.5    0
Less than 39.5    5
Less than 49.5    11
Less than 59.5    17
Less than 69.5    20
Less than 79.5    23
Less than 89.5    30
Figure: "less than" ogive; x-axis: Earnings (29.5 to 89.5); y-axis: Cumulative Frequency (0 to 35)
Example 16 (Ogive Greater Than)
Earnings (RM)     Cumulative Frequency (F)
More than 29.5    30
More than 39.5    25
More than 49.5    19
More than 59.5    13
More than 69.5    10
More than 79.5    7
More than 89.5    0
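Both cumulative tables for the earnings data can be built from the class frequencies 5, 6, 6, 3, 3, 7. In the sketch below (illustrative names), the "more than" column is obtained by subtracting the "less than" column from the total.

```python
from itertools import accumulate

# Class boundaries for the earnings classes 30 - 39, ..., 80 - 89 (RM)
# and the number of students in each class.
boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freq = [5, 6, 6, 3, 3, 7]

total = sum(freq)

# "Less than" cumulative frequencies start at 0 at the first boundary.
less_than = [0] + list(accumulate(freq))

# "More than" cumulative frequencies are the complements of the above.
more_than = [total - F for F in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b}: less than = {lt}, more than = {mt}")
```

Plotting `less_than` against `boundaries` gives the "less than" ogive, and plotting `more_than` against `boundaries` gives the "greater than" ogive.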
2.3.7 Box-Plot
A box-plot describes the data graphically using five measurements: the smallest value, first quartile (K1), second quartile (median or K2), third quartile (K3) and largest value.
Figures: box-plots for symmetric data, left skewed data and right skewed data, each marking the smallest value, K1, median, K3 and largest value
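The five measurements needed for a box-plot can be computed as in the sketch below. The data of Example 10 are reused purely for illustration. The quartiles come from `statistics.quantiles` with `method="inclusive"`, which interpolates between data points; this convention is an assumption, and other quartile rules may give slightly different values of K1 and K3.

```python
import statistics

# Data from Example 10, reused here purely for illustration.
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 13, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]

# Quartiles by linear interpolation between data points; other
# conventions exist and may give slightly different K1 and K3.
k1, k2, k3 = statistics.quantiles(data, n=4, method="inclusive")

five_numbers = {
    "smallest value": min(data),
    "K1": k1,
    "median (K2)": k2,
    "K3": k3,
    "largest value": max(data),
}

for label, value in five_numbers.items():
    print(f"{label:<15} {value}")
```

Here the median (16) lies closer to K1 than to K3, and the largest value lies far above K3, which is the pattern described for right skewed data.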