
CHAPTER 2 : DESCRIPTIVE STATISTICS

1. Introduction

2. Organizing and Graphing Qualitative Data

3. Organizing and Graphing Quantitative Data

4. Central Tendency Measurement

5. Dispersion Measurement

6. Mean, Variance and Standard Deviation for Grouped Data

7. Measure of Skewness

OBJECTIVES

After completing this chapter, students should be able to:

Create and interpret graphical displays involving qualitative and quantitative data.

Describe the difference between grouped and ungrouped frequency distribution, frequency and relative frequency, relative frequency and cumulative relative frequency.

Identify and describe the parts of a frequency distribution: class boundaries, class width, and class midpoint.

Identify the shapes of distributions.

Compute, describe, compare and interpret the three measures of central tendency: mean, median, and mode for ungrouped and grouped data.

Compute, describe, compare and interpret the two measures of dispersion: range, and standard deviation (variance) for ungrouped and grouped data.

Compute, describe, and interpret the two measures of position: quartiles and interquartile range for ungrouped and grouped data.

Compute, describe and interpret the measures of skewness: Pearson Coefficient of Skewness.


INTRODUCTION

Raw data - Data recorded in the sequence in which they are collected, before they are processed or ranked.

Array data - Raw data that is arranged in ascending or descending order.

Example 1

Here is a list of questions asked in a large statistics class and the “raw data” given by one of the students:

1. What is your sex (m = male, f = female)? Answer (raw data): m

2. How many hours did you sleep last night? Answer: 5 hours

3. Randomly pick a letter – S or Q. Answer: S

4. What is your height in inches? Answer: 67 inches

5. What’s the fastest you’ve ever driven a car (mph)? Answer: 110 mph

Example 2

Quantitative raw data

These data are also called ungrouped data.

Qualitative raw data


ORGANIZING AND GRAPHING QUALITATIVE DATA

1. Frequency Distributions / Table
2. Relative Frequency and Percentage Distribution
3. Graphical Presentation of Qualitative Data

Frequency Distributions / Table

A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.

It exhibits how the frequencies are distributed over the various categories.

Also called a frequency distribution table or simply a frequency table.

The number of elements that belong to a certain category is called the frequency of that category.

Relative Frequency and Percentage Distribution

A relative frequency distribution is a listing of all categories along with their relative frequencies (given as proportions or percentages).

It is commonplace to give the frequency and relative frequency distribution together.

Calculating relative frequency and percentage of a category

Relative frequency of a category = (Frequency of that category) / (Sum of all frequencies)

Percentage = (Relative frequency) * 100


Example 3

A sample of UUM staff-owned vehicles produced by Proton was identified and the make of each noted. The resulting sample follows (W = Wira, Is = Iswara, Wj = Waja, St = Satria, P = Perdana, Sv = Savvy):

W   W   P   Is  Is  P   Is  W   St  Wj
Is  W   W   Wj  Is  W   W   Is  W   Wj
Wj  Is  Wj  Sv  W   W   W   Wj  St  W
Wj  Sv  W   Is  P   Sv  Wj  Wj  W   W
St  W   W   W   W   St  St  P   Wj  Sv

Construct a frequency distribution table for these data with their relative frequency and percentage.

Solution:

Category   Frequency   Relative Frequency   Percentage (%)
Wira       19          19/50 = 0.38         0.38*100 = 38
Iswara     8           0.16                 16
Perdana    4           0.08                 8
Waja       10          0.20                 20
Satria     5           0.10                 10
Savvy      4           0.08                 8
Total      50          1.00                 100
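The table of Example 3 can be checked with a short script. This is a minimal sketch, not part of the original notes: the variable names (`rows`, `names`, `table`) are illustrative, and the sample is typed in as five rows of ten vehicles.

```python
from collections import Counter

# Sample from Example 3, one string per row of ten vehicles.
rows = [
    "W W P Is Is P Is W St Wj",
    "Is W W Wj Is W W Is W Wj",
    "Wj Is Wj Sv W W W Wj St W",
    "Wj Sv W Is P Sv Wj Wj W W",
    "St W W W W St St P Wj Sv",
]
names = {"W": "Wira", "Is": "Iswara", "P": "Perdana",
         "Wj": "Waja", "St": "Satria", "Sv": "Savvy"}

# Frequency = number of times each code appears in the sample.
codes = [code for row in rows for code in row.split()]
freq = Counter(codes)
n = sum(freq.values())

# Relative frequency = frequency / sum of all frequencies; percentage = relative * 100.
table = {names[c]: (f, f / n, 100 * f / n) for c, f in freq.items()}

for category, (f, rel, pct) in table.items():
    print(f"{category:<8} {f:>3} {rel:.2f} {pct:.0f}")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick way to catch a counting mistake.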

Graphical Presentation of Qualitative Data


1. Bar Graphs

A graph made of bars whose heights represent the frequencies of respective categories.

Such a graph is most helpful when you have many categories to represent.

Notice that a gap is inserted between each of the bars.

It has the following types:
=> simple / vertical bar chart
=> horizontal bar chart
=> component bar chart
=> multiple bar chart

Simple/ Vertical Bar Chart

To construct a vertical bar chart, mark the various categories on the horizontal axis and mark the frequencies on the vertical axis.

Horizontal Bar Chart

To construct a horizontal bar chart, mark the various categories on the vertical axis and mark the frequencies on the horizontal axis.


Example 4: Refer to Example 3.

[Figure: horizontal bar chart "UUM Staff-owned Vehicles Produced By Proton". Vertical axis: Types of Vehicle (Wira, Iswara, Perdana, Waja, Satria, Savvy). Horizontal axis: Frequency, 0 to 20.]

Another example of horizontal bar chart

Figure 2.4: Number of students at Diversity College who are immigrants, by last country of permanent residence

Component Bar Chart

To construct a component bar chart, all the categories are placed in one bar, and every bar is divided into components.

The height of each component should tally with the frequency it represents.

Example 5

Suppose we want to illustrate the information below, representing the number of people participating in the activities offered by an outdoor pursuits centre during June of three consecutive years.

Activity   2004   2005   2006
Climbing   21     34     36
Caving     10     12     21
Walking    75     85     100
Sailing    36     36     40
Total      142    167    191

[Figure 2.5: component bar chart "Activities Breakdown (Jun)". Horizontal axis: Year (2004, 2005, 2006). Vertical axis: Number of participants, 0 to 200. Components: Climbing, Caving, Walking, Sailing.]

Multiple Bar Chart

To construct a multiple bar chart, the bars representing the categories are gathered into groups.

The height of each bar represents the frequency of its category. This chart is useful for making comparisons between two or more values.

Example 6: Refer to Example 5.

[Figure: multiple bar chart "Activities Breakdown (Jun)". Horizontal axis: Year (2004, 2005, 2006). Vertical axis: Number of participants, 0 to 120. Bars in each group: Climbing, Caving, Walking, Sailing.]

Another example of horizontal bar chart:


Preferred snack choices of students at UUM

The bar graphs for relative frequency and percentage distributions can be drawn simply by marking the relative frequencies or percentages, instead of the class frequencies.

2. Pie Chart

A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.

An alternative to the bar chart and useful for summarizing a single categorical variable if there are not too many categories.

The chart makes it easy to compare relative sizes of each class/category.

The whole pie represents the total sample or population. The pie is divided into different portions that represent the different categories.

To construct a pie chart, we multiply 360° by the relative frequency of each category to obtain the degree measure, or size of the angle, for the corresponding category.

Example 7

Figure 2.8

Example 8

Table 2.7

Movie Genres      Frequency   Relative Frequency   Angle Size
Comedy            54          0.27                 360*0.27 = 97.2
Action            36          0.18                 360*0.18 = 64.8
Romance           28          0.14                 360*0.14 = 50.4
Drama             28          0.14                 360*0.14 = 50.4
Horror            22          0.11                 360*0.11 = 39.6
Foreign           16          0.08                 360*0.08 = 28.8
Science Fiction   16          0.08                 360*0.08 = 28.8
Total             200         1.00                 360
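The angle sizes for a pie chart follow directly from the frequencies. The sketch below recomputes the angles of Table 2.7; the dictionary names `genres` and `angles` are illustrative and not part of the original notes.

```python
# Frequencies of the movie genres in Table 2.7.
genres = {
    "Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
    "Horror": 22, "Foreign": 16, "Science Fiction": 16,
}
total = sum(genres.values())

# Angle size = 360 degrees * relative frequency of the category.
angles = {genre: 360 * f / total for genre, f in genres.items()}

for genre, angle in angles.items():
    print(f"{genre:<16} {genres[genre] / total:.2f} {angle:.1f}")
```

Because the relative frequencies sum to 1, the angles must sum to 360°.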

Figure 2.9

3. Line Graph/Time Series Graph


Line graphs are widely used because their visual characteristics reveal data trends clearly and they are easy to create.

When analyzing the graph, look for a trend or pattern that occurs over the time period.

For example, check whether the line is ascending (indicating an increase over time) or descending (indicating a decrease over time).

Another thing to look for is the slope, or steepness, of the line. A line that is steep over a specific time period indicates a rapid increase or decrease over that period.

Two data sets can be compared on the same graph (called a compound time series graph) if two lines are used.

Data collected on the same element for the same variable at different points in time or for different periods of time are called time series data.

A line graph is a visual comparison of how two variables—shown on the x- and y-axes—are related or vary with each other. It shows related information by drawing a continuous line between all the points on a grid.

Line graphs compare two variables: one is plotted along the x-axis (horizontal) and the other along the y-axis (vertical). The y-axis in a line graph usually indicates quantity or percentage, while the horizontal x-axis often measures units of time. As a result, the line graph is often viewed as a time series graph.

Example 9

A transit manager wishes to use the following data for a presentation showing how Port Authority Transit ridership has changed over the years. Draw a time series graph for the data and summarize the findings.

Year   Ridership (in millions)
1990   88.0
1991   85.0
1992   75.7
1993   76.6
1994   75.4

[Figure: time series graph. Horizontal axis: Year (1990 to 1994). Vertical axis: Ridership (in millions), 75 to 89.]

The graph shows a decline in ridership through 1992 and then leveling off for the years 1993 and 1994.
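The reading of the graph can be backed up by the year-to-year changes in the data of Example 9. This is a small illustrative sketch; the name `changes` is not from the original notes.

```python
# Ridership (in millions) from Example 9.
ridership = {1990: 88.0, 1991: 85.0, 1992: 75.7, 1993: 76.6, 1994: 75.4}

years = sorted(ridership)
# Change from each year to the next; a negative value means a decline.
changes = {
    year: round(ridership[year] - ridership[prev], 1)
    for prev, year in zip(years, years[1:])
}

for year, change in changes.items():
    print(year, change)
```

The two large negative changes (1991 and 1992) correspond to the steep part of the line, and the small changes afterwards correspond to the flat part.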

Exercise 1


1. The following data show the method of payment by 16 customers in a supermarket checkout line. Here, C = cash, CK = check, CC = credit card, D = debit and O = other.

C   CK  CK  C   CC  D   O   C
CK  CC  D   CC  C   CK  CK  CC

a. Construct a frequency distribution table.
b. Calculate the relative frequencies and percentages for all categories.
c. Draw a pie chart for the percentage distribution.

2. The frequency distribution table below shows the sales of the products of ZeeZee Company over a certain period. Find the relative frequency and the percentage of each product. Then, construct a pie chart using the obtained information.

Type of Product   Frequency   Relative Frequency   Percentage   Angle Size
A                 13
B                 12
C                 5
D                 9
E                 11

3. Draw a time series graph to represent the data for the number of worldwide airline fatalities for the given years.

Year                1990   1991   1992   1993   1994   1995   1996
No. of fatalities   440    510    990    801    732    557    1132

4. A questionnaire about how people get news resulted in the following information from 25 respondents (N = newspaper, T = television, R = radio, M = magazine).

N  N  R  T  T
R  N  T  M  R
M  M  N  R  N
T  R  M  N  M
T  R  R  N  N

a. Construct a frequency distribution for the data.
b. Construct a bar graph for the data.

5. The information below shows the export and import trade, in million RM, for four months of a certain year. Present these data in a component bar graph.


Month       Export   Import
September   28       20
October     30       28
November    32       17
December    24       14

6. The following information represents the maximum rainfall in millimetres (mm) in each state in Malaysia. You are asked to help a local meteorologist analyse it. Based on your knowledge, present this information using the most appropriate chart and comment on it.

State                                  Quantity (mm)
Perlis                                 435
Kedah                                  512
Pulau Pinang                           163
Perak                                  721
Selangor                               664
Wilayah Persekutuan Kuala Lumpur       1003
Negeri Sembilan                        390
Melaka                                 223
Johor                                  876
Pahang                                 1050
Terengganu                             1255
Kelantan                               986
Sarawak                                878
Sabah                                  456


2.3 ORGANIZING AND GRAPHING QUANTITATIVE DATA

2.3.1 Stem-and-Leaf Display

In stem and leaf display of quantitative data, each value is divided into two portions – a stem and a leaf. Then the leaves for each stem are shown separately in a display.

The display shows the pattern of the data and makes it easy to see which values are repeated most often.

Example 10

25  12  9   10  5   12  23  7
36  13  11  12  31  28  37  6
14  41  38  44  13  22  18  19

Solution:

0 | 9 5 7 6
1 | 2 0 2 3 1 2 4 3 8 9
2 | 5 3 8 2
3 | 6 1 7 8
4 | 1 4
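The display of Example 10 can be reproduced by splitting each value into its tens digit (stem) and units digit (leaf). The function name `stem_and_leaf` is illustrative, and the sketch assumes non-negative whole numbers below 100, as in this example.

```python
from collections import defaultdict

# Data of Example 10.
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 13, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits), in data order."""
    display = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    # Sort by stem so the rows print from the smallest stem upwards.
    return dict(sorted(display.items()))

display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The leaves are kept in the order the values appear in the data, which matches the solution above; sorting each list of leaves would give a ranked stem-and-leaf display instead.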

2.3.1 Frequency Distributions

A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.

Data presented in the form of a frequency distribution are called grouped data.


The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. Also called real class limit.

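To find the midpoint of the upper limit of the first class and the lower limit of the second class, we divide the sum of these two limits by 2.

e.g.: if the upper limit of the first class is 600 and the lower limit of the second class is 601, the class boundary is (600 + 601) / 2 = 600.5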

Class Width (class size)

Class width = Upper boundary – Lower boundary

e.g. : Width of the first class = 600.5 – 400.5 = 200

Class Midpoint or Mark

Class midpoint = (Lower limit + Upper limit) / 2


Constructing Frequency Distribution Tables

1. To decide the number of classes, we use Sturges' formula:

c = 1 + 3.3 log n

where c is the number of classes and n is the number of observations in the data set.

2. Class width:

Class width = (Largest value – Smallest value) / Number of classes

This class width is rounded to a convenient number.

3. Lower Limit of the First Class or the Starting Point

Use the smallest value in the data set.

Example 11

The following data give the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2004 season.


i) Number of classes, c = 1 + 3.3 log 30 = 1 + 3.3(1.48) = 5.88 ≈ 6 classes

ii) Class width = 18 (for example, the first class 135 – 152 has boundaries 134.5 and 152.5, and 152.5 – 134.5 = 18)

iii) Starting point = 135

Table 2.10 Frequency Distribution for Data of Table 2.9

Total Home Runs   Tally          f
135 – 152         ||||| |||||    10
153 – 170         ||             2
171 – 188         |||||          5
189 – 206         ||||| |        6
207 – 224         |||            3
225 – 242         ||||           4
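The construction steps for Example 11 can be sketched as follows. The function names are illustrative, rounding the number of classes up is an assumption (the notes only show 6 as the result), and the class width of 18 is taken from Table 2.10 rather than computed, because the raw data of Table 2.9 are not reproduced here.

```python
import math

def sturges_classes(n):
    """Number of classes from c = 1 + 3.3 log10(n), rounded up (assumed rule)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_limits(start, width, count):
    """Lower and upper limits of `count` consecutive classes of whole-number data."""
    return [(start + i * width, start + (i + 1) * width - 1) for i in range(count)]

c = sturges_classes(30)  # 30 teams in Example 11
limits = class_limits(start=135, width=18, count=c)

for lower, upper in limits:
    print(f"{lower} - {upper}")
```

Each upper limit is one less than the next lower limit because the data are whole numbers; the class boundaries then fall halfway between them.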

2.3.2 Relative Frequency and Percentage Distributions

Example 12 (Refer to Example 11)

Table 2.11: Relative Frequency and Percentage Distributions

Total Home Runs   Class Boundaries           Relative Frequency   %
135 – 152         134.5 to less than 152.5   0.3333               33.33
153 – 170         152.5 to less than 170.5   0.0667               6.67
171 – 188         170.5 to less than 188.5   0.1667               16.67
189 – 206         188.5 to less than 206.5   0.2                  20
207 – 224         206.5 to less than 224.5   0.1                  10
225 – 242         224.5 to less than 242.5   0.1333               13.33
Sum                                          1.0                  100%
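Table 2.11 can be rebuilt from the class limits and frequencies of Table 2.10. In this sketch the names `limits`, `freqs` and `rows` are illustrative, and the boundaries are taken half a unit beyond each limit, which assumes whole-number data.

```python
# Class limits and frequencies from Table 2.10.
limits = [(135, 152), (153, 170), (171, 188), (189, 206), (207, 224), (225, 242)]
freqs = [10, 2, 5, 6, 3, 4]
n = sum(freqs)

rows = []
for (lower, upper), f in zip(limits, freqs):
    # Boundary = midpoint between this class's limit and the neighbouring class's limit.
    boundaries = (lower - 0.5, upper + 0.5)
    rel = f / n
    rows.append((lower, upper, boundaries, round(rel, 4), round(100 * rel, 2)))

for lower, upper, (lb, ub), rel, pct in rows:
    print(f"{lower} - {upper}  {lb} to less than {ub}  {rel}  {pct}")
```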


2.3.3 Graphing Grouped Data

1. Histograms

A histogram is a graph in which the class boundaries are marked on the horizontal axis and either the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative frequencies or percentages are represented by the heights of the bars.

In a histogram, the bars are drawn adjacent to each other and there is a space between the y-axis and the first bar.

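Example 13 (Refer to Example 11)

[Figure: Frequency histogram for Table 2.10. Horizontal axis: Total Home Runs (class boundaries 134.5, 152.5, 170.5, 188.5, 206.5, 224.5, 242.5). Vertical axis: Frequency, 0 to 12.]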

2. Polygon

A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.

Example 13

[Figure: Frequency polygon for Table 2.10. Horizontal axis: Total home runs (class boundaries 134.5, 152.5, 170.5, 188.5, 206.5, 224.5, 242.5). Vertical axis: Frequency, 0 to 12.]


For a very large data set, as the number of classes is increased (and the width of classes is decreased), the frequency polygon eventually becomes a smooth curve called a frequency distribution curve or simply a frequency curve.

Frequency distribution curve

2.3.5 Shape of Histogram

The shape of a histogram is described in the same way as the shape of a polygon.


The most common shapes are:
(i) Symmetric
(ii) Right skewed
(iii) Left skewed

Symmetric histograms

Right skewed and Left skewed


Describing data using graphs gives us insight into the main characteristics of the data.

When interpreting a graph, we should be very cautious. We should observe carefully whether the frequency axis has been truncated or whether any axis has been unnecessarily shortened or stretched.

2.3.6 Cumulative Frequency Distributions

A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.

Example 14: Using the frequency distribution of Table 2.11,

Total Home Runs   Class Boundaries           Cumulative Frequency
135 – 152         134.5 to less than 152.5   10
153 – 170         152.5 to less than 170.5   10 + 2 = 12
171 – 188         170.5 to less than 188.5   10 + 2 + 5 = 17
189 – 206         188.5 to less than 206.5   10 + 2 + 5 + 6 = 23
207 – 224         206.5 to less than 224.5   10 + 2 + 5 + 6 + 3 = 26
225 – 242         224.5 to less than 242.5   10 + 2 + 5 + 6 + 3 + 4 = 30
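The running totals of Example 14 are simply cumulative sums of the class frequencies. A minimal sketch, with illustrative names, using the frequencies of the home run data:

```python
from itertools import accumulate

# Class frequencies and upper class boundaries of the home run data.
freqs = [10, 2, 5, 6, 3, 4]
upper_boundaries = [152.5, 170.5, 188.5, 206.5, 224.5, 242.5]

# Cumulative frequency = number of values below each upper boundary.
cumulative = list(accumulate(freqs))

for boundary, total in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {total}")
```

The last cumulative frequency always equals the number of observations, here 30.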

Ogive

An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of respective classes.

Two types of ogive:
(i) ogive less than
(ii) ogive greater than

First, build a table of cumulative frequency.

Example 15 (Ogive Less Than)

Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30

Earnings (RM)    Cumulative frequency (F)
Less than 29.5   0
Less than 39.5   5
Less than 49.5   11
Less than 59.5   17
Less than 69.5   20
Less than 79.5   23
Less than 89.5   30

[Figure: "less than" ogive. Horizontal axis: Earnings (29.5 to 89.5). Vertical axis: Cumulative Frequency, 0 to 35.]

Example 16 (Ogive Greater Than)

Earnings (RM)    Cumulative Frequency (F)
More than 29.5   30
More than 39.5   25
More than 49.5   19
More than 59.5   13
More than 69.5   10
More than 79.5   7
More than 89.5   0
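Both cumulative tables for the earnings data can be derived from the class frequencies (5, 6, 6, 3, 3, 7). The sketch below uses illustrative names; it produces the points that would be plotted for each ogive.

```python
from itertools import accumulate

# Class boundaries and frequencies of the earnings data (classes 30-39 up to 80-89).
boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [5, 6, 6, 3, 3, 7]
total = sum(freqs)

# "Less than" ogive: number of values below each boundary, starting from 0.
less_than = [0] + list(accumulate(freqs))

# "More than" ogive: number of values above each boundary, starting from the total.
more_than = [total - below for below in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b}: less than = {lt}, more than = {mt}")
```

At every boundary the two cumulative frequencies add up to the total of 30, so either table can be obtained from the other.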


2.3.7 Box-Plot

A box plot describes the data graphically using five measures: the smallest value, the first quartile (K1), the second quartile (median or K2), the third quartile (K3) and the largest value.

[Figure: three box plots, for symmetric data, left skewed data and right skewed data, each marked with the smallest value, K1, median, K3 and largest value.]
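The five measures behind a box plot can be computed as in the sketch below, which reuses the data of Example 10 purely for illustration. The function name is hypothetical, and the quartile rule is an assumption: Python's `statistics.quantiles` uses its default "exclusive" method here, and other textbook rules can give slightly different values for K1 and K3.

```python
import statistics

# Data of Example 10, reused here only as an illustration.
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 13, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]

def five_number_summary(values):
    """Return (smallest, K1, median, K3, largest) for a list of numbers."""
    # Assumption: quartiles follow the library's default "exclusive" method.
    k1, k2, k3 = statistics.quantiles(values, n=4)
    return min(values), k1, k2, k3, max(values)

summary = five_number_summary(data)
print(summary)
```

In a box plot the box runs from K1 to K3 with a line at the median, and the whiskers extend to the smallest and largest values.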
