Chapter 1 Describing Data(2)

CHAPTER 1: DESCRIPTIVE STATISTICS

Introduction

Raw data - Data recorded in the sequence in which they are collected, before they are processed or ranked.

Array data - Raw data that is arranged in ascending or descending order.

Example 1: Quantitative raw data

These data are also called ungrouped data.

Example 1: Qualitative raw data

Organizing and Graphing Qualitative Data

Frequency Distributions / Table
Relative Frequency and Percentage Distribution
Graphical Presentation of Qualitative Data

Frequency Distributions / Table

A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.

It exhibits how the frequencies are distributed over the various categories. It is also called a frequency distribution table or simply a frequency table.

The number of elements that belong to a certain category is called the frequency of that category.

Relative Frequency and Percentage Distribution

A relative frequency distribution is a listing of all categories along with their relative frequencies (given as proportions or percentages).

It is commonplace to give the frequency and relative frequency distribution together.

Calculating relative frequency and percentage of a category

$$\text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$$

$$\text{Percentage} = (\text{Relative frequency}) \times 100$$

Example 3

A sample of UUM staff-owned vehicles produced by Proton was identified and the make of each noted. The resulting sample follows (W = Wira, Is = Iswara, Wj = Waja, St = Satria, P = Perdana, Sv = Savvy):

W  W  P  Is Is P  Is W  St Wj
Is W  W  Wj Is W  W  Is W  Wj
Wj Is Wj Sv W  W  W  Wj St W
Wj Sv W  Is P  Sv Wj Wj W  W
St W  W  W  W  St St P  Wj Sv

Construct a frequency distribution table for these data with their relative frequency and percentage.
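The counting asked for in Example 3 can be sketched in Python. This is only an illustration: the function name `frequency_table` and the dictionary `names` are assumptions made for the sketch, not part of the course material.

```python
from collections import Counter

# Sample from Example 3, using the abbreviations given in the text.
sample = """
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
""".split()

# Abbreviation -> full name, as listed in the example.
names = {"W": "Wira", "Is": "Iswara", "P": "Perdana",
         "Wj": "Waja", "St": "Satria", "Sv": "Savvy"}


def frequency_table(values):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (f, f / total, 100 * f / total) for cat, f in counts.items()}


table = frequency_table(sample)
for code, name in names.items():
    f, rel, pct = table[code]
    print(f"{name:<8} {f:>3} {rel:>6.2f} {pct:>6.1f}")
```

Each row prints the frequency, the relative frequency (frequency divided by the sum of all frequencies) and the percentage (relative frequency times 100).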

Solution:

Category   Frequency   Relative Frequency   Percentage (%)
Wira       19          19/50 = 0.38         0.38*100 = 38
Iswara     8           0.16                 0.16*100 = 16
Perdana    4           0.08                 0.08*100 = 8
Waja       10          0.20                 0.20*100 = 20
Satria     5           0.10                 0.10*100 = 10
Savvy      4           0.08                 0.08*100 = 8
Total      50          1.00                 100

Graphical Presentation of Qualitative Data

Bar Graphs

A graph made of bars whose heights represent the frequencies of respective categories.

Such a graph is most helpful when you have many categories to represent.

Notice that a gap is inserted between the bars. There are four types of bar graph:
=> simple/vertical bar chart
=> horizontal bar chart
=> component bar chart
=> multiple bar chart

Simple/ Vertical Bar Chart

To construct a vertical bar chart, mark the various categories on the horizontal axis and mark the frequencies on the vertical axis.

Refer to Figure 2.1 and Figure 2.2.

Horizontal Bar Chart

To construct a horizontal bar chart, mark the various categories on the vertical axis and mark the frequencies on the horizontal axis.

Example 4: Refer to Example 3.

Figure 2.3: UUM Staff-owned Vehicles Produced By Proton (horizontal bar chart; vertical axis: types of vehicle; horizontal axis: frequency, 0 to 20)

Another example of a horizontal bar chart: Figure 2.4

Figure 2.4: Number of students at Diversity College who are immigrants, by last country of permanent residence

Component Bar Chart

In a component bar chart, all categories are placed in one bar, and each bar is divided into components.

The height of each component should tally with the frequency it represents.

Example 5

Suppose we want to illustrate the information below, representing the number of people participating in the activities offered by an outdoor pursuits centre during June of three consecutive years.

Activity   2004   2005   2006
Climbing   21     34     36
Caving     10     12     21
Walking    75     85     100
Sailing    36     36     40
Total      142    167    197

Solution:

Figure 2.5: Activities Breakdown (Jun) (component bar chart; horizontal axis: year, 2004 to 2006; vertical axis: number of participants, 0 to 200; components: Climbing, Caving, Walking, Sailing)

Multiple Bar Chart

In a multiple bar chart, the bars representing the categories are gathered in groups.

The height of each bar represents the frequency of its category.

Useful for making comparisons (two or more values).

Example 6: Refer to Example 5.

Figure 2.6: Activities Breakdown (Jun) (multiple bar chart; horizontal axis: year, 2004 to 2006; vertical axis: number of participants, 0 to 120; bars: Climbing, Caving, Walking, Sailing)

Another example of horizontal bar chart: Figure 2.7

Figure 2.7: Preferred snack choices of students at UUM

Pie Chart

A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.

An alternative to the bar chart and useful for summarizing a single categorical variable if there are not too many categories.

The chart makes it easy to compare relative sizes of each class/category.

The whole pie represents the total sample or population. The pie is divided into different portions that represent the different categories.

To construct a pie chart, we multiply 360° by the relative frequency of each category to obtain the degree measure, or size of the angle, for the corresponding category.

Example 7 (Table 2.6 and Figure 2.8):

Figure 2.8

Example 8 (Table 2.7 and Figure 2.9):

Movie Genres      Frequency   Relative Frequency   Angle Size
Comedy            54          0.27                 360*0.27 = 97.2°
Action            36          0.18                 360*0.18 = 64.8°
Romance           28          0.14                 360*0.14 = 50.4°
Drama             28          0.14                 360*0.14 = 50.4°
Horror            22          0.11                 360*0.11 = 39.6°
Foreign           16          0.08                 360*0.08 = 28.8°
Science Fiction   16          0.08                 360*0.08 = 28.8°
Total             200         1.00                 360°

Figure 2.9
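As a rough check of the angle sizes in Example 8, the rule "360° times relative frequency" can be sketched as follows. The function name `pie_angles` is an assumption made for this illustration.

```python
def pie_angles(frequencies):
    """Map each category to its pie-chart angle in degrees.

    angle = 360 * (frequency / sum of all frequencies)
    """
    total = sum(frequencies.values())
    return {category: 360 * f / total for category, f in frequencies.items()}


# Frequencies from Table 2.7 (movie genres).
genres = {"Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
          "Horror": 22, "Foreign": 16, "Science Fiction": 16}

angles = pie_angles(genres)
for genre, angle in angles.items():
    print(f"{genre:<16} {angle:6.1f}")
```

The angles always add up to 360°, which is a quick way to catch an arithmetic slip in a hand-built table.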

Line Graph/Time Series Graph

A graph that represents data occurring over a specific period of time.

Line graphs are widely used because their visual characteristics reveal data trends clearly and they are easy to create.

When analyzing the graph, look for a trend or pattern that occurs over the time period.

For example, is the line ascending (indicating an increase over time) or descending (indicating a decrease over time)?

Another thing to look for is the slope, or steepness, of the line. A line that is steep over a specific time period indicates a rapid increase or decrease over that period.

Two data sets can be compared on the same graph (called a compound time series graph) if two lines are used.

Data collected on the same element for the same variable at different points in time or for different periods of time are called time series data.

A line graph is a visual comparison of how two variables—shown on the x- and y-axes—are related or vary with each other. It shows related information by drawing a continuous line between all the points on a grid.

Line graphs compare two variables: one is plotted along the x-axis (horizontal) and the other along the y-axis (vertical).

The y-axis in a line graph usually indicates quantity (e.g., RM, number of sales, litres) or percentage, while the horizontal x-axis often measures units of time. As a result, the line graph is often viewed as a time series graph.

Example 9

A transit manager wishes to use the following data for a presentation showing how Port Authority Transit ridership has changed over the years. Draw a time series graph for the data and summarize the findings.

Year   Ridership (in millions)
1990   88.0
1991   85.0
1992   75.7
1993   76.6
1994   75.4

Solution:

[Time series graph: ridership (in millions), vertical axis from 75 to 89, against year, 1990 to 1994]

The graph shows a decline in ridership through 1992 and then leveling off for the years 1993 and 1994.
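The trend described for Example 9 can also be read from the year-to-year changes. The following is a small sketch of that idea; the variable names are chosen for the illustration only.

```python
# Ridership (in millions) from Example 9.
ridership = {1990: 88.0, 1991: 85.0, 1992: 75.7, 1993: 76.6, 1994: 75.4}

years = sorted(ridership)

# Change from each year to the next; a negative value means a decline.
changes = {
    later: round(ridership[later] - ridership[earlier], 1)
    for earlier, later in zip(years, years[1:])
}

for year, change in changes.items():
    print(f"{year}: {change:+.1f} million")
```

Large negative changes up to 1992, followed by small changes in 1993 and 1994, correspond to the steep and then nearly flat parts of the line.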

Exercise 1

1. The following data show the method of payment by 16 customers in a supermarket checkout line. Here, C = cash, CK = check, CC = credit card, D = debit and O = other.

C CK CK C CC D O C

CK CC D CC C CK CK CC

a. Construct a frequency distribution table.
b. Calculate the relative frequencies and percentages for all categories.
c. Draw a pie chart for the percentage distribution.

1.a). Frequency distribution table, relative frequencies, percentages and angle sizes of all categories.

Method of payment   Frequency, f   Relative frequency   Percentage (%)   Angle Size (°)
Cash                4              0.2500               25.00            90
Check               5              0.3125               31.25            112.5
Credit Card         4              0.2500               25.00            90
Debit               2              0.1250               12.50            45
Other               1              0.0625               6.25             22.5
Total               16             1                    100              360

b). Pie Chart

[Pie chart with slices labelled by rounded percentage: Cash 25%, Check 31%, Credit Card 25%, Debit 13%, Other 6%]

Exercise 2: The frequency distribution table represents the sales of certain products at ZeeZee Company. Each product is given with the frequency of its sales in a certain period. Find the relative frequency and the percentage of each product. Then construct a pie chart using the information obtained.

1.a). Frequency distribution table, relative frequencies, percentages and angle sizes of all categories.

Type of product   Frequency   Relative Frequency   Percentage (%)   Angle Size (°)
A                 13          0.26                 26               93.6
B                 12          0.24                 24               86.4
C                 5           0.10                 10               36.0
D                 9           0.18                 18               64.8
E                 11          0.22                 22               79.2
Total             50          1.00                 100              360

2.3 ORGANIZING AND GRAPHING QUANTITATIVE DATA

2.3.1 Stem and Leaf Display
2.3.2 Frequency Distribution
2.3.3 Relative Frequency and Percentage Distributions
2.3.4 Graphing Grouped Data
2.3.5 Shapes of Histogram
2.3.6 Cumulative Frequency Distributions

Stem-and-Leaf Display

In a stem-and-leaf display of quantitative data, each value is divided into two portions: a stem and a leaf. The leaves for each stem are then shown separately in a display.

It shows the pattern of the data.

It can reveal which values are repeated frequently.

Example 10

25 12 9 10 5 12 23 7

36 13 11 12 31 28 37 6

14 41 38 44 13 22 18 19

Solution:

Stem | Leaves
0    | 9 5 7 6
1    | 2 0 2 3 1 2 4 3 8 9
2    | 5 3 8 2
3    | 6 1 7 8
4    | 1 4
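The splitting of each value into a stem and a leaf in Example 10 can be sketched as below. This is a minimal illustration that assumes non-negative whole numbers with the tens digit as the stem; the function name `stem_and_leaf` and its `ranked` option are assumptions of the sketch.

```python
from collections import defaultdict


def stem_and_leaf(values, ranked=False):
    """Group values by stem (tens) and list the leaves (units).

    With ranked=True the leaves of each stem are sorted in increasing order.
    """
    display = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return {
        stem: sorted(leaves) if ranked else leaves
        for stem, leaves in sorted(display.items())
    }


# Data from Example 10.
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 13, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]

for stem, leaves in stem_and_leaf(data, ranked=True).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

In the ranked display, the leaf 2 appears three times on stem 1, which shows at a glance that 12 is the most frequently repeated value.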

Exercise 3:

Queen Bakery is a famous cake shop in Town J. The operations manager of Queen Bakery is interested in studying the time that customers queue before being served at the cashier counter. Below are the data (in minutes) for 20 customers. Construct a stem-and-leaf display.

5 1 8 1 3

3 15 2 12 16

10 16 9 6 14

7 10 3 11 8

Solution:

Unranked display:

Stem | Leaves
0    | 5 1 8 1 3 3 2 9 6 7 3 8
1    | 5 2 6 0 6 4 0 1

Ranked display:

Stem | Leaves
0    | 1 1 2 3 3 3 5 6 7 8 8 9
1    | 0 0 1 2 4 5 6 6

Frequency Distributions

A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.

Data presented in the form of a frequency distribution are called grouped data.

The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. It is also called the real class limit.

To find the midpoint of the upper limit of the first class and the lower limit of the second class, we divide the sum of these two limits by 2.

e.g.:

$$\text{Class boundary} = \frac{400 + 401}{2} = 400.5$$

Class Width (class size)

Class width = Upper boundary – Lower boundary

e.g. : Width of the first class = 600.5 – 400.5 = 200

Class Midpoint or Mark

$$\text{Class midpoint or mark} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$$

e.g.:

$$\text{Midpoint of the 1st class} = \frac{401 + 600}{2} = 500.5$$
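The three definitions above (class boundary, class width and class midpoint) can be sketched together. Only the first class, 401 to 600, comes from the examples in the text; the other two classes in the list are hypothetical, added so the sketch has neighbouring classes to work with, and the function name `class_summary` is likewise an assumption.

```python
def class_summary(classes):
    """Compute boundaries, width and midpoint for each (lower, upper) class.

    Assumes equally spaced classes, so the gap between the upper limit of
    one class and the lower limit of the next is the same throughout.
    """
    gap = classes[1][0] - classes[0][1]
    rows = []
    for lower, upper in classes:
        lower_boundary = lower - gap / 2
        upper_boundary = upper + gap / 2
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower_boundary, upper_boundary),
            "width": upper_boundary - lower_boundary,
            "midpoint": (lower + upper) / 2,
        })
    return rows


# First class from the text; the remaining classes are hypothetical.
classes = [(401, 600), (601, 800), (801, 1000)]

for row in class_summary(classes):
    print(row)
```

For the first class this reproduces the worked values: boundaries 400.5 and 600.5, width 200 and midpoint 500.5.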

Constructing Frequency Distribution Tables

1. To decide the number of classes, we use Sturges' formula, which is

$$c = 1 + 3.3 \log n$$

where c is the number of classes and n is the number of observations in the data set.

2. Class width,

$$i = \frac{\text{Range}}{\text{Number of classes}} = \frac{\text{Largest value} - \text{Smallest value}}{c}$$

This class width is rounded to a convenient number.

3. Lower Limit of the First Class or the Starting Point

Use the smallest value in the data set.

Example 11

The following data give the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2004 season.

Solution:

i. Number of classes,

$$c = 1 + 3.3 \log 30 = 1 + 3.3(1.48) = 5.88 \approx 6 \text{ classes}$$

ii. Class width,

$$i = \frac{242 - 135}{6} = 17.8 \approx 18$$

iii. Starting point = 135
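The three steps of Example 11 can be checked with a short sketch. Two points are assumptions rather than part of the text: the class width is rounded up to the next whole number (the text only says it is rounded to a convenient number), and class limits are built for whole-number data, so each class ends one unit before the next begins.

```python
import math


def sturges_classes(n):
    """Sturges' formula with a base-10 logarithm: c = 1 + 3.3 log n."""
    return 1 + 3.3 * math.log10(n)


def class_width(smallest, largest, classes):
    """Range divided by the number of classes, before rounding."""
    return (largest - smallest) / classes


# Example 11: 30 teams, smallest value 135, largest value 242.
c = round(sturges_classes(30))            # about 5.87, taken as 6 classes
i = math.ceil(class_width(135, 242, c))   # about 17.8, rounded up to 18

# Class limits for whole-number data, starting at the smallest value.
limits = [(135 + k * i, 135 + (k + 1) * i - 1) for k in range(c)]

print(c, i)
print(limits)
```

The resulting limits run from 135 – 152 up to 225 – 242, so the largest value still falls inside the last class.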

Table 2.10 Frequency Distribution for Data of Table 2.9

Total Home Runs   Tally         f
135 – 152         IIIII IIIII   10
153 – 170         II            2
171 – 188         IIIII         5
189 – 206         IIIII I       6
207 – 224         III           3
225 – 242         IIII          4

$$\sum f = 30$$

Exercise 4: The following data show the serving time (in minutes) for 40 customers in a post office:

2.0 4.5 2.5 2.9 4.2 2.9 3.5 2.8

3.2 2.9 4.0 3.0 3.8 2.5 2.3 3.5

2.1 3.1 3.6 4.3 4.7 2.6 4.1 3.1

4.6 2.8 5.1 2.7 2.6 4.4 3.5 3.0

2.7 3.9 2.9 2.9 2.5 3.7 3.3 2.4

Construct a frequency distribution table.

Solution:

i. Number of classes,

$$c = 1 + 3.3 \log 40 = 1 + 5.29 = 6.29 \approx 6 \text{ classes}$$

ii. Class width,

$$i = \frac{5.1 - 2.0}{6} = 0.52 \approx 0.6$$

iii. Starting point = 2.0

Time class   Tally               f
2.0 – 2.5    IIIII II            7
2.6 – 3.1    IIIII IIIII IIIII   15
3.2 – 3.7    IIIII II            7
3.8 – 4.3    IIIII I             6
4.4 – 4.9    IIII                4
5.0 – 5.5    I                   1

$$\sum f = 40$$

Relative Frequency and Percentage Distribution

$$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Sum of all frequencies}} = \frac{f}{\sum f}$$

$$\text{Percentage} = (\text{Relative frequency}) \times 100$$

Example 12 (Refer to Example 11)

Table 2.11: Relative Frequency and Percentage Distributions

Total Home Runs   Class Boundaries           f    Relative Frequency   %
135 – 152         134.5 to less than 152.5   10   0.3333               33.33
153 – 170         152.5 to less than 170.5   2    0.0667               6.67
171 – 188         170.5 to less than 188.5   5    0.1667               16.67
189 – 206         188.5 to less than 206.5   6    0.2                  20
207 – 224         206.5 to less than 224.5   3    0.1                  10
225 – 242         224.5 to less than 242.5   4    0.1333               13.33
Sum                                          30   1.0                  100%

Exercise 5: Refer to Exercise 4 and construct the relative frequency distribution for the data.

Time class   f    Relative frequency
2.0 – 2.5    7    0.175
2.6 – 3.1    15   0.375
3.2 – 3.7    7    0.175
3.8 – 4.3    6    0.150
4.4 – 4.9    4    0.100
5.0 – 5.5    1    0.025
Total        40   1.000

Graphing Grouped Data

Histograms

A histogram is a graph in which the class boundaries are marked on the horizontal axis and either the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative frequencies or percentages are represented by the heights of the bars.

In a histogram, the bars are drawn adjacent to each other, and there is a space between the y-axis and the first bar.

Example 13 (Refer to Example 11)

Figure 2.10: Frequency histogram for Table 2.10 (horizontal axis: total home runs, class boundaries 134.5 to 242.5; vertical axis: frequency, 0 to 12)

Polygon

A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.

Example 13

Figure 2.11: Frequency polygon for Table 2.10 (horizontal axis: total home runs, class boundaries 134.5 to 242.5; vertical axis: frequency, 0 to 12)

For a very large data set, as the number of classes is increased (and the width of classes is decreased), the frequency polygon eventually becomes a smooth curve called a frequency distribution curve or simply a frequency curve.

Figure 2.12: Frequency distribution curve

Shape of Histogram

The shapes of a histogram are the same as those of a polygon.

The most common shapes are:

(i) Symmetric

Figure 2.13 & 2.14: Symmetric histograms

(ii) Right skewed and (iii) Left skewed

Figure 2.15 & 2.16: Right skewed and Left skewed

Cumulative Frequency Distributions

A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.

Example 14: Using the frequency distribution of Table 2.11,

Total Home Runs   Class Boundaries           f    Cumulative Frequency
135 – 152         134.5 to less than 152.5   10   10
153 – 170         152.5 to less than 170.5   2    10+2 = 12
171 – 188         170.5 to less than 188.5   5    10+2+5 = 17
189 – 206         188.5 to less than 206.5   6    10+2+5+6 = 23
207 – 224         206.5 to less than 224.5   3    10+2+5+6+3 = 26
225 – 242         224.5 to less than 242.5   4    10+2+5+6+3+4 = 30
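The running totals in Example 14 can be sketched with a running sum. The variable names below are chosen for this illustration only.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from Table 2.11.
upper_boundaries = [152.5, 170.5, 188.5, 206.5, 224.5, 242.5]
frequencies = [10, 2, 5, 6, 3, 4]

# Each cumulative frequency is the sum of all frequencies up to that class,
# i.e. the number of values below that class's upper boundary.
cumulative = list(accumulate(frequencies))

for boundary, total in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {total}")
```

The last cumulative frequency always equals the total number of observations, here 30.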

Ogive

An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of respective classes.

Two types of ogive: (i) ogive less than (ii) ogive greater than

Example 15 (Ogive Less Than)

Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30

First, build a table of cumulative frequency.

Earnings (RM)    Cumulative Frequency (F)
Less than 29.5   0
Less than 39.5   5
Less than 49.5   11
Less than 59.5   17
Less than 69.5   20
Less than 79.5   23
Less than 89.5   30

Figure: Ogive Less Than (horizontal axis: earnings (RM), less than 29.5 to less than 89.5; vertical axis: cumulative frequency, 0 to 35)

Example 16 (Ogive Greater Than)

Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30

Earnings (RM)    Cumulative Frequency (F)
More than 29.5   30
More than 39.5   25
More than 49.5   19
More than 59.5   13
More than 69.5   10
More than 79.5   7
More than 89.5   0

Figure: Ogive Greater Than (horizontal axis: earnings (RM), more than 29.5 to more than 89.5; vertical axis: cumulative frequency, 0 to 35)
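The two cumulative tables behind the "less than" and "greater than" ogives of Examples 15 and 16 can be sketched from the same frequencies. The function name `ogive_tables` is an assumption made for the illustration.

```python
from itertools import accumulate


def ogive_tables(boundaries, frequencies):
    """Return the (less than, more than) cumulative tables.

    boundaries holds one more entry than frequencies: the lower boundary
    of the first class followed by every upper boundary.
    """
    less_than = [0] + list(accumulate(frequencies))
    total = less_than[-1]
    more_than = [total - count for count in less_than]
    return list(zip(boundaries, less_than)), list(zip(boundaries, more_than))


# Class boundaries and frequencies for the earnings data.
boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
frequencies = [5, 6, 6, 3, 3, 7]

less, more = ogive_tables(boundaries, frequencies)
for (boundary, below), (_, above) in zip(less, more):
    print(f"{boundary}: less than = {below}, more than = {above}")
```

At every boundary the two cumulative frequencies add up to the total of 30, so either ogive can be derived from the other.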

Exercise 6:

Using the frequency table that you constructed in Exercises 4 and 5, build an ogive less than and an ogive greater than for the table.

Time class   f
2.0 – 2.5    7
2.6 – 3.1    15
3.2 – 3.7    7
3.8 – 4.3    6
4.4 – 4.9    4
5.0 – 5.5    1
Total        40

Solution: (ogive less than)

Time class   f    Class boundaries   Cumulative frequency
                  Less than 1.95     0
2.0 – 2.5    7    Less than 2.55     7
2.6 – 3.1    15   Less than 3.15     22
3.2 – 3.7    7    Less than 3.75     29
3.8 – 4.3    6    Less than 4.35     35
4.4 – 4.9    4    Less than 4.95     39
5.0 – 5.5    1    Less than 5.55     40

Solution: (ogive greater than)

Time class   f    Class boundaries   Cumulative frequency
2.0 – 2.5    7    More than 1.95     40
2.6 – 3.1    15   More than 2.55     33
3.2 – 3.7    7    More than 3.15     18
3.8 – 4.3    6    More than 3.75     11
4.4 – 4.9    4    More than 4.35     5
5.0 – 5.5    1    More than 4.95     1
                  More than 5.55     0

Exercise 7:

Given the following frequency table:

Amount ($)   Number of Responses   Cumulative frequency
0 – 99       2                     25
100 – 199    2                     23
200 – 299    6                     -
300 – 399    9                     -
400 – 499    4                     -
500 – 999    2                     2

i. Construct the suitable ogive for the data.

Solution

Amount ($)   f   Class Boundaries   Cumulative frequency
0 – 99       2   More than -0.5     25
100 – 199    2   More than 99.5     23
200 – 299    6   More than 199.5    21
300 – 399    9   More than 299.5    15
400 – 499    4   More than 399.5    6
500 – 999    2   More than 499.5    2
                 More than 999.5    0

Figure: Ogive More Than (horizontal axis: class boundaries, more than -0.5 to more than 999.5; vertical axis: cumulative frequency, 0 to 30)

Box-Plot

A box-plot describes the data graphically using five measures: smallest value, first quartile (K1), second quartile (median or K2), third quartile (K3) and largest value.

[Figure: three box-plots, each marked with smallest value, K1, median, K3 and largest value, for symmetric data, left skewed data and right skewed data]
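The five measures used in a box-plot can be sketched with Python's standard library. This is an illustration under stated assumptions: the data set is hypothetical, the function name `five_number_summary` is invented for the sketch, and the "inclusive" quartile method of `statistics.quantiles` is only one of several quartile conventions, so it may differ slightly from the rule used in a given course.

```python
import statistics


def five_number_summary(values):
    """Return (smallest, K1, median, K3, largest) for a data set."""
    k1, k2, k3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), k1, k2, k3, max(values)


# Hypothetical data set, used only to illustrate the calculation.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

smallest, k1, median, k3, largest = five_number_summary(data)
print(smallest, k1, median, k3, largest)
```

For symmetric data such as this, the median lies midway between K1 and K3; for skewed data it sits closer to one of them.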

Measures of Central Tendency

Ungrouped Data   Grouped Data
(1) Mean         (1) Mean
(2) Median       (2) Median
(3) Mode         (3) Mode

Ungrouped Data

Mean

Mean for population data:

$$\mu = \frac{\sum x}{N}$$

Mean for sample data:

$$\bar{x} = \frac{\sum x}{n}$$

where $\sum x$ = the sum of all values, $N$ = the population size, $n$ = the sample size, $\mu$ = the population mean and $\bar{x}$ = the sample mean.

Example 17

The following data give the prices (rounded to the nearest thousand RM) of five homes sold recently in Sekayang.

158 189 265 127 191

Find the mean sale price for these homes.

Solution:

$$\bar{x} = \frac{\sum x}{n} = \frac{158 + 189 + 265 + 127 + 191}{5} = \frac{930}{5} = 186$$

Thus, these five homes were sold for an average price of RM186 thousand, that is, RM186 000.
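The calculation in Example 17 amounts to dividing the sum of the values by their count, which can be sketched as:

```python
# Prices (in thousand RM) of the five homes in Example 17.
prices = [158, 189, 265, 127, 191]

# Sample mean: sum of all values divided by the sample size.
total = sum(prices)
mean = total / len(prices)

print(total, mean)
```

The variable names are chosen for this illustration; the standard library function `statistics.mean` gives the same result.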

Median

Median is the value of the middle term in a data set that has been ranked in increasing order.

Procedure for finding the Median

Step 1: Rank the data set in increasing order.

Step 2: Determine the depth (position or location) of the median.

$$\text{Depth of Median} = \frac{n + 1}{2}$$

Step 3: Determine the value of the Median.

Example 19

Find the median for the following data:

10 5 19 8 3

Solution:

(1) Rank the data in increasing order

3 5 8 10 19

(2) Determine the depth of the Median

$$\text{Depth of Median} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3$$

(3) Determine the value of the median

Therefore the median is located in the third position of the data set.

3 5 8 10 19

Hence, the Median for the above data = 8

Example 20

Find the median for the following data:

10 5 19 8 3 15

Solution:

(1) Rank the data in increasing order

3 5 8 10 15 19

(2) Determine the depth of the Median

$$\text{Depth of Median} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$$

(3) Determine the value of the Median

Therefore the median is located midway between the 3rd position and the 4th position of the data set.

$$\text{Median} = \frac{8 + 10}{2} = 9$$

Hence, the Median for the above data = 9

- The median gives the center of a histogram, with half of the data values to the left of (or less than) the median and half to the right of (or more than) the median.
- The advantage of using the median is that it is not influenced by outliers.
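The three-step procedure (rank, find the depth, read off the value) can be sketched as follows. The function name `median_by_depth` is an assumption made for the illustration.

```python
import math


def median_by_depth(values):
    """Return (depth, median) using depth = (n + 1) / 2.

    When the depth is a whole number the median is the value at that
    position; otherwise it is the average of the two neighbouring values.
    """
    ranked = sorted(values)                  # Step 1: rank the data
    depth = (len(ranked) + 1) / 2            # Step 2: depth of the median
    lower = ranked[math.floor(depth) - 1]    # positions are counted from 1
    upper = ranked[math.ceil(depth) - 1]
    return depth, (lower + upper) / 2        # Step 3: value of the median


print(median_by_depth([10, 5, 19, 8, 3]))       # Example 19
print(median_by_depth([10, 5, 19, 8, 3, 15]))   # Example 20
```

With five values the depth is 3 and the median is the third ranked value; with six values the depth is 3.5 and the median is the average of the third and fourth ranked values.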

Page 111: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Mode

Mode is the value that occurs with the highest frequency in a data set.

Example 21

1. What is the mode for given data?

77 69 74 81 71 68 74 73

Solution: Mode = 74 (this number occurs twice): Unimodal

Page 112: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

2. What is the mode for given data?

77 69 68 74 81 71 68 74 73

Solution: Mode = 68 and 74: Bimodal

A major shortcoming of the mode is that a data set may have none or may have more than one mode.

One advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative.
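A minimal Python sketch of finding the mode is given below. It returns every value tied for the highest frequency, so it covers the unimodal, bimodal and no-mode cases discussed above. The name `modes` and the convention of returning an empty list when no value repeats are assumptions of this sketch.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency.

    An empty list means there is no mode (every value occurs only once).
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([77, 69, 74, 81, 71, 68, 74, 73]))       # Example 21, data set 1
print(modes([77, 69, 68, 74, 81, 71, 68, 74, 73]))   # Example 21, data set 2
print(modes(["red", "blue", "red", "green"]))        # qualitative data
```

Because it only counts occurrences, the same function works for qualitative data, as the last call shows. It assumes a non-empty list.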

Page 113: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Grouped

Data

Page 114: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

1. Mean

Mean for population data:

\[ \mu = \frac{\sum fx}{N} \]

Mean for sample data:

\[ \bar{x} = \frac{\sum fx}{n} \]

Where x is the midpoint and f is the frequency of a class.

Page 115: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Example 22

The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company. Calculate the mean.

Number of orders    f
10 – 12             4
13 – 15             12
16 – 18             20
19 – 21             14
                    n = 50

Page 116: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution:

Because the data set includes only 50 days, it represents a sample. The value of \(\sum fx\) is calculated in the following table:

Number of orders    f     x     fx
10 – 12             4     11    44
13 – 15             12    14    168
16 – 18             20    17    340
19 – 21             14    20    280
                    n = 50      \(\sum fx\) = 832

Page 117: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

The value of the sample mean is:

\[ \bar{x} = \frac{\sum fx}{n} = \frac{832}{50} = 16.64 \]

Thus, this mail-order company received an average of 16.64 orders per day during these 50 days.
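The grouped-data mean can be sketched in Python by taking each class midpoint as x and weighting it by the class frequency f. The representation of a class as a `(lower_limit, upper_limit, frequency)` tuple and the name `grouped_mean` are assumptions made for this illustration.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum(f * x) / n, where x is the class midpoint."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_fx / n

# (lower limit, upper limit, frequency) for the mail-order data of Example 22
orders = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]
print(grouped_mean(orders))
```

The result is an approximation of the true mean, since every observation in a class is treated as if it sat at the midpoint.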

Page 118: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Exercise 8:

A survey research company asks 100 people how many times they have been to the dentist in the last five years. Their grouped responses appear below.

Number of Visits

Number of Responses

0 – 4 16

5 – 9 25

10 – 14 48

15 – 19 11

What is the mean of the data?

Page 126: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution:

Number of Visits    x     Number of Responses, f    fx
0 – 4               2     16                        32
5 – 9               7     25                        175
10 – 14             12    48                        576
15 – 19             17    11                        187
Total                     100                       970

Page 127: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

The value of the sample mean is:

\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{n} = \frac{970}{100} = 9.70 \]

Thus, on average, the people have been to the dentist 9.70 times in the last five years.

Page 128: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Median

Step 1: Construct the cumulative frequency distribution.

Step 2: Decide the class that contains the median. The class median is the first class whose cumulative frequency is at least n/2.

Step 3: Find the median by using the following formula:

Page 129: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

\[ \text{Median} = L_m + \left( \frac{\frac{n}{2} - F}{f_m} \right) i \]

Where:

n = the total frequency

F = the total frequency before the class median

i = the class width

\(L_m\) = the lower boundary of the class median

\(f_m\) = the frequency of the class median

Page 130: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Example 23

Based on the grouped data below, find the median:

Time to travel to work    Frequency
1 – 10                    8
11 – 20                   14
21 – 30                   12
31 – 40                   9
41 – 50                   7

Page 131: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution:

1st Step: Construct the cumulative frequency distribution

Time to travel to work    Frequency    Cumulative Frequency
1 – 10                    8            8
11 – 20                   14           22
21 – 30                   12           34
31 – 40                   9            43
41 – 50                   7            50

Page 132: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

\[ \frac{n}{2} = \frac{50}{2} = 25 \]

Class median is the 3rd class

So, F = 22, \(f_m\) = 12, \(L_m\) = 20.5 and i = 10. Therefore,

\[ \text{Median} = L_m + \left( \frac{\frac{n}{2} - F}{f_m} \right) i = 20.5 + \left( \frac{25 - 22}{12} \right) 10 = 23 \]

Thus, 25 persons take less than 23 minutes to travel to work and another 25 persons take more than 23 minutes to travel to work.
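The grouped-median steps (accumulate frequencies, locate the first class whose cumulative frequency reaches n/2, then interpolate) can be sketched in Python. The tuple layout `(lower_limit, upper_limit, frequency)`, the name `grouped_median`, and the `gap` parameter (the distance between one class's upper limit and the next class's lower limit, 1 in the examples here) are assumptions of this sketch.

```python
def grouped_median(classes, gap=1):
    """Median of grouped data: L_m + ((n/2 - F) / f_m) * i.

    classes: list of (lower_limit, upper_limit, frequency) in increasing order.
    gap: distance between adjacent class limits (assumed 1, e.g. 1-10, 11-20).
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    cumulative = 0                            # F: frequency before the class
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:           # first class reaching n/2
            lower_boundary = lower - gap / 2  # L_m
            width = upper - lower + gap       # i
            return lower_boundary + (n / 2 - cumulative) / f * width
        cumulative += f

# Travel-time data of Example 23
travel_time = [(1, 10, 8), (11, 20, 14), (21, 30, 12), (31, 40, 9), (41, 50, 7)]
print(grouped_median(travel_time))
```

For the travel-time data the class median is 21 – 30, so the function evaluates 20.5 + ((25 − 22)/12)(10).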

Page 133: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Exercise 9:

A survey research company asks 100 people how many times they have been to the dentist in the last five years. Their grouped responses appear below.

Number of Visits

Number of Responses

0 – 4 16

5 – 9 25

10 – 14 48

15 – 19 11

What is the median of the data?

Page 137: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution:

Number of Visits

Number of Responses

Cumulative frequency

0 – 4 16 16

5 – 9 25 41

10 – 14 48 89

15 – 19 11 100

Page 138: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

\[ \frac{n}{2} = \frac{100}{2} = 50 \]

Class median is the 3rd class

So, F = 41, \(f_m\) = 48, \(L_m\) = 9.5 and i = 5. Therefore,

\[ \text{Median} = L_m + \left( \frac{\frac{n}{2} - F}{f_m} \right) i = 9.5 + \left( \frac{50 - 41}{48} \right) 5 = 10.4375 \]

Thus, about half of the 100 people visited the dentist fewer than 10.4375 times in the last five years, and the other half visited more than 10.4375 times.

Page 139: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Mode

•Mode is the value that has the highest frequency in a data set.

•For grouped data, class mode (or, modal class) is the class with the highest frequency.

•To find mode for grouped data, use the following formula

Page 140: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

\[ \text{Mode} = L_{mo} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i \]

Where:

\(L_{mo}\) is the lower boundary of the class mode

\(\Delta_1\) is the difference between the frequency of the class mode and the frequency of the class before the class mode

\(\Delta_2\) is the difference between the frequency of the class mode and the frequency of the class after the class mode

i is the class width

Page 141: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Example 24

Based on the grouped data below, find the mode

Time to travel to work    Frequency
1 – 10                    8
11 – 20                   14
21 – 30                   12
31 – 40                   9
41 – 50                   7

Page 142: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

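Solution:

Based on the table, the class mode is 11 – 20 (highest frequency, 14), so

\(L_{mo}\) = 10.5, \(\Delta_1\) = (14 – 8) = 6, \(\Delta_2\) = (14 – 12) = 2 and i = 10

\[ \text{Mode} = L_{mo} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i = 10.5 + \left( \frac{6}{6 + 2} \right) 10 = 10.5 + 7.5 = 18 \]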
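The grouped-mode formula can be sketched in Python as below. The tuple layout `(lower_limit, upper_limit, frequency)`, the name `grouped_mode`, the `gap` parameter, and the choice to treat a missing neighbouring class as having frequency 0 are all assumptions of this sketch rather than rules stated in the slides.

```python
def grouped_mode(classes, gap=1):
    """Mode of grouped data: L_mo + (d1 / (d1 + d2)) * i.

    classes: list of (lower_limit, upper_limit, frequency) in increasing order.
    gap: distance between adjacent class limits (assumed 1, e.g. 1-10, 11-20).
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                # class mode (first if tied)
    lower, upper, f_mo = classes[k]
    before = freqs[k - 1] if k > 0 else 0      # assumed 0 if no class before
    after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = f_mo - before                         # delta 1
    d2 = f_mo - after                          # delta 2
    lower_boundary = lower - gap / 2           # L_mo
    width = upper - lower + gap                # i
    return lower_boundary + d1 / (d1 + d2) * width

# Travel-time data of Example 24
travel_time = [(1, 10, 8), (11, 20, 14), (21, 30, 12), (31, 40, 9), (41, 50, 7)]
print(grouped_mode(travel_time))
```

For the travel-time data this evaluates 10.5 + (6/8)(10). The sketch does not handle the degenerate case where both differences are zero.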

Page 143: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

We can also obtain the mode by using the histogram;

Figure 2.19

Page 144: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Exercise 10:

The following table gives the distribution of the share prices for ABC Company, which was listed in BSKL in 2005.

Price (RM)    Frequency
12 – 14       5
15 – 17       14
18 – 20       25
21 – 23       7
24 – 26       6
27 – 29       3

Find the mode for this data using the formula and the histogram.

Page 145: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

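\[ \text{Mode} = L_{mo} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i = 17.5 + \left( \frac{11}{11 + 18} \right) 3 = 18.64 \]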

Page 146: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

[Histogram: frequency (vertical axis) against class boundaries 11.5, 14.5, 17.5, 20.5, 23.5, 26.5, 29.5 (horizontal axis); mode read from the histogram = 18.5]

Page 147: Chapter 1 Describing Data(2)

Median using ogive:

Example 1: example 15

Example 2: example 16

Page 148: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Relationship among mean, median & mode

As discussed in the previous topic, a histogram or a frequency distribution curve can assume either a skewed shape or a symmetrical shape.

Knowing the values of the mean, median and mode can give us some idea about the shape of the frequency curve.

Page 149: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Figure 2.20: Mean, median, and mode for a symmetric histogram and frequency distribution curve

Figure 2.21: Mean, median, and mode for a histogram and frequency distribution curve skewed to the right

Page 150: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Figure 2.22: Mean, median, and mode for a histogram and frequency distribution curve skewed to the left

Page 151: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Exercise 11:

For each of the following situations, state whether the distribution is symmetric, skewed to the right or skewed to the left.

1. Mean = 10, median = 15, mode = 20: Left skewed

2. Mean = 15, median = 10, mode = 7: Right skewed

3. Mean = 10, median = 10, mode = 11: Approximately symmetric

4. Mean = 11, median = 12, mode = 12: Approximately symmetric

Page 152: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Dispersion Measurement

The measures of central tendency such as mean, median and mode do not reveal the whole picture of the distribution of a data set.

Two data sets with the same mean may have completely different spreads.

The variation among the values of observations for one data set may be much larger or smaller than for the other data set.

Page 153: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Ungrouped Data

1. Range

RANGE = Largest value – Smallest value

Example 25:

Find the range of production for this data set.

Page 154: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution:

Range = Largest value – Smallest value = 267 277 – 49 651 = 217 626

Disadvantages:

•It is influenced by outliers.

•It is based on two values only. All other values in a data set are ignored.

Page 155: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Variance and Standard Deviation

Standard deviation is the most used measure of dispersion.

The value of the standard deviation tells how closely the values of a data set are clustered around the mean.

A lower value of the standard deviation indicates that the values of the data set are spread over a relatively smaller range around the mean.

A larger value of the standard deviation indicates that the values of the data set are spread over a relatively larger range around the mean (far from the mean).

Page 156: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

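Variance

Population:

\[ \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} \]

Sample:

\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \]

Standard Deviation

Standard deviation is obtained as the positive root of the variance:

Population: \(\sigma = \sqrt{\sigma^2}\)

Sample: \(s = \sqrt{s^2}\)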

Page 157: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Example 26

Let x denote the total production (in units) of each company.

Company    Production
A          62
B          93
C          126
D          75
E          34

Find the variance and standard deviation.

Page 158: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution:

Company    Production (x)    x²
A          62                3844
B          93                8649
C          126               15876
D          75                5625
E          34                1156
           \(\sum x\) = 390        \(\sum x^2\) = 35150

Page 159: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{35150 - \frac{(390)^2}{5}}{5 - 1} = 1182.50 \]

Since s² = 1182.50; Therefore,

\[ s = \sqrt{1182.50} = 34.3875 \]
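The shortcut formula for the sample variance can be sketched in Python as follows. The function names are illustrative only; the computation uses the sums of x and x² exactly as in the worked solution.

```python
import math

def sample_variance(values):
    """Sample variance: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def sample_std_dev(values):
    """Sample standard deviation: positive square root of the variance."""
    return math.sqrt(sample_variance(values))

production = [62, 93, 126, 75, 34]   # companies A to E
s2 = sample_variance(production)
s = sample_std_dev(production)
print(s2, round(s, 4))
```

The sketch assumes at least two observations, since the divisor is n − 1.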

Page 160: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

The properties of variance and standard deviation:

The standard deviation is a measure of variation of all values from the mean.

The value of the variance and the standard deviation are never negative. Also, larger values of variance or standard deviation indicate greater amounts of variation.

The value of s can increase dramatically with the inclusion of one or more outliers.

Page 161: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Grouped Data

Range = Upper bound of last class – Lower bound of first class

Class       Frequency
41 – 50     1
51 – 60     3
61 – 70     7
71 – 80     13
81 – 90     10
91 – 100    6
Total       40

Upper bound of last class = 100.5

Lower bound of first class = 40.5

Range = 100.5 – 40.5 = 60

Page 162: Chapter 1 Describing Data(2)

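Variance and Standard Deviation

Variance

Population:

\[ \sigma^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N} \]

Sample:

\[ s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} \]

Standard Deviation

Population: \(\sigma = \sqrt{\sigma^2}\)

Sample: \(s = \sqrt{s^2}\)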

Page 163: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Example 27

Find the variance and standard deviation for the following data:

No. of orders    f
10 – 12          4
13 – 15          12
16 – 18          20
19 – 21          14
Total            n = 50

Page 164: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution:

No. of orders    f         x     fx     fx²
10 – 12          4         11    44     484
13 – 15          12        14    168    2352
16 – 18          20        17    340    5780
19 – 21          14        20    280    5600
Total            n = 50          832    14216

Page 165: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Variance,

\[ s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{14216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820 \]

Standard Deviation,

\[ s = \sqrt{s^2} = \sqrt{7.5820} = 2.75 \]

Thus, the standard deviation of the number of orders received at the office of this mail-order company during the past 50 days is 2.75.

Page 166: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Exercise 13:

Refer to Exercise 10; find the variance and standard deviation for the data.

Price (RM)    Frequency
12 – 14       5
15 – 17       14
18 – 20       25
21 – 23       7
24 – 26       6
27 – 29       3

Page 178: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution

Price (RM)    f     x     fx      x²     fx²
12 – 14       5     13    65      169    845
15 – 17       14    16    224     256    3584
18 – 20       25    19    475     361    9025
21 – 23       7     22    154     484    3388
24 – 26       6     25    150     625    3750
27 – 29       3     28    84      784    2352
Total         60          1152           22944

Page 179: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

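Variance

\[ s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{22944 - \frac{(1152)^2}{60}}{59} = \frac{825.6}{59} = 13.9932 \]

Standard deviation:

Since s² = 13.9932; Therefore,

\[ s = \sqrt{s^2} = \sqrt{13.9932} = 3.7407 \]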

Page 180: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Relative Dispersion Measurement

To compare the dispersion of two or more distributions that have different units, OR

To compare two or more distributions that have the same unit but a big difference in their mean values.

Page 181: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Also called the modified coefficient or coefficient of variation, CV.

FORMULA

\[ CV = \frac{\sigma}{\mu} \times 100\% \quad \text{(population)} \]

\[ CV = \frac{s}{\bar{x}} \times 100\% \quad \text{(sample)} \]

Page 182: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Example 26

The mean and standard deviation of monthly salary for two groups of workers at ABC Company are as follows. Group 1: 700 and 20; Group 2: 1070 and 20. Find the CV for each group and determine which group is more dispersed.

Page 183: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution:

\[ CV_1 = \frac{20}{700} \times 100\% = 2.86\% \]

\[ CV_2 = \frac{20}{1070} \times 100\% = 1.87\% \]

The monthly salary for group 1 workers is more dispersed compared to group 2.
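The comparison of the two groups by coefficient of variation can be sketched in Python. The function name `coefficient_of_variation` is a hypothetical helper; the inputs are the means and standard deviations given in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    return std_dev / mean * 100

cv_group1 = coefficient_of_variation(20, 700)
cv_group2 = coefficient_of_variation(20, 1070)

# The group with the larger CV is relatively more dispersed.
more_dispersed = "Group 1" if cv_group1 > cv_group2 else "Group 2"
print(round(cv_group1, 2), round(cv_group2, 2), more_dispersed)
```

Both groups have the same standard deviation, so the difference in CV comes entirely from the difference in means.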

Page 184: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

MEASURE OF POSITION

QUARTILE

INTERQUARTILE RANGE

Page 185: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Determines the position of a single value in relation to other values in a sample or a population data set.

Quartiles

Quartiles are three summary measures that divide a ranked data set into four equal parts.

Page 186: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

oThe 1st quartile – denoted as Q1

oThe 2nd quartile – median of a data set or Q2

oThe 3rd quartile – denoted as Q3

FORMULA

\[ \text{Depth of } Q_1 = \frac{n + 1}{4} \]

\[ \text{Depth of } Q_3 = \frac{3(n + 1)}{4} \]

Page 187: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Example 27

The table below lists the total revenue for the 11 top tourism companies in Malaysia.

109.7 79.9 121.2 76.4 80.2 82.1 79.4 89.3 98.0 103.5 86.8

Page 188: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution:

Step 1: Arrange the data in increasing order

76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

Step 2: Determine the depth for Q1 and Q3

\[ \text{Depth of } Q_1 = \frac{n + 1}{4} = \frac{11 + 1}{4} = 3 \]

\[ \text{Depth of } Q_3 = \frac{3(n + 1)}{4} = \frac{3(11 + 1)}{4} = 9 \]

Page 189: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Step 3: Determine the Q1 and Q3

76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

Q1 = 79.9 ; Q3 = 103.5

Page 190: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Interquartile Range

The difference between the third quartile and the first quartile for a data set.

FORMULA

IQR = Q3 – Q1

Page 191: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Example 28

The table below lists the total revenue for the 12 top tourism companies in Malaysia.

109.7 79.9 74.1 121.2 76.4 80.2 82.1 79.4 89.3 98.0 103.5 86.8

Determine the IQR of the data

Page 192: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution:

Step 1: Arrange the data in increasing order

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

Step 2: Determine the depth for Q1 and Q3

\[ \text{Depth of } Q_1 = \frac{n + 1}{4} = \frac{12 + 1}{4} = 3.25 \]

\[ \text{Depth of } Q_3 = \frac{3(n + 1)}{4} = \frac{3(12 + 1)}{4} = 9.75 \]

Page 193: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Step 3: Determine the Q1 and Q3

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

Q1 = 79.4 + 0.25 (79.9 – 79.4) = 79.525

Q3 = 98.0 + 0.75 (103.5 – 98.0) = 102.125

Therefore,

IQR = Q3 – Q1 = 102.125 – 79.525 = 22.6
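The depth rule for quartiles, including the interpolation used when the depth is not a whole number, can be sketched in Python. The helper names `value_at_depth` and `quartiles` are assumptions of this sketch, and it assumes at least three observations so that both depths fall inside the data.

```python
def value_at_depth(ranked, depth):
    """Value at a 1-based depth in ranked data, interpolating linearly
    between neighbours when the depth has a fractional part."""
    whole = int(depth)
    fraction = depth - whole
    value = ranked[whole - 1]
    if fraction > 0:
        value += fraction * (ranked[whole] - ranked[whole - 1])
    return value

def quartiles(values):
    """Return (Q1, Q3) using depths (n + 1)/4 and 3(n + 1)/4."""
    ranked = sorted(values)
    n = len(ranked)
    q1 = value_at_depth(ranked, (n + 1) / 4)
    q3 = value_at_depth(ranked, 3 * (n + 1) / 4)
    return q1, q3

revenue = [109.7, 79.9, 74.1, 121.2, 76.4, 80.2,
           82.1, 79.4, 89.3, 98.0, 103.5, 86.8]
q1, q3 = quartiles(revenue)
iqr = q3 - q1
print(round(q1, 3), round(q3, 3), round(iqr, 3))
```

Statistical libraries use several different quartile conventions, so their results may differ slightly from this depth rule.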

Page 194: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

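Quartile For Grouped Data

From the median formula, we can get the Q1 and Q3 equations as follows:

FORMULA

\[ Q_1 = L_{Q_1} + \left( \frac{\frac{n}{4} - F}{f_{Q_1}} \right) i \]

\[ Q_3 = L_{Q_3} + \left( \frac{\frac{3n}{4} - F}{f_{Q_3}} \right) i \]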

Page 195: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Example:

Find Q1 and Q3 for the following data

Time to travel to work    Frequency
1 – 10                    8
11 – 20                   14
21 – 30                   12
31 – 40                   9
41 – 50                   7

Page 200: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Time to travel to work    Frequency    Cumulative Frequency
1 – 10                    8            8
11 – 20                   14           22
21 – 30                   12           34
31 – 40                   9            43
41 – 50                   7            50

Page 201: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

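\[ \text{Class } Q_1: \frac{n}{4} = \frac{50}{4} = 12.5 \]

Class Q1 is the 2nd class

Therefore,

\[ Q_1 = L_{Q_1} + \left( \frac{\frac{n}{4} - F}{f_{Q_1}} \right) i = 10.5 + \left( \frac{12.5 - 8}{14} \right) 10 = 13.7143 \]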

Page 202: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

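\[ \text{Class } Q_3: \frac{3n}{4} = \frac{3(50)}{4} = 37.5 \]

Class Q3 is the 4th class

Therefore,

\[ Q_3 = L_{Q_3} + \left( \frac{\frac{3n}{4} - F}{f_{Q_3}} \right) i = 30.5 + \left( \frac{37.5 - 34}{9} \right) 10 = 34.3889 \]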
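The grouped quartile formulas share one pattern: find the first class whose cumulative frequency reaches a target (n/4 for Q1, 3n/4 for Q3) and interpolate inside it. A Python sketch of that pattern is given below; the name `grouped_quantile`, the tuple layout `(lower_limit, upper_limit, frequency)` and the `gap` parameter are assumptions of this illustration.

```python
def grouped_quantile(classes, fraction, gap=1):
    """Quantile of grouped data: L + ((fraction * n - F) / f) * i.

    classes: list of (lower_limit, upper_limit, frequency) in increasing order.
    fraction: 0.25 for Q1, 0.5 for the median, 0.75 for Q3.
    gap: distance between adjacent class limits (assumed 1, e.g. 1-10, 11-20).
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    target = fraction * n
    cumulative = 0                            # F: frequency before the class
    for lower, upper, f in classes:
        if cumulative + f >= target:          # first class reaching the target
            lower_boundary = lower - gap / 2  # L
            width = upper - lower + gap       # i
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f

travel_time = [(1, 10, 8), (11, 20, 14), (21, 30, 12), (31, 40, 9), (41, 50, 7)]
q1 = grouped_quantile(travel_time, 0.25)
q3 = grouped_quantile(travel_time, 0.75)
print(round(q1, 4), round(q3, 4))
```

The sketch assumes a fraction strictly between 0 and 1.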

Page 203: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Exercise:

Find Q1 and Q3 for the following data

Price (RM)    Frequency
12 – 14       5
15 – 17       14
18 – 20       25
21 – 23       7
24 – 26       6
27 – 29       3

Page 208: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Answer:

Price (RM)    Frequency    Cumulative freq
12 – 14       5            5
15 – 17       14           19
18 – 20       25           44
21 – 23       7            51
24 – 26       6            57
27 – 29       3            60

Page 209: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

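\[ \text{Class } Q_1: \frac{n}{4} = \frac{60}{4} = 15 \]

Class Q1 is the 2nd class

Therefore,

\[ Q_1 = L_{Q_1} + \left( \frac{\frac{n}{4} - F}{f_{Q_1}} \right) i = 14.5 + \left( \frac{15 - 5}{14} \right) 3 = 16.6429 \]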

Page 210: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

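\[ \text{Class } Q_3: \frac{3n}{4} = \frac{3(60)}{4} = 45 \]

Class Q3 is the 4th class

Therefore,

\[ Q_3 = L_{Q_3} + \left( \frac{\frac{3n}{4} - F}{f_{Q_3}} \right) i = 20.5 + \left( \frac{45 - 44}{7} \right) 3 = 20.9286 \]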

Page 211: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

MEASURE OF SKEWNESS

Page 212: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

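•To determine the skewness of data (symmetric, left skewed, right skewed)

Also called the Skewness Coefficient or Pearson Coefficient of Skewness

FORMULA

\[ S_k = \frac{\text{Mean} - \text{Mode}}{s} \quad \text{OR} \quad S_k = \frac{3(\text{Mean} - \text{Median})}{s} \]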

Page 213: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

• If Sk is positive: right skewed

• If Sk is negative: left skewed

• If Sk = 0: symmetric

If Sk takes a value in (-0.9999, -0.0001) or (0.0001, 0.9999): approximately symmetric.

Page 214: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Example 32

The durations of stay of cancer patients warded in Hospital Seberang Jaya are recorded in a frequency distribution. From the record, the mean is 28 days, the median is 25 days and the mode is 23 days. The standard deviation is 4.2 days.

What is the type of distribution?

Find the skewness coefficient.

Page 215: Chapter 1 Describing Data(2)

CHAPTER 1DESCRIPTIVE STATISTICS

Solution:

This distribution is right skewed because the mean is the largest value.

\[ S_k = \frac{\text{Mean} - \text{Mode}}{s} = \frac{28 - 23}{4.2} = 1.1905 \]

OR

\[ S_k = \frac{3(\text{Mean} - \text{Median})}{s} = \frac{3(28 - 25)}{4.2} = 2.1429 \]

So, from the Sk value this distribution is right skewed.
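Both versions of the Pearson skewness coefficient can be sketched in Python. The function names are hypothetical, and `describe_skewness` applies only the sign rule (positive, negative, zero) given earlier in this section.

```python
def skewness_from_mode(mean, mode, std_dev):
    """Pearson coefficient of skewness using the mode: (mean - mode) / s."""
    return (mean - mode) / std_dev

def skewness_from_median(mean, median, std_dev):
    """Pearson coefficient of skewness using the median: 3(mean - median) / s."""
    return 3 * (mean - median) / std_dev

def describe_skewness(sk):
    """Classify by the sign of Sk only."""
    if sk > 0:
        return "right skewed"
    if sk < 0:
        return "left skewed"
    return "symmetric"

sk_mode = skewness_from_mode(28, 23, 4.2)
sk_median = skewness_from_median(28, 25, 4.2)
print(round(sk_mode, 4), round(sk_median, 4), describe_skewness(sk_mode))
```

The two coefficients generally give different magnitudes, as they do here, but they agree on the direction of skewness for this example.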

Page 216: Chapter 1 Describing Data(2)

EXERCISE

1. A student wants to study the level of satisfaction with the price of a product at Queen Supermarket. She takes a simple random sample of 100 customers and asks them whether they are ‘very satisfied’, ‘satisfied’, ‘not sure’, ‘not satisfied’, or ‘very not satisfied’. State:

Page 217: Chapter 1 Describing Data(2)

Population: All customers at Queen Supermarket

Sample: 100 customers at Queen Supermarket

Variable: Satisfaction

Type of variable: Qualitative variable

Data value: Very satisfied / satisfied / not sure / not satisfied / very not satisfied

Type of data collection in the survey: Face to face
