Page 1: Chapter 2

Chapter 2

Descriptive Statistics: Tabular and Graphical Methods

Statistics for Business(Env)

Page 2: Chapter 2

Descriptive Statistics

2.1 Graphically Summarizing Qualitative Data
2.2 Graphically Summarizing Quantitative Data
2.3 Dot Plots
2.4 Stem-and-Leaf Displays (Optional)
2.6 Scatter Plots
2.7 Misleading Graphs and Charts (Optional)

Page 3: Chapter 2

Types of Variables/Data

DATA
• Qualitative or attribute (type of car owned)
• Quantitative or numerical
– Discrete (number of children)
– Continuous (time taken for an exam)

Levels of measurement: Nominal, Ordinal, Interval, Ratio

Page 4: Chapter 2

Graphically Summarizing Qualitative Data

• With qualitative data, names identify the different and non-overlapping categories (classes)

• This data can be summarized using a frequency distribution

• Frequency distribution: A table that summarizes the number of items in each of several non-overlapping categories/classes.
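As a minimal Python sketch, building a frequency distribution for qualitative data amounts to counting how many items fall in each category. The `cars` list below is hypothetical data invented for illustration (in the spirit of the "type of car owned" variable); it does not come from the textbook.

```python
from collections import Counter

# Hypothetical sample of a qualitative variable ("type of car owned");
# these labels are made up for illustration, not taken from the textbook.
cars = ["sedan", "SUV", "sedan", "truck", "SUV", "sedan", "minivan", "SUV", "sedan"]

def frequency_distribution(labels):
    """Count how many items fall in each non-overlapping category."""
    return dict(Counter(labels))

freq = frequency_distribution(cars)
for category, count in freq.items():
    print(f"{category:<8} {count}")
```

Because every item carries exactly one label, the counts always sum to the number of observations.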

Page 5: Chapter 2

Example 2.1: Describing 2006 Jeep Purchasing Patterns

• Table 2.1 lists all 251 vehicles sold in 2006 by the greater Cincinnati Jeep dealers

• Table 2.1 does not reveal much useful information

• A frequency distribution is a useful summary
– Simply count the number of times each model appears in Table 2.1

Page 6: Chapter 2

Table 2.1: All 251 vehicles sold in 2006 by the greater Cincinnati Jeep dealers

Page 7: Chapter 2

The Resulting Frequency Distribution

Jeep Model        Frequency
Commander         71
Grand Cherokee    70
Liberty           80
Wrangler          30
Total             251

Page 8: Chapter 2

Relative Frequency and Percent Frequency

• Relative frequency summarizes the proportion of items in each class

• For each class, divide the frequency of the class by the total number of observations

• Multiply the relative frequency by 100 to obtain the percent frequency

Page 9: Chapter 2

The Resulting Relative Frequency and Percent Frequency Distribution

Jeep Model        Relative Frequency    Percent Frequency
Commander         0.2829                28.29%
Grand Cherokee    0.2789                27.89%
Liberty           0.3187                31.87%
Wrangler          0.1195                11.95%
Total             1.0000                100.00%
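The relative and percent frequencies in this table can be checked with a short Python sketch. The dictionary holds the four model frequencies from the Jeep frequency distribution; the function name is our own.

```python
# Frequencies from the Jeep sales frequency distribution (251 vehicles)
jeep = {"Commander": 71, "Grand Cherokee": 70, "Liberty": 80, "Wrangler": 30}

def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    n = sum(freqs.values())
    return {k: f / n for k, f in freqs.items()}

rel = relative_frequencies(jeep)
for model, r in rel.items():
    # Percent frequency = relative frequency x 100
    print(f"{model:<15} {r:.4f} {100 * r:.2f}%")
```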

Page 10: Chapter 2

Bar Charts and Pie Charts

• Bar chart: A vertical or horizontal rectangle represents the frequency for each category
– Height can be frequency, relative frequency, or percent frequency
• Pie chart: A circle divided into slices where the size of each slice represents its relative frequency or percent frequency

Page 11: Chapter 2

Excel Bar Chart of the Jeep Sales Data

Page 12: Chapter 2

Excel Pie Chart of the Jeep Sales Data

A pie chart is usually used to display a percent frequency distribution.

Page 13: Chapter 2

Pareto Chart
• Pareto chart: a type of chart that contains both bars and a line graph
– A bar chart having the different kinds of categories listed on the horizontal scale
– Bar height represents the frequency of occurrence
– Bars are arranged in decreasing height from left to right
– Augmented by plotting a cumulative percentage point for each bar
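The ordering and cumulative-percentage steps of a Pareto chart can be sketched in Python. The labeling-defect counts appear only in the Excel figure, so this sketch reuses the Jeep sales frequencies purely to illustrate the mechanics; `pareto_table` is a hypothetical helper name, and drawing the bars and line is left to a charting tool.

```python
def pareto_table(freqs):
    """Sort categories by decreasing frequency and attach cumulative percentages.

    Returns (category, frequency, cumulative_percent) tuples: the frequencies
    give the bar heights, the cumulative percentages give the line points.
    """
    total = sum(freqs.values())
    rows, running = [], 0
    for category, f in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        running += f
        rows.append((category, f, 100 * running / total))
    return rows

# Illustration only: Jeep sales frequencies stand in for defect counts
jeep = {"Commander": 71, "Grand Cherokee": 70, "Liberty": 80, "Wrangler": 30}
for category, f, cum_pct in pareto_table(jeep):
    print(f"{category:<15} {f:>3} {cum_pct:6.2f}%")
```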

Page 14: Chapter 2

Excel Frequency Table and Pareto Chart of Labeling Defects

Page 16: Chapter 2

Graphically Summarizing Quantitative Data
• We often need to summarize and describe the shape of the distribution
• One way is to group the measurements into the classes of a frequency distribution and then display the data in the form of a histogram
– A histogram is a graph in which the class (numerical) midpoints or limits are marked on the horizontal axis and the class frequencies on the vertical axis

Page 17: Chapter 2

Frequency Distribution

• A frequency distribution is a list of data classes with the count of values that belong to each class
– “Classify and count”
– The frequency distribution is a table
• Show the frequency distribution in a histogram
– A picture of the frequency distribution (tabulated frequencies) shown as bars drawn adjacent to each other

Page 18: Chapter 2

Constructing a Frequency Distribution

Steps in making a frequency distribution:
1. Find the number of classes
2. Find the class length
3. Form non-overlapping classes of equal width
4. Tally and count
5. Graph the histogram

Page 19: Chapter 2

Example 2.2 The Payment Time Case: A Sample of Payment Times (in days after billing)

22 29 16 15 18 17 12 13 17 16 15

19 17 10 21 15 14 17 18 12 20 14

16 15 16 20 22 14 25 19 23 15 19

18 23 22 16 16 19 13 18 24 24 26

13 18 17 15 24 15 17 14 18 17 21

16 21 25 19 20 27 16 17 16 21

Table 2.4

Page 20: Chapter 2

Determine the Number of Classes
• If the number of classes is
– too small: lack of detail
– too large: some classes will be empty
• Group all of the n data into K classes
• General rule: K is the smallest whole number for which $2^K \geq n$
• In Example 2.2, n = 65
– For K = 6, $2^6 = 64 < n$
– For K = 7, $2^7 = 128 > n$
– So use K = 7 classes
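The rule for choosing K can be sketched as a small search in Python: start at K = 0 and increase K until $2^K$ reaches n. The function name `number_of_classes` is our own.

```python
def number_of_classes(n):
    """Smallest whole number K with 2**K >= n (the general rule on this slide)."""
    k = 0
    while 2 ** k < n:
        k += 1
    return k

print(number_of_classes(65))  # 7, since 2**6 = 64 < 65 <= 2**7 = 128
```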

Page 21: Chapter 2

Number of Classes in General

Number of Classes   Size of Data Set
2                   1 ≤ n < 4
3                   4 ≤ n < 8
4                   8 ≤ n < 16
5                   16 ≤ n < 32
6                   32 ≤ n < 64
7                   64 ≤ n < 128
8                   128 ≤ n < 256
9                   256 ≤ n < 512
10                  512 ≤ n < 1024

Page 22: Chapter 2

Determine the Class Length

• Find the length of each class as the largest measurement minus the smallest measurement, divided by the number of classes found earlier (K)
• For Example 2.2, (29 - 10)/7 = 2.7143
– Because payment times are measured in whole days, round to three days
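The class-length arithmetic for Example 2.2 can be reproduced in Python. Rounding up with `math.ceil` is our choice in this sketch: it guarantees that K classes of the rounded length still span the whole range of the data.

```python
import math

def class_length(smallest, largest, k):
    """Unrounded class length: (largest - smallest) / K."""
    return (largest - smallest) / k

raw = class_length(10, 29, 7)   # Example 2.2: smallest 10, largest 29, K = 7
print(round(raw, 4))            # 2.7143
length = math.ceil(raw)         # whole days; rounding up keeps K classes wide enough
print(length)                   # 3
```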

Page 23: Chapter 2

Form Non-Overlapping Classes of Equal Width

• The classes start at the smallest value
– This is the lower limit of the first class
• The upper limit of the first class is the smallest value plus the class length
– In the example, the first class starts at 10 days and goes up to 13 days
• The next class starts at this upper limit and goes up by the class length
• And so on

Page 24: Chapter 2

Seven Non-Overlapping Classes: Payment Time Example

Class 1 10 days and less than 13 days

Class 2 13 days and less than 16 days

Class 3 16 days and less than 19 days

Class 4 19 days and less than 22 days

Class 5 22 days and less than 25 days

Class 6 25 days and less than 28 days

Class 7 28 days and less than 31 days

Page 25: Chapter 2

Tally and Count the Number of Measurements in Each Class

Class (days)   Frequency
10 < 13        3
13 < 16        14
16 < 19        23
19 < 22        12
22 < 25        8
25 < 28        4
28 < 31        1
Total          65
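The tally-and-count step can be sketched in Python using the 65 payment times from Table 2.4. Each value is assigned to a class by integer division, which assumes (as holds here) that every value lies inside the seven classes starting at 10 days with length 3; `tally` is a hypothetical helper name.

```python
# Payment times (days after billing) from Table 2.4, Example 2.2
payment_times = [
    22, 29, 16, 15, 18, 17, 12, 13, 17, 16, 15,
    19, 17, 10, 21, 15, 14, 17, 18, 12, 20, 14,
    16, 15, 16, 20, 22, 14, 25, 19, 23, 15, 19,
    18, 23, 22, 16, 16, 19, 13, 18, 24, 24, 26,
    13, 18, 17, 15, 24, 15, 17, 14, 18, 17, 21,
    16, 21, 25, 19, 20, 27, 16, 17, 16, 21,
]

def tally(data, start, length, k):
    """Count values in the classes [start + i*length, start + (i+1)*length)."""
    counts = [0] * k
    for x in data:
        i = (x - start) // length   # index of the class that contains x
        counts[i] += 1
    return counts

freq = tally(payment_times, start=10, length=3, k=7)
for i, f in enumerate(freq):
    lower = 10 + 3 * i
    print(f"{lower} < {lower + 3}: {f}")
```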

Page 26: Chapter 2

Histogram

• Rectangles represent the classes
• The base represents the class length
• The height represents
– the frequency in a frequency histogram, or
– the relative frequency in a relative frequency histogram
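Without a plotting library, a rough sideways text histogram shows the same idea: one bar per class, with length equal to the frequency or, for a relative frequency histogram, to the relative frequency expressed as a whole-number percentage. This is only an illustrative sketch; `text_histogram` is our own name, and the frequencies are the payment-time counts.

```python
def text_histogram(labels, freqs, relative=False):
    """Sideways text histogram: one bar of '#' characters per class.

    Bar length is the class frequency, or the relative frequency expressed
    as a whole-number percentage when relative=True.
    """
    n = sum(freqs)
    bars = []
    for label, f in zip(labels, freqs):
        height = round(100 * f / n) if relative else f
        bars.append(f"{label:<8}|{'#' * height}")
    return bars

labels = ["10 < 13", "13 < 16", "16 < 19", "19 < 22", "22 < 25", "25 < 28", "28 < 31"]
freqs = [3, 14, 23, 12, 8, 4, 1]   # payment-time class frequencies
print("\n".join(text_histogram(labels, freqs)))
print("\n".join(text_histogram(labels, freqs, relative=True)))
```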

Page 27: Chapter 2

Histograms

Frequency Histogram Relative Frequency Histogram

Page 28: Chapter 2

Example of a frequency distribution histogram for discrete data

The frequency distribution of quiz scores in a class. The score, X, is the number of problems answered correctly.

Page 29: Chapter 2

Example of a frequency distribution histogram for continuous data

Page 30: Chapter 2

CONTINUOUS VARIABLES AND REAL LIMITS

Page 31: Chapter 2

CONTINUOUS VARIABLES AND REAL LIMITS

For a continuous variable, each score actually corresponds to an interval on the scale. The boundaries that separate these intervals are called real limits. The real limit separating two adjacent scores is located exactly halfway between the scores. Each score has two real limits: one at the top of its interval, called the upper real limit, and one at the bottom of its interval, called the lower real limit.
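Since each real limit lies halfway between adjacent scores, a score's real limits sit half a measurement unit below and above it. The Python sketch below assumes adjacent scores differ by a fixed unit (`unit`, our own parameter name): 1 for whole-number scores, 0.1 for scores recorded to one decimal.

```python
def real_limits(score, unit=1.0):
    """Lower and upper real limits: half a measurement unit below and above the score."""
    half = unit / 2
    return score - half, score + half

print(real_limits(8))          # (7.5, 8.5) for whole-number scores
print(real_limits(3.2, 0.1))   # limits 3.15 and 3.25, up to floating-point rounding
```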

Page 32: Chapter 2

Some Common Distribution Shapes

• Skewed to the right: The right tail of the histogram is longer than the left tail

• Skewed to the left: The left tail of the histogram is longer than the right tail

• Symmetrical: The right and left tails of the histogram appear to be mirror images of each other

Page 33: Chapter 2

skewed to the left skewed to the right

Page 34: Chapter 2

A Right-Skewed Distribution

Page 35: Chapter 2

A Left-Skewed Distribution

Page 36: Chapter 2

Frequency Polygons

• Plot a point above each class midpoint at a height equal to the frequency of the class

• Useful when comparing two or more distributions
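The exam scores of Example 2.3 are in textbook tables not reproduced here, so this Python sketch computes the frequency polygon points for the payment-time classes instead. Each point pairs a class midpoint (halfway between its lower and upper limits) with the class frequency; connecting the points with line segments is left to a plotting tool. `polygon_points` is a hypothetical helper name.

```python
def polygon_points(start, length, freqs):
    """(midpoint, frequency) pairs, one per class, for a frequency polygon."""
    points = []
    for i, f in enumerate(freqs):
        lower = start + i * length
        midpoint = lower + length / 2   # halfway between the lower and upper limits
        points.append((midpoint, f))
    return points

# Payment-time classes: 10 < 13, 13 < 16, ..., 28 < 31
pts = polygon_points(10, 3, [3, 14, 23, 12, 8, 4, 1])
print(pts)
```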

Page 37: Chapter 2

Example 2.3: Comparing The Grade Distribution for Two Statistics Exams

• Table 2.8 (in textbook) gives the scores earned by 40 students on the first statistics exam
• Table 2.9 gives the scores on the second exam, taken after an attendance policy was introduced
• Because of the way exam scores are reported, the classes used are 30 < 40, 40 < 50, 50 < 60, 60 < 70, 70 < 80, 80 < 90, and 90 < 100

Page 38: Chapter 2

A Percent Frequency Polygon of the Exam Scores

Page 39: Chapter 2

A Percent Frequency Polygon Comparing the Two Exam Scores

Page 40: Chapter 2

Cumulative Distributions

• Another way to summarize a distribution is to construct a cumulative distribution

• To do this, use the same number of classes, class lengths, and class boundaries used for the frequency distribution

• Rather than a count, we record the number of measurements that are less than the upper boundary of each class
– In other words, a running total

Page 41: Chapter 2

Frequency, Cumulative Frequency, and Cumulative Relative Frequency Distribution

Class     Frequency   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
10 < 13   3           3                      3/65 = 0.0462                   4.62%
13 < 16   14          17                     17/65 = 0.2615                  26.15%
16 < 19   23          40                     40/65 = 0.6154                  61.54%
19 < 22   12          52                     52/65 = 0.8000                  80.00%
22 < 25   8           60                     60/65 = 0.9231                  92.31%
25 < 28   4           64                     64/65 = 0.9846                  98.46%
28 < 31   1           65                     65/65 = 1.0000                  100.00%
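The running totals in this table can be reproduced in Python with `itertools.accumulate`, starting from the payment-time class frequencies. This is a sketch for checking the arithmetic, not the textbook's own procedure.

```python
from itertools import accumulate

freq = [3, 14, 23, 12, 8, 4, 1]        # payment-time class frequencies
n = sum(freq)                          # 65 payment times
cum = list(accumulate(freq))           # running total of the frequencies

for i, (f, c) in enumerate(zip(freq, cum)):
    lower = 10 + 3 * i
    print(f"{lower} < {lower + 3}  {f:>2}  {c:>2}  {c / n:.4f}  {100 * c / n:.2f}%")
```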

Page 42: Chapter 2

Ogive (Cumulative Frequency Distribution)
• Ogive: A graph of a cumulative distribution
– Plot a point above each upper class boundary at a height equal to the cumulative frequency
– Connect the points with line segments
– Can also be drawn using cumulative relative frequencies or cumulative percent frequencies
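The points of an ogive can be sketched in Python: one point per class, placed above the class's upper boundary at the cumulative value. The example uses the payment-time classes (starting at 10 days, length 3); `ogive_points` and its `percent` flag are our own names, and drawing the connecting segments is left to a plotting tool.

```python
from itertools import accumulate

def ogive_points(start, length, freqs, percent=True):
    """(upper class boundary, cumulative value) pairs for an ogive."""
    n = sum(freqs)
    points = []
    for i, c in enumerate(accumulate(freqs)):
        upper = start + (i + 1) * length     # upper boundary of class i
        points.append((upper, 100 * c / n if percent else c))
    return points

pts = ogive_points(10, 3, [3, 14, 23, 12, 8, 4, 1])
print(pts)
```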

Page 43: Chapter 2

A Percent Frequency Ogive of the Payment Times

Page 44: Chapter 2

Frequency Distribution for Hours Studying
[Figure: horizontal axis: Hours Spent Studying; vertical axis: Frequency]

Page 45: Chapter 2

Cumulative Frequency Distribution for Hours Studying
[Figure: horizontal axis: Hours Spent Studying; vertical axis: Frequency]

Page 46: Chapter 2

Dot Plots

• On a number line, each data value is represented by a dot placed above the corresponding scale value

• Dot plots are useful for detecting outliers
– Unusually large or small observations that are well separated from the remaining observations
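A rough text version of a dot plot can be sketched in Python. It is drawn sideways: one row per whole-number value on the scale and one `*` per observation, with empty rows for values that never occur, so values far from the rest stand out. The data are the Example 2.2 payment times; `dot_plot` is a hypothetical helper name.

```python
from collections import Counter

# Payment times (days after billing), Example 2.2
payment_times = [
    22, 29, 16, 15, 18, 17, 12, 13, 17, 16, 15,
    19, 17, 10, 21, 15, 14, 17, 18, 12, 20, 14,
    16, 15, 16, 20, 22, 14, 25, 19, 23, 15, 19,
    18, 23, 22, 16, 16, 19, 13, 18, 24, 24, 26,
    13, 18, 17, 15, 24, 15, 17, 14, 18, 17, 21,
    16, 21, 25, 19, 20, 27, 16, 17, 16, 21,
]

def dot_plot(data):
    """Sideways text dot plot: one row per whole-number value, one '*' per observation."""
    counts = Counter(data)
    return [f"{v:>3} | {'*' * counts[v]}" for v in range(min(data), max(data) + 1)]

print("\n".join(dot_plot(payment_times)))
```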

Page 47: Chapter 2

Dot Plots Example

Page 48: Chapter 2

Stem-and-Leaf Display

• Purpose is to see the overall pattern of the data by grouping the data into classes, showing
– the variation from class to class
– the amount of data in each class
– the distribution of the data within each class
• Best for small to moderately sized data distributions

Page 49: Chapter 2

A set of 24 quiz scores presented as raw data and organized in a Stem-and-Leaf Display

Page 51: Chapter 2

Advantage of the Stem-and-Leaf Display

Page 52: Chapter 2

Car Mileage Example

• Refer to the Car Mileage Case
– Data in Table 2.14
• The stem-and-leaf display:

29 | 8
30 | 13455677888
31 | 0012334444455667778899
32 | 01112334455778
33 | 03

Key: 29 | 8 means 29 + 0.8 = 29.8; 33 | 3 means 33 + 0.3 = 33.3

Page 53: Chapter 2

Constructing a Stem-and-Leaf Display

• There are no rules that dictate the number of stem values

• Can split the stems as needed

Page 54: Chapter 2

Split Stems from Car Mileage: Example
• Starred classes (*) extend from 0.0 to 0.4
• Unstarred classes extend from 0.5 to 0.9

29  | 8
30* | 134
30  | 55677888
31* | 00123344444
31  | 55667778899
32* | 011123344
32  | 55778
33* | 03
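Both the plain and the split-stem displays can be generated with a short Python sketch. It assumes one-decimal data, so the stem is the whole-number part and the leaf is the tenths digit; values are converted to integer tenths first to avoid floating-point error. The 50 mileages are rebuilt from the car mileage stem-and-leaf display, and `stem_and_leaf` is a hypothetical helper name. Stems with no leaves are simply omitted.

```python
from collections import defaultdict

def stem_and_leaf(values, split=False):
    """Return (stem label, leaves) rows for one-decimal data.

    With split=True, each stem becomes a starred line (leaves 0-4)
    and an unstarred line (leaves 5-9).
    """
    rows = defaultdict(list)
    for x in values:
        stem, leaf = divmod(round(x * 10), 10)   # work in integer tenths
        half = (0 if leaf <= 4 else 1) if split else 0
        rows[(stem, half)].append(leaf)
    display = []
    for stem, half in sorted(rows):
        label = f"{stem}*" if split and half == 0 else str(stem)
        leaves = "".join(str(d) for d in sorted(rows[(stem, half)]))
        display.append((label, leaves))
    return display

# Car mileages rebuilt from the stem-and-leaf display (stem: leaf digits)
leaf_rows = {29: "8", 30: "13455677888", 31: "0012334444455667778899",
             32: "01112334455778", 33: "03"}
mileages = [stem + int(d) / 10 for stem, leaves in leaf_rows.items() for d in leaves]

for label, leaves in stem_and_leaf(mileages, split=True):
    print(f"{label:<4}| {leaves}")
```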

Page 55: Chapter 2

Comparing Two Distributions

• To compare two distributions, we can construct a back-to-back stem-and-leaf display
• Uses the same stems for both distributions
• The leaves of one distribution are shown on the left side of the stems and the leaves of the other on the right

Page 56: Chapter 2

Sample Back-to-Back Stem-and-Leaf Display

Page 57: Chapter 2

Back-to-Back Histogram Display: Comparing Two Distributions with Back-to-Back Bar Charts

Page 58: Chapter 2

Back-to-Back Histogram Display: Comparing Two Distributions with Back-to-Back Bar Charts

Page 59: Chapter 2

Scatter Plots

• Used to study relationships between two quantitative variables

• Place one variable on the x-axis
• Place a second variable on the y-axis
• Place a dot at the coordinates of each pair of values

Page 60: Chapter 2

Types of Relationships

• Linear: A straight line relationship between the two variables

• Positive: When one variable goes up, the other variable goes up

• Negative: When one variable goes up, the other variable goes down

• No Linear Relationship: There is no coordinated linear movement between the two variables

Page 61: Chapter 2

A Scatter Plot Showing a Positive Linear Relationship

Page 62: Chapter 2

A Scatter Plot Showing Little or No Linear Relationship

Page 63: Chapter 2

A Scatter Plot Showing a Negative Linear Relationship

Page 64: Chapter 2

Misleading Graphs and Charts: Scale Break

Mean Salaries at a Major University, 2004 - 2007

Breaking the vertical scale exaggerates the effect

Page 65: Chapter 2

Misleading Graphs and Charts: Horizontal Scale Effects

Page 66: Chapter 2

You can use simple mathematical operations (like averages) to create nonsensical “facts” that can drive whatever agenda you’d like. Example: the average wealth of the citizens of a particular town is $100,000, therefore they don’t need any government assistance. (The town consists of 1 stingy millionaire and 9 homeless people.)
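The arithmetic behind the town example can be checked in Python. The sketch assumes, as the example implies, that the millionaire holds exactly $1,000,000 and the 9 homeless residents hold $0 each.

```python
# Assumed wealth figures for the example town: 1 millionaire, 9 people with $0
wealth = [1_000_000] + [0] * 9

mean = sum(wealth) / len(wealth)
below_mean = sum(1 for w in wealth if w < mean)
print(f"mean wealth: ${mean:,.0f}")                                  # $100,000
print(f"residents below the mean: {below_mean} of {len(wealth)}")   # 9 of 10
```

The "average" is accurate arithmetic, yet it describes none of the ten residents.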