
Describing Data Visually (Part 1)

Chapter 3

Visual Description

Dot Plots

Frequency Distributions and Histograms

Line Charts

Bar Charts

McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.


Visual Description

• Methods of organizing, exploring and summarizing data include:

- Visual (charts and graphs) provides insight into characteristics of a data set without using mathematics.

- Numerical (statistics or tables) provides insight into characteristics of a data set using mathematics.


• Begin with univariate data (a set of n observations on one variable) and consider the following:

Characteristic | Interpretation

Measurement | What are the units of measurement? Are the data integer or continuous? Any missing observations? Any concerns with accuracy or sampling methods?

Central Tendency | Where are the data values concentrated? What seem to be typical or middle data values?


Characteristic | Interpretation

Dispersion | How much variation is there in the data? How spread out are the data values? Are there unusual values?

Shape | Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?


Example: Price/Earnings Ratios

• P/E ratios are current stock price divided by earnings per share in the last 12 months.


Measurement

• Look at the data and visualize how it was collected and measured.

Sorting

• Sort the data and then summarize in a graphical display. Here are the sorted P/E ratios:

• A histogram graphically displays sorted data.


Sorting

• Sorting allows you to observe central tendency, dispersion and shape as well as minimum, maximum and range.

• When the number of observations is large, a sorted list of data values is difficult to analyze.

• To see broader patterns in the data, analysts often prefer a visual display of the data.
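As a rough illustration of what sorting reveals, the Python sketch below sorts a small sample and reads off the minimum, maximum, range and middle value. The sample values are hypothetical; they are not the P/E ratios from the slides.

```python
# Hypothetical sample for illustration only (NOT the P/E data from the slides).
sample = [23, 8, 41, 15, 19, 68, 27, 12, 31]

ordered = sorted(sample)

# Once sorted, the extremes sit at the two ends of the list.
minimum, maximum = ordered[0], ordered[-1]
data_range = maximum - minimum

# Middle value of the sorted list (the sample size is odd here).
middle = ordered[len(ordered) // 2]

print(ordered)
print("min:", minimum, "max:", maximum, "range:", data_range, "middle:", middle)
```

With only nine values the sorted list is easy to scan; with thousands of values the same list becomes hard to read, which is the motivation for the visual displays that follow.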


Dot Plots

• A dot plot is the simplest graphical display of n individual values of numerical data.

- Easy to understand

- Not good for large samples (e.g., > 5,000).

Steps in Making a Dot Plot

1. Make a scale that covers the data range

2. Mark the axes and label them

3. Plot each data value as a dot above the scale at its approximate location

If more than one data value lies at about the same axis location, the dots are piled up vertically.


Creating a Dot Plot in MegaStat


• Range of data shows dispersion.

• Can add annotations (text boxes) to call attention to specific features.

• Clustering shows central tendency.

• Dot plots do not tell much about the shape of the distribution.


Small Sample: Home Prices

• Consider the following median home prices for nine U.S. cities.

Metropolitan Area | Median Home Price (000)

Akron OH | 119.6

Bergen-Passaic NJ | 363.0

Bradenton FL | 170.4

Colorado Springs CO | 181.7

Hartford CT | 198.5

Milwaukee WI | 186.2

Raleigh-Durham NC | 173.8

San Francisco CA | 560.2

Topeka KS | 100.7


Small Sample: Home Prices

• A dot plot is useful to realtors as they discuss patterns in home selling prices within their community.
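The three steps for making a dot plot can be sketched in a few lines of Python. This is a minimal text-only version, not the MegaStat output shown in the slides; the function name `text_dot_plot` and the choice of a 100 to 600 scale with one column per 25 units are assumptions made for illustration. The prices are the nine values from the home price table.

```python
# Median home prices in $000, copied from the nine-city table in the slides.
home_prices = [119.6, 363.0, 170.4, 181.7, 198.5, 186.2, 173.8, 560.2, 100.7]

def text_dot_plot(values, lo, hi, step):
    """Build a crude text dot plot with one column per `step` along the scale.

    Assumes every value lies between `lo` and `hi` (hypothetical helper).
    """
    n_slots = int((hi - lo) / step) + 1
    counts = [0] * n_slots
    for v in values:
        # Step 3: place each value at its approximate location on the scale.
        counts[round((v - lo) / step)] += 1
    rows = []
    # Values at about the same location are piled up vertically.
    for level in range(max(counts), 0, -1):
        rows.append("".join("o" if c >= level else " " for c in counts))
    rows.append("-" * n_slots)  # Step 1: a scale that covers the data range
    rows.append(f"{lo} to {hi} ($000), one column per {step}")  # Step 2: label
    return counts, rows

counts, rows = text_dot_plot(home_prices, lo=100, hi=600, step=25)
print("\n".join(rows))
```

Even in this crude form, the pile of dots near 175 shows where prices cluster, and the isolated dots for Bergen-Passaic and San Francisco stand out as unusual values.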


Comparing Groups

• A stacked dot plot compares two or more groups using a common X-axis scale.


Frequency Distributions and Histograms

Bins and Bin Limits

• A frequency distribution is a table formed by classifying n data values into k classes (bins).

• Bin limits define the values to be included in each bin. Widths must all be the same.

• Frequencies are the number of observations within each bin.

• Express as relative frequencies (frequency divided by the total) or percentages (relative frequency times 100).


Constructing a Frequency Distribution

1. Find smallest and largest data values

2. Choose the number of bins (k)

- k should be much smaller than n.

- Too many bins result in sparsely populated bins; too few, and dissimilar data values are lumped together.


Constructing a Frequency Distribution

- Herbert Sturges proposed the following rule:

Sample Size (n) | Suggested Number of Bins (k)

16 | 5

32 | 6

64 | 7

128 | 8

256 | 9

512 | 10

1024 | 11
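The table values are consistent with the usual statement of Sturges' rule, k = 1 + log2(n). A short Python sketch reproduces them; rounding to the nearest integer for sample sizes that are not powers of two is an assumption here, and `sturges_bins` is a hypothetical helper name.

```python
import math

def sturges_bins(n):
    """Suggested number of bins, k = 1 + log2(n), rounded to the nearest integer."""
    return round(1 + math.log2(n))

# Reproduce the sample sizes listed in the table.
for n in (16, 32, 64, 128, 256, 512, 1024):
    print(n, sturges_bins(n))
```

The rule is only a starting point: k grows slowly with n, so doubling the sample size adds just one bin.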


Constructing a Frequency Distribution

3. Set the bin limits:

$$\text{Bin width} \approx \frac{X_{\max} - X_{\min}}{k}$$

For example, for k = 7 bins, the approximate bin width is:

$$\text{Bin width} \approx \frac{68 - 8}{7} = \frac{60}{7} \approx 8.57$$

To obtain “nice” limits, we round the width to 10 and start the first bin at 0 to get bin limits:

0, 10, 20, 30, 40, 50, 60, 70
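The bin width arithmetic from the example (minimum 8, maximum 68, k = 7) can be checked with a small Python sketch. The rounded width of 10 and the starting point of 0 are the choices made in the slide and are passed in by hand; the helper name `bin_limits` is hypothetical.

```python
def bin_limits(start, width, x_max):
    """List bin limits from `start` in steps of `width` until x_max is covered.

    The last limit must exceed x_max because upper limits are excluded.
    """
    limits = [start]
    while limits[-1] <= x_max:
        limits.append(limits[-1] + width)
    return limits

x_min, x_max, k = 8, 68, 7

# Approximate bin width: (max - min) / k.
raw_width = (x_max - x_min) / k
print(round(raw_width, 2))

# "Nice" limits: width rounded up to 10 by hand, first bin started at 0.
limits = bin_limits(start=0, width=10, x_max=x_max)
print(limits)
```

Rounding the width up rather than down keeps the number of bins at 7 while still covering the full data range.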


Constructing a Frequency Distribution

4. Put the data values in the appropriate bin

In general, the lower limit is included in the bin while the upper limit is excluded.

5. Create the table; you can include

Frequencies – counts for each bin

Relative frequencies – absolute frequency divided by total number of data values.

Cumulative frequencies – accumulated relative frequency values as bin limits increase.
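Steps 4 and 5 can be sketched in Python as follows. The ten data values are hypothetical (they are not the P/E data from the slides), the bin limits are the 0 to 70 limits from the earlier example, and `frequency_table` is an illustrative helper rather than a function from Excel, MegaStat or MINITAB.

```python
def frequency_table(data, limits):
    """Return rows of (lower, upper, frequency, relative, cumulative relative).

    Each bin includes its lower limit and excludes its upper limit.
    Assumes every data value falls inside the limits.
    """
    counts = [0] * (len(limits) - 1)
    for x in data:
        for i in range(len(counts)):
            if limits[i] <= x < limits[i + 1]:
                counts[i] += 1
                break
    n = len(data)
    rows = []
    running = 0
    for i, c in enumerate(counts):
        running += c  # accumulate counts, then divide, to avoid rounding drift
        rows.append((limits[i], limits[i + 1], c, c / n, running / n))
    return rows

# Hypothetical data for illustration only.
data = [8, 10, 12, 15, 19, 20, 23, 27, 31, 68]
limits = [0, 10, 20, 30, 40, 50, 60, 70]

table = frequency_table(data, limits)
for lower, upper, freq, rel, cum in table:
    print(f"{lower:>2} to under {upper:>2}: {freq}  {rel:.2f}  {cum:.2f}")
```

Note that the value 10 lands in the 10 to under 20 bin, not the 0 to under 10 bin, because the lower limit is included and the upper limit is excluded.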



Histograms

• A histogram is a graphical representation of a frequency distribution.

• A histogram is a bar chart.

Y-axis shows frequency within each bin.

X-axis ticks show end points of each bin.


Histograms

• Consider 3 histograms for the P/E ratio data with different bin widths. What do they tell you?


Excel’s Histogram


MegaStat's Frequency Distribution and Histograms


MINITAB Histogram


Modal Class

• A histogram bar that is higher than those on either side.

• Unimodal – a single modal class.

• Bimodal – two modal classes.

• Multimodal – more than two modal classes.

• Modal classes may be artifacts of the way bin limits are chosen.
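The definition of a modal class translates directly into a check on each bar's neighbours. The Python sketch below is one possible reading of that definition: it treats the first and last bars as having a single neighbour and ignores ties between adjacent bars, both of which are assumptions. The helper names and the frequency lists are hypothetical.

```python
def modal_classes(freqs):
    """Indices of bars strictly higher than the bars on either side.

    End bars are compared with their single neighbour (an assumption).
    """
    modes = []
    for i, f in enumerate(freqs):
        left = freqs[i - 1] if i > 0 else -1
        right = freqs[i + 1] if i < len(freqs) - 1 else -1
        if f > left and f > right:
            modes.append(i)
    return modes

def describe(freqs):
    """Label a histogram by its number of modal classes."""
    m = len(modal_classes(freqs))
    if m == 0:
        return "no modal class"
    if m == 1:
        return "unimodal"
    if m == 2:
        return "bimodal"
    return "multimodal"

# Hypothetical bin frequencies.
print(describe([2, 5, 9, 6, 3]))
print(describe([1, 4, 3, 1, 0, 0, 1]))
```

In the second list, a single observation in the last bin is enough to create a second modal class, which illustrates how modal classes can be artifacts of the bin limits.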


Shape

• A histogram suggests the shape of the population.

• It is influenced by number of bins and bin limits.

• Skewness – indicated by the direction of the longer tail of the histogram.

Left-skewed – (negatively skewed) a longer left tail.

Right-skewed – (positively skewed) a longer right tail.

Symmetric – both tail areas approximately the same.
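The slides judge skewness by eye from the histogram. As a numerical cross-check (a swapped-in technique, not something the slides describe), the Python sketch below computes the moment coefficient of skewness, whose sign matches the "negatively skewed" and "positively skewed" labels. The tolerance of 0.1 for calling a sample symmetric is an arbitrary assumption, and the helper names are hypothetical.

```python
def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (undefined if all values are equal)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def shape(data, tol=0.1):
    """Classify a sample; `tol` is an arbitrary cut-off chosen for illustration."""
    g = skewness(data)
    if g > tol:
        return "right-skewed"
    if g < -tol:
        return "left-skewed"
    return "approximately symmetric"

# Median home prices ($000) from the nine-city table in the slides.
home_prices = [119.6, 363.0, 170.4, 181.7, 198.5, 186.2, 173.8, 560.2, 100.7]
print(shape(home_prices))

# Hypothetical symmetric sample.
print(shape([1, 2, 2, 3, 3, 3, 4, 4, 5]))
```

The home prices come out right-skewed because a few expensive cities stretch the upper tail, which agrees with what a dot plot of the same data suggests.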


Tips for Effective Frequency Distributions

• Check Sturges’ Rule first.

• Choose a nice, round bin width.

• Choose bin limits that are multiples of the bin width.

• Make sure that the range is covered.


Frequency Polygon and Ogive


Line Charts

Simple Line Charts

• Used to display a time series or spot trends, or to compare time periods.

• Can display several variables at once.


Simple Line Charts

• Two-scale line chart – used to compare variables that differ in magnitude or are measured in different units.


Grid Lines

• A line graph usually has no vertical grid lines. Horizontal lines can be added to make it easier to establish the Y value. Which is easier to read?


Log Scales

• Arithmetic scale – distances on the Y-axis are proportional to the magnitude of the variable being displayed.

• Logarithmic scale – (ratio scale) equal distances represent equal ratios.

• Use a log scale for the vertical axis when data vary over a wide range, say, by more than an order of magnitude.

• This will reveal more detail for small data values.


Log Scales

• Log scale is only suited for positive data values.

• Reveals whether the quantity is growing at an increasing percent (concave upward), constant percent (straight line), or declining percent (concave downward).
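The claim that constant percent growth plots as a straight line on a log scale can be checked numerically: equal ratios give equal differences in the logarithms. The Python sketch below uses two hypothetical series (10% growth per period, and a fixed increase of 10 units per period); `log_differences` is an illustrative helper name.

```python
import math

def log_differences(series):
    """Successive differences of log10 values (requires positive data).

    Equal differences mean equal ratios, i.e. a straight line on a log scale.
    """
    logs = [math.log10(v) for v in series]
    return [b - a for a, b in zip(logs, logs[1:])]

# Hypothetical series growing at a constant 10% per period.
constant_percent = [100 * 1.10 ** t for t in range(6)]
print(log_differences(constant_percent))

# Hypothetical series growing by a constant amount (a declining percent).
constant_amount = [100 + 10 * t for t in range(6)]
print(log_differences(constant_amount))
```

The first list of differences is constant (a straight line on the log scale), while the second shrinks from period to period, which would appear concave downward.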


Example: U.S. Trade

• What does the log scale graph tell you about growth rate for both series?


When to Use Log Scales

• Useful for

- time series data that might be expected to grow at a compound annual percentage rate (e.g., GDP, national debt, future income)

- financial charts that cover long periods of time

- data that grow rapidly (e.g., revenues)


Tips for Effective Line Charts

1. Line charts are used for time series data (never for cross-sectional data).

2. Y-axis shows numerical variable while X-axis shows time units with time increasing left to right.

3. Use a zero origin on the Y-axis unless more detail is needed.


Tips for Effective Line Charts

4. Omit numerical labels on a line chart to avoid clutter. Use gridlines if needed.

5. Use data markers (squares, triangles, circles) if they don’t clutter the graph.

6. Don’t make lines too thick.


Bar Charts

Plain Bar Charts

• Most common way to display attribute data.

- Bars represent categories or attributes.

- Lengths of bars represent frequencies.

[Figures: Vertical Bar Chart; Horizontal Bar Chart]


3-D and Novelty Bar Charts

[Figures: 3-D Bar Chart; Pyramid Chart]


Pareto Charts

• Special type of bar chart used in quality management to display the frequency of defects or errors of different types.

• Categories are displayed in descending order of frequency.

• Focus on significant few (i.e., few categories that account for most defects or errors).
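The ordering behind a Pareto chart can be sketched in Python: sort the categories by frequency in descending order and track the cumulative percentage to see which few categories account for most defects. The defect categories and counts below are hypothetical, and `pareto_table` is an illustrative helper name.

```python
def pareto_table(counts):
    """Return (category, count, cumulative percent) rows in descending order of count."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count
        rows.append((category, count, 100 * running / total))
    return rows

# Hypothetical defect counts for illustration only.
defects = {"Scratch": 12, "Dent": 45, "Misalignment": 8, "Paint flaw": 30, "Other": 5}

rows = pareto_table(defects)
for category, count, cum_pct in rows:
    print(f"{category:<13} {count:>3} {cum_pct:>6.1f}%")
```

In this made-up example the first two categories already account for 75% of all defects, which is the "significant few" pattern a Pareto chart is meant to expose.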


Stacked Bar Chart

• Bar height is the sum of several subtotals. Areas may be compared by color to show patterns in the subgroups and total.


Bar Charts for Time Series Data

• Bar charts can be used for time series data, although it may be harder to compare trends.


Tips for Effective Bar Charts

1. Show the numerical variable of interest with vertical bars on the Y-axis, category labels on the X-axis.

2. For time series quantities, display the category labels on the horizontal X-axis with time increasing from left to right.

3. The height or length of each bar should be proportional to the quantity displayed.

4. Put numerical values at the top of each bar, except if too cluttered.


Applied Statistics in Business and Economics

End of Chapter 3A