Week 2 slides_15 (FINANCE1)


Transcript of Week 2 slides_15 (FINANCE1)

Page 1: Week 2 slides_15 (FINANCE1)

BUSS1020 1

WEEK 2: ORGANIZING AND VISUALIZING DATA

BUSS1020 Quantitative Business Analysis

Page 2: Week 2 slides_15 (FINANCE1)

BUSS1020 2

http://visual.ly/nachos-101

Page 3: Week 2 slides_15 (FINANCE1)

http://visual.ly/power-infographics

BUSS1020

Page 4: Week 2 slides_15 (FINANCE1)

BUSS1020 4

LEARNING OBJECTIVES

In this section you will learn:

To construct tables and charts to visualize and organise numerical data

To construct tables and charts to visualize and organise categorical data

The principles of properly presenting summarized data and graphs

Text pages 37-107

Focus is on structured data

Page 5: Week 2 slides_15 (FINANCE1)

BUSS1020 5

AGENDA

Introduction

Categorical Data
  Organising One Variable Categorical Data
  Visualising One Variable Categorical Data
  Organising Two Variable Categorical Data
  Visualising Two Variable Categorical Data

Numerical Data
  Organising Numerical Data
  Visualising One Variable Numerical Data
  Visualising Two Variable Numerical Data

Principles of Graphical Excellence

Page 6: Week 2 slides_15 (FINANCE1)

BUSS1020 6

Page 7: Week 2 slides_15 (FINANCE1)

BUSS1020 7

Data are organised and visualised to reveal, gain insight from, and communicate the information hidden within them, especially the main features and patterns.

First step: find the “story” in the data

Last step: communicate the “story” well

ORGANISING AND VISUALISING DATA

Page 8: Week 2 slides_15 (FINANCE1)

ORGANIZING CATEGORICAL DATA: TABLES

BUSS1020 8

Categorical Data: Tallying Data

One Categorical Variable: Summary Table

Two Categorical Variables: Contingency Table

DCOVA

Page 9: Week 2 slides_15 (FINANCE1)

BUSS1020

SUMMARY TABLE

Banking Preference?                 Percent
ATM                                 16%
Telephone                           2%
Drive-through service at branch     17%
In person at branch                 41%
Internet                            24%

9

A summary table indicates the frequency, amount, percentage or proportion of items in each of a set of categories

This allows you to see:

the relative frequency of each category

the differences between the categories

A Survey of 1000 Bank Customers:

Should the bank stop telephone banking?

Page 10: Week 2 slides_15 (FINANCE1)

BUSS1020 10

FIDELITY INVESTMENTS

Once considered stopping its bill-paying service for customers

The service was losing money and used by very few customers

However, an analysis of their customer database showed that those using this service were among the most loyal and the most profitable customers.

Fidelity retained the service, so as not to lose those customers who contributed enormously to their profit margin.

Page 11: Week 2 slides_15 (FINANCE1)

BUSS1020 11

Page 12: Week 2 slides_15 (FINANCE1)

BUSS1020 12

Frequency table results for Type: Count = 316

Summary table for Retirement Fund data: “Type”

Type Frequency Percent of Total

Growth 227 71.8

Value 89 28.2

Retirement Funds.xlsx

SUMMARY TABLE

Page 13: Week 2 slides_15 (FINANCE1)

13

Risk      Frequency   Percent of Total   Cumulative Percent of Total
Low       212         67.1               67.1
Average   91          28.8               95.9
High      13          4.1                100

Frequency table results for Risk: Count = 316

What type and scale is “Risk”?

Why did I add a Cumulative column?

Retirement Funds.xlsx

SUMMARY TABLE

BUSS1020
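The Risk summary table can be reproduced from the three counts alone. The following is a minimal Python sketch, not the StatCrunch procedure used for the slides; the variable names (`risk_counts`, `summary_rows`) are illustrative assumptions.

```python
from itertools import accumulate

# Risk counts taken from the Retirement Funds summary table (316 funds).
risk_counts = {"Low": 212, "Average": 91, "High": 13}

total = sum(risk_counts.values())
percents = [100 * n / total for n in risk_counts.values()]
# The running total of the percentages gives the cumulative column.
cumulative = list(accumulate(percents))

summary_rows = [
    (level, count, round(pct, 1), round(cum, 1))
    for (level, count), pct, cum in zip(risk_counts.items(), percents, cumulative)
]

print(f"{'Risk':<8}{'Frequency':>10}{'Percent':>9}{'Cumulative':>12}")
for level, count, pct, cum in summary_rows:
    print(f"{level:<8}{count:>10}{pct:>9.1f}{cum:>12.1f}")
```

The cumulative column is only meaningful here because the categories are listed in their natural order (Low, Average, High).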

Page 14: Week 2 slides_15 (FINANCE1)

BUSS1020 14


Page 15: Week 2 slides_15 (FINANCE1)

VISUALIZING CATEGORICAL DATA (1 VAR)

BUSS1020 15

Categorical Data: Visualizing Data

Summary Table For One Variable: Bar Chart, Pie Chart, Pareto Chart

Contingency Table For Two Variables: Side-By-Side Bar Chart

DCOVA

Chap 2, section 3, pg 53

Page 16: Week 2 slides_15 (FINANCE1)

BUSS1020 16

VISUALIZING CATEGORICAL DATA: THE BAR CHART

[Bar chart: Banking Preference. One bar per category; percentage axis from 0% to 45%.]

Banking Preference?                 %
ATM                                 16%
Automated or live telephone         2%
Drive-through service at branch     17%
In person at branch                 41%
Internet                            24%

One bar for each category, bar length represents the amount, frequency or percentage of values in that category

Page 17: Week 2 slides_15 (FINANCE1)

17

Retirement Fund data

[Bar charts: Frequency of Risk level; Percentage of Fund Type]

Retirement Funds.xlsx

BUSS1020

Page 18: Week 2 slides_15 (FINANCE1)

BUSS1020

VISUALIZING CATEGORICAL DATA: THE PIE CHART

A shaded circle with one slice for each category. Slice size represents the percentage in each category.

[Pie chart: Banking Preference. One slice per category.]

Banking Preference?                 %
ATM                                 16%
Automated or live telephone         2%
Drive-through service at branch     17%
In person at branch                 41%
Internet                            24%

18

Chap 2 pg 54

Page 19: Week 2 slides_15 (FINANCE1)

19

Retirement Fund data

[Pie charts: Frequency of Risk level; Percentages of Risk level]

Retirement Funds.xlsx

BUSS1020

Page 20: Week 2 slides_15 (FINANCE1)

BUSS1020 20

Chap 2, section 3, pg 55

VISUALIZING CATEGORICAL DATA: THE PARETO CHART

For categorical, nominal scale data

Vertical bar chart, categories shown in descending order of frequency

A cumulative polygon is also shown

Separates the “vital few” from the “trivial many”

Not available in STATCRUNCH!

See pg 92 Berenson for Excel instructions
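The two ingredients of a Pareto chart, the descending ordering and the cumulative percentages, can be computed without any charting tool. The sketch below is illustrative only: it reuses the banking preference percentages from the earlier summary table rather than the ATM data shown on the next slide, and the names `preferences` and `pareto_rows` are assumptions.

```python
from itertools import accumulate

# Banking preference percentages from the survey of 1000 bank customers.
preferences = {
    "ATM": 16,
    "Telephone": 2,
    "Drive-through service at branch": 17,
    "In person at branch": 41,
    "Internet": 24,
}

# Pareto ordering: categories in descending order of frequency.
ordered = sorted(preferences.items(), key=lambda item: item[1], reverse=True)
# Running total of the ordered percentages: the cumulative polygon values.
cumulative = list(accumulate(pct for _, pct in ordered))

pareto_rows = [(name, pct, cum) for (name, pct), cum in zip(ordered, cumulative)]
for name, pct, cum in pareto_rows:
    print(f"{name:<32}{pct:>4}%{cum:>5}%")
```

Reading down the cumulative column shows how few categories account for most responses: the first two cover 65%.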

Page 21: Week 2 slides_15 (FINANCE1)

BUSS1020

Chap 2 pg 57

THE PARETO CHART

21

[Pareto chart: Incomplete ATM transactions. Categories: Warped card jammed, Card unreadable, ATM malfunctions, ATM out of cash, Invalid amount requested, Wrong keystroke, Lack of funds in account. Both vertical axes run from 0.00% to 100.00%.]

Page 22: Week 2 slides_15 (FINANCE1)

BUSS1020 22


Page 23: Week 2 slides_15 (FINANCE1)

23

ORGANIZING CATEGORICAL DATA: TABLES

Categorical Data: Tallying Data

One Categorical Variable: Summary Table

Two Categorical Variables: Contingency Table

BUSS1020

DCOVA

Page 24: Week 2 slides_15 (FINANCE1)

BUSS1020 24

CONTINGENCY TABLE

Can show pattern or relationship between two or more categorical variables

Cross tabulates or tallies jointly the responses of the categorical variables

DCOVA

Page 25: Week 2 slides_15 (FINANCE1)

BUSS1020 25

Contingency table for Retirement Fund data: “Type” vs “Risk”

Contingency table results: Rows: Type; Columns: Risk

          Average   High   Low   Total
Growth    74        10     143   227
Value     17        3      69    89
Total     91        13     212   316

Is there a pattern or relationship? If so, what is it?

CONTINGENCY TABLE

Page 26: Week 2 slides_15 (FINANCE1)

26

Contingency table results: Rows: Type; Columns: Risk

Cell format: Count (Row percent) (Column percent)

          Average                High                  Low                     Total
Growth    74 (32.6%) (81.32%)    10 (4.41%) (76.92%)   143 (63%) (67.45%)      227 (100%) (71.84%)
Value     17 (19.1%) (18.68%)    3 (3.37%) (23.08%)    69 (77.53%) (32.55%)    89 (100%) (28.16%)
Total     91 (28.8%) (100%)      13 (4.11%) (100%)     212 (67.09%) (100%)     316 (100%) (100%)

Is there a pattern or relationship?

If so, what is it?
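The row and column percentages in the table above follow directly from the six cell counts. A minimal Python sketch of that arithmetic, with illustrative variable names that are assumptions rather than part of the course materials:

```python
# Cell counts from the Type vs Risk contingency table.
counts = {
    "Growth": {"Average": 74, "High": 10, "Low": 143},
    "Value": {"Average": 17, "High": 3, "Low": 69},
}
risk_levels = ["Average", "High", "Low"]

# Marginal totals: one per fund type (row) and one per risk level (column).
row_totals = {t: sum(row.values()) for t, row in counts.items()}
col_totals = {r: sum(counts[t][r] for t in counts) for r in risk_levels}

# Row percent: share of each fund type that falls in each risk level.
row_pct = {
    t: {r: round(100 * counts[t][r] / row_totals[t], 2) for r in risk_levels}
    for t in counts
}
# Column percent: share of each risk level that belongs to each fund type.
col_pct = {
    t: {r: round(100 * counts[t][r] / col_totals[r], 2) for r in risk_levels}
    for t in counts
}

for t in counts:
    print(t, row_pct[t], col_pct[t])
```

Comparing the row percentages across fund types (for example 63% of growth funds are low risk against 77.53% of value funds) is one way to look for a relationship between the two variables.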

BUSS1020

Page 27: Week 2 slides_15 (FINANCE1)

27

“PIVOT TABLE” CONTINGENCY TABLE FOR BOND DATA

Fund Number   Type                      Assets    Fees   Expense Ratio   Return 2009   3-Year Return   5-Year Return   Risk
FN-1          Intermediate Government   7268.1    No     0.45            6.9           6.9             5.5             Below average
FN-2          Intermediate Government   475.1     No     0.50            9.8           7.5             6.1             Below average
FN-3          Intermediate Government   193.0     No     0.71            6.3           7.0             5.6             Average
FN-4          Intermediate Government   18603.5   No     0.13            5.4           6.6             5.5             Average
FN-5          Intermediate Government   142.6     No     0.60            5.9           6.7             5.4             Average
FN-6          Intermediate Government   1401.6    No     0.54            5.7           6.4             6.2             Average

BUSS1020

Page 28: Week 2 slides_15 (FINANCE1)

28

CAN EASILY CONVERT TO AN OVERALL PERCENTAGES TABLE

Intermediate government funds are much more likely to charge a fee.

BUSS1020

Page 29: Week 2 slides_15 (FINANCE1)

BUSS1020 29

EASILY ADD VARIABLES TO AN EXISTING TABLE

Chap 2-29

Is the pattern of risk the same for all combinations of fund type and fee charge?

Page 30: Week 2 slides_15 (FINANCE1)

BUSS1020 30


Page 31: Week 2 slides_15 (FINANCE1)

BUSS1020 31

VISUALIZING CATEGORICAL DATA (2 VAR)

Categorical Data: Visualizing Data

Summary Table For One Variable: Bar Chart, Pie Chart, Pareto Chart

Contingency Table For Two Variables: Side-By-Side Bar Chart

Chap 2 Sect 3 pg 57

DCOVA

Page 32: Week 2 slides_15 (FINANCE1)

BUSS1020 32

VISUALIZING CATEGORICAL DATA: SIDE-BY-SIDE BAR CHARTS

The side-by-side bar chart represents the data from a contingency table.

[Side-by-side bar chart: Invoice Size Split Out By Errors & No Errors. Bars for Large, Medium and Small invoices within the No Errors and Errors groups; percentage axis from 0.0% to 70.0%.]

Invoices with errors are much more likely to be of medium size (61.54% vs 30.77% and 7.69%)

                 No Errors   Errors    Total
Small Amount     50.75%      30.77%    47.50%
Medium Amount    29.85%      61.54%    35.00%
Large Amount     19.40%      7.69%     17.50%
Total            100.0%      100.0%    100.0%

Page 33: Week 2 slides_15 (FINANCE1)

33

Risk level vs Fund Type

BUSS1020

Page 34: Week 2 slides_15 (FINANCE1)

34

Risk level vs Fund Type

BUSS1020

Page 35: Week 2 slides_15 (FINANCE1)

35

Risk level vs Fund Type

BUSS1020

Page 36: Week 2 slides_15 (FINANCE1)

36

Risk level vs Fund Type

BUSS1020

Page 37: Week 2 slides_15 (FINANCE1)

BUSS1020 37

PIE CHARTS VS BAR CHARTS

Which is best?

When should each be used?

What properties make a “good” or “bad” pie chart and/or bar chart?

http://www.perceptualedge.com/examples.php

Page 38: Week 2 slides_15 (FINANCE1)

38

What issues can you see with this plot?

http://www.perceptualedge.com/example12.php

www.sharebuilder.com

BUSS1020

Page 39: Week 2 slides_15 (FINANCE1)

39

Is this better? Why?

Good points?

Negatives?

http://www.perceptualedge.com/example12.php

BUSS1020

Page 40: Week 2 slides_15 (FINANCE1)

40

http://flowingdata.com/2012/06/15/what-3-d-pie-charts-are-good-for/

BUSS1020

Page 41: Week 2 slides_15 (FINANCE1)

BUSS1020 41

SOME PRINCIPLES OF GRAPHING

Maximise message; minimise noise

Include a title and label axes

Include a reference to the source

Keep things in correct proportions

Visualize This by Nathan Yau

Page 42: Week 2 slides_15 (FINANCE1)

http://www.perceptualedge.com/example9.php

BUSS1020

Page 43: Week 2 slides_15 (FINANCE1)

43

Over time

Spatially

HOW DO CATEGORY PROPORTIONS CHANGE?

Interactive graphics

BUSS1020

Page 44: Week 2 slides_15 (FINANCE1)

SPATIAL “HEAT” MAPS

http://www.nytimes.com/interactive/2013/10/02/us/uninsured-americans-map.html?_r=0

BUSS1020

Page 45: Week 2 slides_15 (FINANCE1)

CATEGORIES OVER TIME

http://www.nytimes.com/interactive/2014/08/13/upshot/where-people-in-each-state-were-born.html?abt=0002&abg=0

BUSS1020

Page 46: Week 2 slides_15 (FINANCE1)

BUSS1020 46

SOCRATIVE QUESTION ON THIS PLOT

https://b.socrative.com Room: 9633CA5F

Page 47: Week 2 slides_15 (FINANCE1)

BUSS1020 47

SOCRATIVE QUESTION ON THIS PLOT

https://b.socrative.com Room: 9633CA5F

Page 48: Week 2 slides_15 (FINANCE1)

BUSS1020 48


Page 49: Week 2 slides_15 (FINANCE1)

BUSS1020 49

Chap 2 pg 44 - 51

TABLES FOR ORGANIZING NUMERICAL DATA

Numerical Data: Ordered Array, Frequency Distributions, Cumulative Distributions

DCOVA

Page 50: Week 2 slides_15 (FINANCE1)

BUSS1020 50

Chap 2 pg 44

ORGANIZING NUMERICAL DATA: ORDERED ARRAY

Age of Surveyed Uni Students

Day Students

16 17 17 18 18 18

19 19 20 20 21 22

22 25 27 32 38 42

Night Students

18 18 19 19 20 21

23 28 32 33 41 45

An ordered array is a sequence of data, in rank order, from the smallest to the largest value.

Shows range: i.e. minimum to maximum value

May help identify “outliers”

Page 51: Week 2 slides_15 (FINANCE1)

BUSS1020 51

Chap 2 pg 45

ORGANIZING NUMERICAL DATA: FREQUENCY DISTRIBUTION

A frequency distribution is a summary table: data are arranged into numerically ordered classes.

An appropriate number of class groupings, a suitable width of a class grouping, and the boundaries of each class grouping need to be chosen.

The number of class groups depends on the number of values in the data. In general, a frequency distribution should have at least 5 but no more than 15 classes.

To determine the width of a class interval, you divide the range (Highest value–Lowest value) of the data by the number of class groupings desired.

Page 52: Week 2 slides_15 (FINANCE1)

BUSS1020 52

ORGANIZING NUMERICAL DATA: FREQUENCY DISTRIBUTION EXAMPLE

Example: The Bureau of Meteorology (BOM) measures the rainfall (in mm.) in July 2013 for 20 Sydney suburbs:

24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

Organize this rainfall data

Page 53: Week 2 slides_15 (FINANCE1)

BUSS1020 53

FREQUENCY DISTRIBUTION EXAMPLE

Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Find range: 58 - 12 = 46

Select number of classes: 5

Compute class interval (width): 46/5 = 9.2, rounded up to 10

Determine class boundaries (limits):

Class 1: 10 - 20

Class 2: 20 - 30

Class 3: 30 - 40

Class 4: 40 - 50

Class 5: 50 - 60

Each class includes its lower boundary but not its upper boundary (e.g. 10 ≤ X < 20).

Compute class midpoints: 15, 25, 35, 45, 55

Count observations & assign to classes
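The steps above can be sketched in a few lines of Python. This is one possible implementation, not the textbook's procedure: rounding the width up and the starting boundary down to a multiple of 10 is an assumption chosen to match the classes on the slide, and all variable names are illustrative.

```python
import math

# July rainfall (mm) for 20 Sydney suburbs, as listed on the slide.
rainfall = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
            32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5
data_range = max(rainfall) - min(rainfall)   # 58 - 12 = 46
raw_width = data_range / num_classes          # 46 / 5 = 9.2

# Assumption: round the width up, and the first boundary down,
# to a convenient multiple of 10.
width = math.ceil(raw_width / 10) * 10
start = (min(rainfall) // 10) * 10

classes = [(start + i * width, start + (i + 1) * width)
           for i in range(num_classes)]
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Each class includes its lower boundary and excludes its upper boundary.
frequencies = [sum(lo <= x < hi for x in rainfall) for lo, hi in classes]

for (lo, hi), mid, freq in zip(classes, midpoints, frequencies):
    print(f"{lo} - {hi}  midpoint {mid:g}  frequency {freq}")
```

Changing `num_classes` or the rounding rule changes the picture of the data, which is the point made later under the tips for frequency distributions.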

Page 54: Week 2 slides_15 (FINANCE1)

54

ORGANIZING NUMERICAL DATA: FREQUENCY DISTRIBUTION EXAMPLE

Class     Midpoint   Frequency
10 - 20   15         3
20 - 30   25         6
30 - 40   35         5
40 - 50   45         4
50 - 60   55         2
Total                20

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

BUSS1020

Page 55: Week 2 slides_15 (FINANCE1)

55

RELATIVE & PERCENT FREQUENCY DISTRIBUTION EXAMPLE

Class     Frequency   Relative Frequency   Percentage
10 - 20   3           .15                  15
20 - 30   6           .30                  30
30 - 40   5           .25                  25
40 - 50   4           .20                  20
50 - 60   2           .10                  10
Total     20          1.00                 100

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

BUSS1020

Page 56: Week 2 slides_15 (FINANCE1)

56

ORGANIZING NUMERICAL DATA: CUMULATIVE FREQUENCY DISTRIBUTION EXAMPLE

Class          Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 ≤ X < 20    3           15%          3                      15%
20 ≤ X < 30    6           30%          9                      45%
30 ≤ X < 40    5           25%          14                     70%
40 ≤ X < 50    4           20%          18                     90%
50 ≤ X < 60    2           10%          20                     100%
Total          20          100%         20                     100%

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

BUSS1020
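The percentage and cumulative columns are running totals of the class frequencies. A short Python sketch of that calculation, using the rainfall class frequencies from the slides; the variable names are illustrative assumptions.

```python
from itertools import accumulate

# Class frequencies for the rainfall example (classes 10-20 up to 50-60).
class_labels = ["10 <= X < 20", "20 <= X < 30", "30 <= X < 40",
                "40 <= X < 50", "50 <= X < 60"]
frequencies = [3, 6, 5, 4, 2]

n = sum(frequencies)
percentages = [100 * f / n for f in frequencies]

# Cumulative frequency: number of observations in this class or any lower one.
cumulative_frequencies = list(accumulate(frequencies))
cumulative_percentages = [100 * c / n for c in cumulative_frequencies]

for row in zip(class_labels, frequencies, percentages,
               cumulative_frequencies, cumulative_percentages):
    label, freq, pct, cum_freq, cum_pct = row
    print(f"{label:<14}{freq:>3}{pct:>6.0f}%{cum_freq:>4}{cum_pct:>6.0f}%")
```

Dividing by `n` is what makes two data sets of different sizes comparable, which is why relative or percentage distributions are recommended for comparisons.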

Page 57: Week 2 slides_15 (FINANCE1)

BUSS1020 57

WHY USE A FREQUENCY DISTRIBUTION?

Condenses raw data into a more useful form

Allows a quick visual interpretation of the data

Enables determination of the major characteristics of the data, including where the data are concentrated / clustered

Page 58: Week 2 slides_15 (FINANCE1)

BUSS1020 58

FREQUENCY DISTRIBUTIONS: SOME TIPS

Different class boundaries may provide different pictures for the same data (especially for smaller data sets)

Shifts in data concentration may show up when different class boundaries are chosen

As the size of the data set increases, the impact of alterations in the selection of class boundaries is greatly reduced

When comparing two or more data sets with different sample sizes, you should use either a relative frequency or a percentage distribution

Page 59: Week 2 slides_15 (FINANCE1)

BUSS1020 59


Page 60: Week 2 slides_15 (FINANCE1)

60

VISUALIZING NUMERICAL DATA BY USING GRAPHICAL DISPLAYS

Numerical Data

Ordered Array

Stem-and-Leaf Display, Histogram, Polygon, Ogive

Frequency Distributions and

Cumulative Distributions

DCOVA

Also

(not in this unit)

Bar chart (discrete data)

Boxplot

BUSS1020

Chap 2-60

Page 61: Week 2 slides_15 (FINANCE1)

BUSS1020 61

HANS ROSLING

Data Visualisation http://www.youtube.com/watch?v=jbkSRLYSojo

Page 62: Week 2 slides_15 (FINANCE1)

BUSS1020 62

ORGANIZING NUMERICAL DATA: HISTOGRAM

A histogram organizes data into groups (bins); the height of each bar reflects the frequency or percentage of data points in that group.

Example:

The Bureau of Meteorology (BOM) measures the rainfall (in mm.) in July 2011 for 20 Sydney suburbs:

24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

Chap 2-62

Page 63: Week 2 slides_15 (FINANCE1)

63

VISUALIZING NUMERICAL DATA: THE HISTOGRAM

[Histogram: July 2011 rainfall. Vertical axis: Frequency (0 to 8); horizontal axis labels: 5, 15, 25, 35, 45, 55, More.]

Class     Frequency   Relative Frequency   Percentage
10 - 20   3           .15                  15
20 - 30   6           .30                  30
30 - 40   5           .25                  25
40 - 50   4           .20                  20
50 - 60   2           .10                  10
Total     20          1.00                 100

(In a percentage histogram the vertical axis would be defined to show the percentage of observations per class)

BUSS1020

Chap 2-63

Page 64: Week 2 slides_15 (FINANCE1)

BUSS1020 64

VISUALIZING NUMERICAL DATA: THE HISTOGRAM

A vertical bar chart of the data in a frequency distribution is called a histogram.

In a histogram there are no gaps between adjacent bars for continuous data. There may be gaps for discrete data.

The class boundaries (or class midpoints) are shown on the horizontal axis.

The vertical axis is either frequency, relative frequency, or percentage.

The height of each bar represents the frequency, relative frequency, or percentage for its class, provided all bins (intervals, class widths) have identical width.
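To make the link between a frequency distribution and its histogram concrete, here is a small text-only sketch in Python. It is an illustration under assumptions, not how the slide's chart was produced: the bars are drawn sideways with `#` characters so the example needs no plotting library, and `text_histogram` is a hypothetical helper name.

```python
# Equal-width classes and frequencies from the rainfall example.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [3, 6, 5, 4, 2]


def text_histogram(classes, frequencies, symbol="#"):
    """Return one line per class; bar length equals the class frequency."""
    lines = []
    for (lo, hi), freq in zip(classes, frequencies):
        lines.append(f"{lo:>2} - {hi:<2} | {symbol * freq} ({freq})")
    return lines


histogram_lines = text_histogram(classes, frequencies)
print("\n".join(histogram_lines))
```

Because every class has the same width, bar length alone carries the frequency; with unequal widths that would no longer hold.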

Chap 2-64

Page 65: Week 2 slides_15 (FINANCE1)

BUSS1020 65

VISUALIZING NUMERICAL DATA: THE POLYGON

A percentage polygon is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.

The cumulative percentage polygon, or ogive, displays the variable of interest along the X axis, and the cumulative percentages along the Y axis.

Useful when there are two or more groups to compare.

Chap 2-65

Page 66: Week 2 slides_15 (FINANCE1)

BUSS1020 66

Chap 2 pg 63

VISUALIZING NUMERICAL DATA: THE FREQUENCY POLYGON

[Frequency polygon: July 2011 rainfall. Vertical axis: Frequency (0 to 8); horizontal axis: Class Midpoints of Rainfall (5 to 65).]

Class     Class Midpoint   Frequency
10 - 20   15               3
20 - 30   25               6
30 - 40   35               5
40 - 50   45               4
50 - 60   55               2

(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)

Page 67: Week 2 slides_15 (FINANCE1)

BUSS1020 67

VISUALIZING NUMERICAL DATA: THE OGIVE (CUMULATIVE % POLYGON)

[Ogive: July 2011 rainfall. Vertical axis: Cumulative Percentage (0 to 100); horizontal axis: Lower Class Boundary (10 to 60).]

Class     Lower class boundary   % less than lower boundary
10 - 20   10                     0
20 - 30   20                     15
30 - 40   30                     45
40 - 50   40                     70
50 - 60   50                     90
60 - 70   60                     100

The percentage of observations less than each lower class boundary is plotted against the lower class boundaries.

Chap 2 pg 64
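The points plotted on the ogive can be computed straight from the raw data: for each lower class boundary, count the observations strictly below it. A minimal Python sketch, with `boundaries` and `ogive_points` as illustrative names:

```python
# Rainfall (mm) for 20 Sydney suburbs, as listed on the slides.
rainfall = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
            32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

# Lower class boundaries, including 60 for the empty 60 - 70 class.
boundaries = [10, 20, 30, 40, 50, 60]
n = len(rainfall)

# Each point is (lower boundary, % of observations less than that boundary).
ogive_points = [
    (b, 100 * sum(x < b for x in rainfall) / n)
    for b in boundaries
]

for boundary, pct in ogive_points:
    print(f"{boundary:>3} {pct:>6.0f}%")
```

The extra boundary at 60 is what brings the curve up to 100%.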

Page 68: Week 2 slides_15 (FINANCE1)

BUSS1020 68


Page 69: Week 2 slides_15 (FINANCE1)

BUSS1020 69

Chap 2 pg 67

VISUALIZING TWO NUMERICAL VARIABLES: THE SCATTER PLOT

Data consisting of paired observations taken from two numerical variables

One variable measured on the vertical axis; other variable measured on the horizontal axis

Scatter plots are used to examine possible relationships between two numerical variables

Page 70: Week 2 slides_15 (FINANCE1)

70

SCATTER PLOT EXAMPLE

Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

[Scatter plot: Cost per Day vs. Production Volume. Horizontal axis: Volume per Day (20 to 70); vertical axis: Cost per Day (0 to 250).]

BUSS1020

Page 71: Week 2 slides_15 (FINANCE1)

71

CEO compensation vs company stock return

CEO-compensation.txt Ex 2.89, page 72, Berenson.

BUSS1020

Page 72: Week 2 slides_15 (FINANCE1)

72

CEO compensation vs company stock return

CEO-compensation.txt Ex 2.89, page 72, Berenson.

BUSS1020

Page 73: Week 2 slides_15 (FINANCE1)

73

CEO compensation vs company stock return

CEO-compensation.txt Ex 2.89, page 72, Berenson.

BUSS1020

Page 74: Week 2 slides_15 (FINANCE1)

74

Sales vs Newspaper advertising ($000)

BUSS1020

Page 75: Week 2 slides_15 (FINANCE1)

BUSS1020 75

Chap 2 pg 67

VISUALIZING TWO NUMERICAL VARIABLES: THE TIME-SERIES PLOT

Time-series plots are used to study patterns in the values of a numeric variable over time.

The numeric variable is measured on the vertical axis and the time period is measured on the horizontal axis.

Frequency of observations is often an issue.

Page 76: Week 2 slides_15 (FINANCE1)

BUSS1020 76

TIME SERIES PLOT EXAMPLE

[Time-series plot: Number of Franchises, 1996-2004. Horizontal axis: Year (1994 to 2006); vertical axis: Number of Franchises (0 to 120).]

Year   Number of Franchises
1996   43
1997   54
1998   60
1999   73
2000   82
2001   95
2002   107
2003   99
2004   95

Here frequency is annual
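The pattern a time-series plot reveals can also be checked numerically. The sketch below, a rough illustration with assumed variable names, computes the year-on-year change in the number of franchises and locates the peak year.

```python
# Annual number of franchises, 1996-2004, from the table above.
franchises = {
    1996: 43, 1997: 54, 1998: 60, 1999: 73, 2000: 82,
    2001: 95, 2002: 107, 2003: 99, 2004: 95,
}

years = sorted(franchises)

# Change from the previous year, keyed by the later year of each pair.
changes = {
    year: franchises[year] - franchises[prev]
    for prev, year in zip(years, years[1:])
}
peak_year = max(franchises, key=franchises.get)

for year, change in changes.items():
    print(f"{year}: {change:+d}")
print("Peak year:", peak_year)
```

The sign change after the peak is the same turning point that is visible in the plot.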

Page 77: Week 2 slides_15 (FINANCE1)

BUSS1020 77

DAILY AORD INDEX: 1995-2013

[Time-series plot: daily AORD index. Horizontal axis: Number of days (0 to 5000); vertical axis: 1000 to 7000.]

Page 80: Week 2 slides_15 (FINANCE1)

BUSS1020 80


Page 81: Week 2 slides_15 (FINANCE1)

BUSS1020 81

Chap 2 pg 76

PRINCIPLES OF GRAPHICAL EXCELLENCE

Every graph should:

not distort the data (story).

not contain too much “chart junk” or “noise”.

have all axes properly and clearly labelled.

contain an informative title.

be the simplest possible that tells the story.

contain the source of the data.

objectively and clearly convey the message or “story” in the data.

Further: 3D graphs should have a meaningful 3rd dimension. Usually 2D is sufficient.

The scale on the vertical axis should (usually) begin at zero.

Page 82: Week 2 slides_15 (FINANCE1)

82

http://www.perceptualedge.com/example14.php

Which plot is better?

[Plots with horizontal axis labels: ’83, ’93, ’03, ’04, ’05]

BUSS1020

Page 83: Week 2 slides_15 (FINANCE1)

BUSS1020 83

GRAPHICAL ERRORS: CHART JUNK, DISTORTION

Bad Presentation

[Chart: Minimum Wage. 1960: $1.00; 1970: $1.60; 1980: $3.10; 1990: $3.80]

Good Presentation

[Chart: Minimum Wage in the US. Horizontal axis: 1960, 1970, 1980, 1990; vertical axis: $ (0 to 4).]

Page 84: Week 2 slides_15 (FINANCE1)

84

http://www.perceptualedge.com/example18.php

BUSS1020

Page 85: Week 2 slides_15 (FINANCE1)

BUSS1020 85

GRAPHICAL ERRORS: NO RELATIVE BASIS

Bad Presentation

[Chart: HDs received by students. Categories: Y1, Y2, Y3, HO; vertical axis: Freq. (0 to 300).]

Good Presentation

[Chart: HDs received by students. Categories: Y1, Y2, Y3, HO; vertical axis: % (0% to 30%).]

Y1=First Year, Y2=Second Year, Y3=Third Year, HO=Honours

Source: University of Sydney Business School

Page 86: Week 2 slides_15 (FINANCE1)

86

GRAPHICAL ERRORS: COMPRESSING THE VERTICAL AXIS

[Two charts of Quarterly Sales ($) for Q1 to Q4, one labelled Good Presentation and one labelled Bad Presentation. One vertical axis runs from 0 to 50; the other runs from 0 to 200.]

BUSS1020

Page 87: Week 2 slides_15 (FINANCE1)

BUSS1020 87

SUMMARY

Introduction

Categorical Data

Organising One Variable Categorical Data: Summary Tables

Visualising One Variable Categorical Data: Bar Charts, Pie Charts, Pareto Charts

Organising Two Variable Categorical Data: Contingency Tables

Visualising Two Variable Categorical Data: Side-By-Side Charts, Time Series, Spatial Charts, 3-D Charts

Numerical Data

Organising Numerical Data: Ordered Arrays, Frequency Distributions, Cumulative Distributions

Visualising One Variable Numerical Data: Histograms, Polygons, Bar Charts, Ogives

Visualising Two Variable Numerical Data: Scatter Plots, Time Series

Principles of Graphical Excellence