Exploratory Data Analysis - Checking For Normality

76
©[email protected] 2012 Explore & Summarise Dr Azmi Mohd Tamil Dept of Community Health Universiti Kebangsaan Malaysia FK6163

description

Exploratory Data Analysis - Checking For Normality

Transcript of Exploratory Data Analysis - Checking For Normality

Page 1: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Explore & Summarise

Dr Azmi Mohd Tamil

Dept of Community Health

Universiti Kebangsaan Malaysia

FK6163

Page 2: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Introduction

Method of Exploring and Summarising Data differs

According to Types of Variables

Page 3: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Dependent/Independent

Frequency of Exercise

Obesity

Food Intake

Independent Variables

Dependent Variable

Page 4: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Page 5: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Explore

4 It is the first step in the analytic process

4 to explore the characteristics of the data

4 to screen for errors and correct them

4 to look for distribution patterns - normal

distribution or not

4May require transformation before further

analysis using parametric methods

4Or may need analysis using non-parametric

techniques

Page 6: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Data Screening

4 By running

frequencies, we may

detect inappropriate

responses

4How many in the

audience have 15

children and

currently pregnant

with the 16th?

PARITY

67 30.7

44 20.2

36 16.5

22 10.1

21 9.6

8 3.7

3 1.4

7 3.2

5 2.3

3 1.4

1 .5

1 .5

218 100.0

1

2

3

4

5

6

7

8

9

10

11

15

Total

Valid

Frequency Percent

Page 7: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Data Screening

4 See whether the

data make sense or

not.

4 E.g. Parity 10 but

age only 25.

Page 8: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Page 9: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Page 10: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Data Screening

4 By looking at measures of central tendency

and range, we can also detect abnormal values

for quantitative data

Descriptive Statistics

184 32 484 53.05 33.37

184

Pre-pregnancy weight

Valid N (listwise)

N Minimum Maximum Mean

Std.

Deviation

Page 11: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Interpreting the Box Plot

Outlier

Outlier

Upper quartile

Smallest non-outlier

Median

Lower quartile

Largest non-outlier The whiskers extend

to 1.5 times the box

width from both ends

of the box and ends

at an observed value.

Three times the box

width marks the

boundary between

"mild" and "extreme"

outliers.

"mild" = closed dots

"extreme"= open dots

Page 12: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Data Screening

4We can

also make

use of

graphical

tools such

as the box

plot to

detect

wrong

data entry 184N =

Pre-pregnancy weight

600

500

400

300

200

100

0

141198211181

73

Page 13: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Data Cleaning

4 Identify the extreme/wrong values

4Check with original data source – i.e.

questionnaire

4 If incorrect, do the necessary correction.

4Correction must be done before

transformation, recoding and analysis.

Page 14: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Parameters of Data Distribution

4Mean – central value of data

4Standard deviation – measure of how

the data scatter around the mean

4Symmetry (skewness) – the degree of

the data pile up on one side of the mean

4Kurtosis – how far data scatter from the

mean

Page 15: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Normal distribution

4 The Normal distribution is

represented by a family of curves

defined uniquely by two parameters,

which are the mean and the

standard deviation of the population.

4 The curves are always

symmetrically bell shaped, but the

extent to which the bell is

compressed or flattened out

depends on the standard deviation

of the population.

4 However, the mere fact that a curve

is bell shaped does not mean that it

represents a Normal distribution,

because other distributions may

have a similar sort of shape.

Page 16: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Normal distribution

4 If the observations follow a

Normal distribution, a range

covered by one standard

deviation above the mean

and one standard deviation

below it includes about

68.3% of the observations;

4 a range of two standard

deviations above and two

below (+ 2sd) about 95.4%

of the observations; and

4 of three standard deviations

above and three below (+

3sd) about 99.7% of the

observations

68.3%

95.4%

99.7%

Page 17: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Normality

4Why bother with normality??

4Because it dictates the type of analysis

that you can run on the data

Page 18: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Normality-Why?Parametric

Qualitative

Dichotomus

Quantitative Normally distributed data Student's t Test

Qualitative

Polinomial

Quantitative Normally distributed data ANOVA

Quantitative Quantitative Repeated measurement of the

same individual & item (e.g.

Hb level before & after

treatment). Normally

distributed data

Paired t Test

Quantitative -

continous

Quantitative -

continous

Normally distributed data Pearson Correlation

& Linear

Regresssion

Page 19: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Normality-Why?Non-parametric

Qualitative

Dichotomus

Quantitative Data not normally distributed Wilcoxon Rank Sum

Test or U Mann-

Whitney Test

Qualitative

Polinomial

Quantitative Data not normally distributed Kruskal-Wallis One

Way ANOVA Test

Quantitative Quantitative Repeated measurement of the

same individual & item

Wilcoxon Rank Sign

Test

Quantitative -

continous/ordina

l

Quantitative -

continous

Data not normally distributed Spearman/Kendall

Rank Correlation

Page 20: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Normality-How?

4 Explored graphically

• Histogram

• Stem & Leaf

• Box plot

• Normal probability

plot

• Detrended normal

plot

4 Explored statistically

• Kolmogorov-Smirnov

statistic, with

Lilliefors significance

level and the

Shapiro-Wilks

statistic

• Skew ness (0)

• Kurtosis (0)

– + leptokurtic

– 0 mesokurtik

– - platykurtic

Page 21: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Kolmogorov- Smirnov

4 In the 1930’s, Andrei Nikolaevich

Kolmogorov (1903-1987) and N.V.

Smirnov (his student) came out with the

approach for comparison of distributions

that did not make use of parameters.

4This is known as the Kolmogorov-

Smirnov test.

Page 22: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Skew ness

4 Skewed to the right

indicates the

presence of large

extreme values

4 Skewed to the left

indicates the

presence of small

extreme values

Page 23: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Kurtosis

4 For symmetrical

distribution only.

4Describes the shape

of the curve

4Mesokurtic -

average shaped

4 Leptokurtic - narrow

& slim

4 Platikurtic - flat &

wide

Page 24: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Skew ness & Kurtosis

4 Skew ness ranges from -3 to 3.

4 Acceptable range for normality is skew ness

lying between -1 to 1.

4Normality should not be based on skew ness

alone; the kurtosis measures the “peak ness”

of the bell-curve (see Fig. 4).

4 Likewise, acceptable range for normality is

kurtosis lying between -1 to 1.

Page 25: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Page 26: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Normality - ExamplesGraphically

Height

167.5

165.0

162.5

160.0

157.5

155.0

152.5

150.0

147.5

145.0

142.5

140.0

60

50

40

30

20

10

0

Std. Dev = 5.26

Mean = 151.6

N = 218.00

Page 27: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Q&Q Plot

4 This plot compares the quintiles of a data distribution with the quintiles of a standardised theoretical distribution from a specified family of distributions (in this case, the normal distribution).

4 If the distributional shapes differ, then the points will plot along a curve instead of a line.

4 Take note that the interest here is the central portion of the line, severe deviations means non-normality. Deviations at the “ends” of the curve signifies the existence of outliers.

Page 28: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Normality - ExamplesGraphically

Detrended Normal Q-Q Plot of Height

Observed Value

170160150140130

Dev from Normal

.6

.5

.4

.3

.2

.1

0.0

-.1

-.2

Normal Q-Q Plot of Height

Observed Value

170160150140130

Expected Normal

3

2

1

0

-1

-2

-3

Page 29: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Normal distribution

Mean=median=mode

Page 30: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Tests of Normality

.060 218 .052Height

Statistic df Sig.

Kolmogorov-Smirnova

Lilliefors Significance Correctiona.

Descriptives

151.65 .356

150.94

152.35

151.59

151.50

27.649

5.258

139

168

29

8.00

.148 .165

.061 .328

Mean

Lower Bound

Upper Bound

95% Confidence

Interval for Mean

5% Trimmed Mean

Median

Variance

Std. Deviation

Minimum

Maximum

Range

Interquartile Range

Skewness

Kurtosis

Height

Statistic Std. Error

Normality - ExamplesStatistically

Shapiro-Wilks; only if

sample size less than 100.

Normal distribution

Mean=median=mode

Skewness & kurtosis

within +1

p > 0.05, so normal

distribution

Page 31: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

K-S Test

Page 32: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

K-S Test

4very sensitive to the sample sizes of the

data.

4For small samples (n<20, say), the

likelihood of getting p<0.05 is low

4 for large samples (n>100), a slight

deviation from normality will result in

being reported as abnormal distribution

Page 33: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Guide to deciding on normality

Page 34: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Normality Transformation

Normal Q-Q Plot of LN_PARIT

Observed Value

3.02.52.01.51.0.50.0-.5

Expected Normal

3

2

1

0

-1

-2

Normal Q-Q Plot of LN_PARIT

Observed Value

3.02.52.01.51.0.50.0-.5

Expected Normal

3

2

1

0

-1

-2

Normal Q-Q Plot of PARITY

Observed Value

1614121086420

Expected Normal

3

2

1

0

-1

-2

Normal Q-Q Plot of PARITY

Observed Value

1614121086420

Expected Normal

3

2

1

0

-1

-2

Page 35: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Square root Logarithm Inverse

Reflect and square root

Reflect and logarithm Reflect and inverse

TYPES OF TRANSFORMATIONS

Page 36: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Summarise

4 Summarise a large set of data by a few meaningful numbers.

4 Single variable analysis• For the purpose of describing the data

• Example; in one year, what kind of cases are treated by the Psychiatric Dept?

• Tables & diagrams are usually used to describe the data

• For numerical data, measures of central tendency & spread is usually used

Page 37: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Frequency Table

Race F %

Malay 760 95.84%

Chinese 5 0.63%

Indian 0 0.00%

Others 28 3.53%

TOTAL 793 100.00%

•Illustrates the frequency observed for each category

Page 38: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Frequency Distribution Table

Umur Bil %

0-0.99 25 3.26%

1-4.99 78 10.18%

5-14.99 140 18.28%

15-24.99 126 16.45%

25-34.99 112 14.62%

35-44.99 90 11.75%

45-54.99 66 8.62%

55-64.99 60 7.83%

65-74.99 50 6.53%

75-84.99 16 2.09%

85+ 3 0.39%

JUMLAH 766

• > 20 observations, best

presented as a frequency

distribution table.

•Columns divided into class &

frequency.

•Mod class can be determined

using such tables.

Page 39: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Measurement of Central

Tendency & Spread

Page 40: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Measures of Central Tendency

4Mean

4Mode

4Median

Page 41: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Measures of Variability

4Standard deviation

4Inter-quartiles

4Skew ness & kurtosis

Page 42: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Mean

4 the average of the data collected

4To calculate the mean, add up the

observed values and divide by the

number of them.

4A major disadvantage of the mean is

that it is sensitive to outlying points

Page 43: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Mean: Example

412, 13, 17, 21, 24, 24, 26, 27, 27,

30, 32, 35, 37, 38, 41, 43, 44, 46,

53, 58

4Total of x = 648

4n= 20

4Mean = 648/20 = 32.4

Page 44: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Measures of variation -standard deviation

4 tells us how much all the scores in a dataset cluster around the

mean. A large S.D. is indicative of a more varied data scores.

4 a summary measure of the differences of each observation from

the mean.

4 If the differences themselves were added up, the positive would

exactly balance the negative and so their sum would be zero.

4 Consequently the squares of the differences are added.

Page 45: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Page 46: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

sd: Example

4 12, 13, 17, 21, 24, 24,

26, 27, 27, 30, 32, 35,

37, 38, 41, 43, 44, 46,

53, 58

4 Mean = 32.4; n = 20

4 Total of (x-mean)2

= 3050.8

4 Variance = 3050.8/19

= 160.5684

4 sd = 160.56840.5=12.671645TOTAL1405.8TOTAL

655.36585.7630

424.365329.1627

184.964629.1627

134.564440.9626

112.364370.5624

73.964170.5624

31.3638129.9621

21.1637237.1617

6.7635376.3613

0.1632416.1612

(x-mean)^2x(x-mean)^2x

Page 47: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Median

4 the ranked value that lies in the middle of the data

4 the point which has the property that half the data are greater than it, and half the data are less than it.

4 if n is even, average the n/2th largest and the n/2 + 1th largest observations

4"robust" to outliers

Page 48: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Median:

4 12, 13, 17, 21, 24, 24, 26, 27, 27, 30,

32, 35, 37, 38, 41, 43, 44, 46, 53, 58

4 (20+1)/2 = 10th which is 30, 11th is 32

4 Therefore median is (30 + 32)/2 = 31

Page 49: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Measures of variation -quartiles

4The range is very susceptible to what

are known as outliers

4A more robust approach is to divide the

distribution of the data into four, and find

the points below which are 25%, 50%

and 75% of the distribution. These are

known as quartiles, and the median is

the second quartile.

Page 50: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Quartiles

4 12, 13, 17, 21, 24,

24, 26, 27, 27, 30,

32, 35, 37, 38, 41,

43, 44, 46, 53, 58

4 25th percentile 24; (24+24)/2

4 50th percentile 31; (30+32)/2 ; = median

4 75th percentile 42.5; (41+43)/2

Page 51: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Mode

4The most frequent occurring number.

E.g. 3, 13, 13, 20, 22, 25: mode = 13.

4 It is usually more informative to quote

the mode accompanied by the

percentage of times it happened; e.g.,

the mode is 13 with 33% of the

occurrences.

Page 52: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Mode: Example

412, 13, 17, 21, 24, 24, 26, 27, 27, 30,

32, 35, 37, 38, 41, 43, 44, 46, 53, 58

4Modes are 24 (10%) & 27 (10%)

Page 53: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Mean or Median?

4Which measure of central tendency

should we use?

4 if the distribution is normal, the mean+sd

will be the measure to be presented,

otherwise the median+IQR should be

more appropriate.

Page 54: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Normal distribution;

Use Mean+SD

Not Normal distribution;

Use Median & IQR

Page 55: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Presentation

Qualitative & Quantitative Data

Charts & Tables

Page 56: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Presentation

Qualitative Data

Page 57: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Graphing Categorical Data:

Univariate Data

Categorical Data

Tabulating Data

The Summary Table

0 10 20 30 40 50

Stocks

Bonds

Savings

CD

Graphing Data

Pie Charts

Pareto DiagramBar Charts

0

5

10

15

20

25

30

35

40

45

Stocks Bonds Savings CD

0

20

40

60

80

100

120

Page 58: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Bar Chart

Type of work

Field w orkOffice w orkHousew ife

Percent

80

60

40

20

0

20

11

69

Page 59: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Pie Chart

Others

Chinese

Malay

Page 60: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Tabulating and Graphing

Bivariate Categorical Data

4Contingency tables:

Table 1: Contigency table of pregnancy induced hypertension and

SGA

Count

103 94 197

5 16 21

108 110 218

No

Yes

Pregnancy induced

hypertension

Total

Normal SGA

SGA

Total

Page 61: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Tabulating and Graphing Bivariate Categorical Data

Pregnancy induced hypertension

YesNo

Count

120

100

80

60

40

20

0

SGA

Normal

SGA

16

94

103

4 Side

by

side

charts

Page 62: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Presentation

Quantitative Data

Page 63: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Ogive

0

20

40

60

80

100

120

10 20 30 40 50 60

Tabulating and Graphing

Numerical Data

0

1

2

3

4

5

6

7

10 20 30 40 50 60

Numerical Data

Ordered Array

Stem and Leaf

Display

Histograms Area

Tables

2 144677

3 028

4 1

41, 24, 32, 26, 27, 27, 30, 24, 38, 21

21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Frequency Distributions

Cumulative Distributions

Polygons

Page 64: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Tabulating Numerical Data:

Frequency Distributions

4 Sort raw data in ascending order:12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

4 Find range: 58 - 12 = 46

4 Select number of classes: 5 (usually between 5 and 15)

4Compute class interval (width): 10 (46/5 then round up)

4Determine class boundaries (limits): 10, 20, 30, 40, 50, 60

4Compute class midpoints: 14.95, 24.95, 34.95, 44.95, 54.95

4Count observations & assign to classes

Page 65: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Frequency Distributions

and Percentage Distributions

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

100%20TOTAL

10%254.9550.0 - 59.9

20%444.9540.0 - 49.9

25%534.9530.0 - 39.9

30%624.9520.0 - 29.9

15%314.9510.0 - 19.9

%FreqMidpointClass

Page 66: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

3

6

5

4

2

0

1

2

3

4

5

6

7

14.95 24.95 34.95 44.95 54.95

Age

Frequency

Graphing Numerical Data:

The Histogram

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class MidpointsClass Boundaries

No Gaps

Between

Bars

Page 67: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Graphing Numerical Data:

The Frequency Polygon

Class Midpoints

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

0

1

2

3

4

5

6

7

14.95 24.95 34.95 44.95 54.95

Page 68: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Calculate Measures of Central Tendency & Spread

4We can use frequency distribution table

to calculate;

• Mean

• Standard Deviation

• Median

• Mode

Page 69: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Mean

4Mean = 659/20

= 32.95

4Compare with 32.4

from direct

calculation.

.f mpX

n=∑

659.0020TOTAL

109.90254.9550.0 - 59.9

179.80444.9540.0 - 49.9

174.75534.9530.0 - 39.9

149.70624.9520.0 - 29.9

44.85314.9510.0 - 19.9

freq x m.p.FreqMidpointClass

Page 70: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Standard deviation

s2=((24634.05-(6592/20))/19)

s2=2920.05/19

s2=153.69

s = 12.4

4 Compare with 12.67 from

direct measurement.

( )2

2.

.

1

f mpf mp

nsn

−=

∑∑

24634.05659.0020TOTAL

6039.01109.90254.9550.0 - 59.9

8082.01179.80444.9540.0 - 49.9

6107.51174.75534.9530.0 - 39.9

3735.02149.70624.9520.0 - 29.9

670.5144.85314.9510.0 - 19.9

f.mp^2f.m.p.Freq

Mid

PointClass

Page 71: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Median

20TOTAL

250.0 - 59.9

440.0 - 49.9

median class530.0 - 39.9

620.0 - 29.9

310.0 - 19.9

FreqClass 4 L1 +i *((n+1)/2) – f1

fmed

4 f1 = cumulative freq

above median class

4 29.95 + 10((21/2)-9)

5

4 29.95 + 15/5 = 32.95

4 From direct calculation,

median = 31

Page 72: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Mode

=L1 +i *(Diff1/(Diff1+Diff2))

=19.95 + 10(3/(3+1))

=27.45

4Compare with

modes of 24 & 27

from direct

calculation.20TOTAL

250.0 - 59.9

440.0 - 49.9

530.0 - 39.9

mode class620.0 - 29.9

310.0 - 19.9

FreqClass

Page 73: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Graphing Bivariate Numerical

Data (Scatter Plot)

Page 74: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Linear Regression Line

Page 75: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Survival Function

DURATION

76543210

Cum Survival

1.2

1.0

.8

.6

.4

.2

0.0

Survival Function

Censored

Page 76: Exploratory Data Analysis - Checking For Normality

©[email protected] 2012

Principles of Graphical Excellence

4Presents data in a way that provides

substance, statistics and design

4Communicates complex ideas with clarity,

precision and efficiency

4Gives the largest number of ideas in the

most efficient manner

4Almost always involves several

dimensions

4Tells the truth about the data