Research Methods & Design in Psychology Lecture 3 Descriptives & Graphing Lecturer: James Neill.

Post on 18-Dec-2015


Research Methods & Design in Psychology

Lecture 3: Descriptives & Graphing

Lecturer: James Neill

Overview

• Univariate descriptives & graphs
• Non-parametric vs. parametric
• Non-normal distributions
• Properties of normal distributions
• Graphing relations between 2 and 3 variables

Empirical Approach to Research

A positivistic approach ASSUMES:
• The world is made up of bits of data which can be ‘measured’, ‘recorded’, & ‘analysed’
• Interpretation of data can lead to valid insights about how people think, feel and behave

What do we want to Describe?

Distributional properties of variables:

• Central tendency(ies)

• Shape

• Spread / Dispersion

Basic Univariate Descriptive Statistics

Central tendency

• Mode

• Median

• Mean

Spread

• Interquartile Range

• Range

• Standard Deviation

• Variance

Shape

• Skewness

• Kurtosis

Basic Univariate Graphs

• Bar Graph – Pie Chart
• Stem & Leaf Plot
• Boxplot
• Histogram

Measures of Central Tendency

• Statistics to represent the ‘centre’ of a distribution
  – Mode (most frequent)
  – Median (50th percentile)
  – Mean (average)
• Choice of measure dependent on
  – Type of data
  – Shape of distribution (esp. skewness)
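As a rough illustration of the three measures, here is a minimal sketch using Python's standard `statistics` module. The scores are hypothetical (not from the lecture) and include one extreme value to show how the mean is pulled away from the median and mode.

```python
import statistics

# Hypothetical scores (an assumption for illustration); 40 is an outlier.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 40]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # 50th percentile
mean = statistics.mean(scores)      # arithmetic average

print(f"mode={mode}, median={median}, mean={mean:.2f}")
```

With skewed data like this, the median and mode stay near the bulk of the scores while the mean shifts towards the outlier, which is why the shape of the distribution matters when choosing a measure.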

Measures of Central Tendency

            Mode   Median   Mean
Nominal      X
Ordinal      X       X
Interval     X       X       X
Ratio        X?      X       X

Measures of Dispersion

• Measures of deviation from the central tendency

• Non-parametric / non-normal: range, percentiles, min, max

• Parametric: SD & properties of the normal distribution
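The measures listed above can be sketched with Python's standard `statistics` module. The scores are hypothetical, and `statistics.quantiles` uses its default ('exclusive') quartile convention here; other packages may compute percentiles slightly differently.

```python
import statistics

# Hypothetical scores (an assumption for illustration).
scores = [40, 52, 55, 60, 61, 65, 70, 72, 80, 95]

# Non-parametric descriptions of spread
minimum, maximum = min(scores), max(scores)
value_range = maximum - minimum
q1, q2, q3 = statistics.quantiles(scores, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1                                   # interquartile range

# Parametric description of spread (sample SD, N - 1 denominator)
sd = statistics.stdev(scores)

print(f"min={minimum}, max={maximum}, range={value_range}")
print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
print(f"SD={sd:.2f}")
```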

Measures of Dispersion

            Range, Min/Max   Percentiles   SD
Nominal
Ordinal           X
Interval          X               X         X?
Ratio             X               X         X

Describing Nominal Data

• Frequencies
  – Most frequent?
  – Least frequent?
  – Percentages?
• Bar graphs
  – Examine comparative heights of bars
  – Shape is arbitrary
• Consider whether to use freqs or %s

Frequencies

• Number of individuals obtaining each score on a variable

• Frequency tables
• Graphically (bar chart, pie chart)
• Can also present as %

Frequency table for sex

SEX
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   female       14        70.0        70.0              70.0
        male          6        30.0        30.0             100.0
        Total        20       100.0       100.0
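A frequency table like the one for sex can be reproduced in a few lines. This is a minimal sketch using Python's `collections.Counter`; the counts (14 female, 6 male) come from the table, while the variable names and layout are assumptions for illustration.

```python
from collections import Counter

# Rebuild the raw responses from the counts in the frequency table.
responses = ["female"] * 14 + ["male"] * 6

counts = Counter(responses)
n = len(responses)

rows = []
cumulative = 0.0
for category, freq in counts.most_common():  # most frequent category first
    percent = 100 * freq / n
    cumulative += percent
    rows.append((category, freq, percent, cumulative))

print(f"{'':<8}{'Freq':>6}{'%':>8}{'Cum %':>8}")
for category, freq, percent, cum in rows:
    print(f"{category:<8}{freq:>6}{percent:>8.1f}{cum:>8.1f}")
print(f"{'Total':<8}{n:>6}{100.0:>8.1f}")
```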

Bar chart for frequency by sex

[Figure: bar chart of Frequency (y-axis, 0 to 16) by SEX (x-axis: male, female)]

Pie chart for frequency by sex

[Figure: pie chart of SEX (male, female)]

Bar chart: Do you believe in God?

[Figure: bar chart of Count (y-axis, 0 to 60) by response (x-axis: Yes, Sort of, No)]

Bar chart for cost by state

Bar chart vs. Radar Chart

[Figure: Bar Chart of Sorted Factor Effect Sizes Time 1 to 2; y-axis: Effect size (0.00 to 0.45); x-axis: Factors (Time Management, Social Competence, Achievement Motivation, Intellectual Flexibility, Task Leadership, Emotional Control, Active Initiative, Self Confidence)]

[Figure: Radar Chart of Factor Effect Sizes Time 1 to 2; radial scale 0.00 to 0.60; axes: Time Management, Social Competence, Achievement Motivation, Intellectual Flexibility, Task Leadership, Emotional Control, Active Initiative, Self Confidence]

Mode

• Most common score - highest point in a distribution

• Suitable for all types of data including nominal (may not be useful for ratio)

• Before using, check frequencies and bar graph to see whether it is an accurate and useful statistic.

Describing Ordinal Data

• Conveys order but not distance (e.g., ranks)

• Descriptives as for nominal (i.e., frequencies, mode)

• Also maybe median – if accurate/useful
• Maybe IQR, min. & max.
• Bar graphs, pie charts, & stem-&-leaf plots

Stem & Leaf Plot

• Useful for ordinal, interval and ratio data
• Alternative to histogram

Box & whisker

• Useful for interval and ratio data

• Represents min., max., median and quartiles
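A minimal sketch of the five values a box & whisker plot represents, using Python's standard `statistics` module. The scores are hypothetical, and the 'inclusive' quartile method is only one of several conventions, so other packages may report slightly different quartiles.

```python
import statistics

# Hypothetical scores (an assumption for illustration).
scores = [3, 5, 7, 8, 9, 11, 13, 15, 20]

# Quartiles split the ordered scores into four equal-sized groups.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# The five-number summary drawn by a boxplot.
five_number = (min(scores), q1, median, q3, max(scores))
print(five_number)
```

The box spans Q1 to Q3 with a line at the median; the whiskers extend towards the minimum and maximum (many packages draw extreme scores as separate outlier points instead).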

Describing Interval Data

• Conveys order and distance, but no true zero (0 pt is arbitrary).

• Interval data is discrete, but is often treated as ratio/continuous (especially for > 5 intervals)

• Distribution (shape)
• Central tendency (mode, median)
• Dispersion (min, max, range)
• Can also use M & SD if treating as continuous

Describing Ratio Data

• Numbers convey order and distance, true zero point - can talk meaningfully about ratios.

• Continuous
• Distribution (shape – skewness, kurtosis)
• Central tendency (median, mean)
• Dispersion (min, max, range, SD)

Univariate data plot for a ratio variable

The Four Moments of a Normal Distribution

[Figure: normal curve labelled with Mean, SD, Skew, and Kurt]

The Four Moments of a Normal Distribution

Four mathematical qualities (parameters) allow one to describe a continuous distribution which at least roughly follows a bell curve shape:

• 1st = mean (central tendency)
• 2nd = SD (dispersion)
• 3rd = skewness (lean / tail)
• 4th = kurtosis (peakedness / flatness)

Mean (1st moment )

• Average score
• Mean = $\sum X / N$
• Use for ratio data or interval (if treating it as continuous).
• Influenced by extreme scores (outliers)

Standard Deviation (2nd moment )

• SD = square root of Variance: $SD = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}}$
• Standard Error (SE) = SD / square root of N: $SE = SD / \sqrt{N}$
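The SD and SE formulas can be worked through step by step. This is a minimal sketch with hypothetical scores, written out by hand rather than calling a library routine so that each term of the formula is visible.

```python
import math

# Hypothetical scores (an assumption for illustration).
scores = [4, 8, 6, 5, 3, 7, 9, 6]

n = len(scores)
mean = sum(scores) / n                                  # sum of X / N

# Variance: sum of squared deviations from the mean, divided by N - 1
squared_deviations = [(x - mean) ** 2 for x in scores]
variance = sum(squared_deviations) / (n - 1)

sd = math.sqrt(variance)    # SD = square root of variance
se = sd / math.sqrt(n)      # SE = SD / square root of N

print(f"mean={mean}, variance={variance}, SD={sd}, SE={se:.3f}")
```

For these scores the squared deviations sum to 28, so the variance is 28 / 7 = 4 and the SD is 2.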

Skewness (3rd moment )

• Lean of distribution
• +ve = tail to right
• -ve = tail to left
• Can be caused by an outlier
• Can be caused by ceiling or floor effects
• Can be accurate (e.g., the number of cars owned per person)

Skewness (3rd moment )

• Negative skew (ceiling effect)
• Positive skew (floor effect)

Kurtosis (4th moment )

• Flatness or peakedness of distribution
• +ve = peaked
• -ve = flattened
• Be aware that by altering the X and Y axes, any distribution can be made to look more peaked or more flat, so add a normal curve to the histogram to help judge kurtosis
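Skewness and kurtosis can also be computed rather than judged by eye. The sketch below uses the simple moment-based definitions (third and fourth central moments scaled by the variance) and reports excess kurtosis, so a normal distribution scores about 0 on both. These are assumptions for illustration: statistical packages usually apply small-sample corrections, so their values will differ somewhat, and the data are hypothetical.

```python
def skewness_and_kurtosis(scores):
    """Return (skewness, excess kurtosis) using uncorrected central moments."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # 2nd central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # 4th central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3             # 0 for a normal curve
    return skewness, excess_kurtosis

# Hypothetical data sets
symmetric = [1, 2, 3, 4, 5]
tail_to_right = [1, 1, 2, 2, 3, 10]

skew_sym, kurt_sym = skewness_and_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_kurtosis(tail_to_right)

print(f"symmetric:     skew={skew_sym:.2f}, kurtosis={kurt_sym:.2f}")
print(f"tail to right: skew={skew_right:.2f}, kurtosis={kurt_right:.2f}")
```

The symmetric set has zero skew and negative (flattened) kurtosis; the set with a long right tail has positive skew.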

Kurtosis (4th moment )

Red = Positive (leptokurtic)

Blue = negative (platykurtic)

Key Areas under the Curve for Normal Distributions

• For normal distributions, approx.
  – +/- 1 SD ~ 68%
  – +/- 2 SD ~ 95%
  – +/- 3 SD ~ 99.7%
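These areas can be checked numerically. A minimal sketch, assuming only Python's standard `math` module: for a normal distribution, the proportion of scores within k SDs of the mean equals erf(k / sqrt(2)). The function name is made up for illustration.

```python
import math

def area_within(k):
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k} SD: {100 * area_within(k):.1f}%")
# prints 68.3%, 95.4% and 99.7%
```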

Areas under the normal curve

Types of Non-normal Distribution

• Bi-modal
• Multi-modal
• Positively skewed
• Negatively skewed
• Flat (platykurtic)
• Peaked (leptokurtic)

Non-normal distributions


Rules of Thumb in Judging Severity of Skewness & Kurtosis

• View histogram with normal curve

• Deal with outliers
• Skewness / kurtosis < -1 or > 1
• Skewness / kurtosis significance tests

Histogram of weight

[Figure: histogram of WEIGHT (x-axis, 40.0 to 110.0) against Frequency (y-axis, 0 to 8); Mean = 69.6, Std. Dev = 17.10, N = 20]

Histogram of daily calorie intake

Histogram of fertility

Example ‘normal’ distribution 1

[Figure: histogram of Die (x-axis, 0 to 140) against Frequency (y-axis, 0 to 60); Mean = 81.21, Std. Dev. = 18.228, N = 188]

Example ‘normal’ distribution 2

[Figure: bar chart of Count (y-axis, 0 to 60) by Femininity-Masculinity (categories: Very masculine, Fairly masculine, Androgynous, Fairly feminine, Very feminine)]

Example ‘normal’ distribution 3

[Figure: bar chart of Count (y-axis, 0 to 50) by Femininity-Masculinity (categories: Very masculine, Fairly masculine, Androgynous, Fairly feminine); Gender: male]

Example ‘normal’ distribution 4

[Figure: bar chart of Count (y-axis, 0 to 60) by Femininity-Masculinity (categories: Very masculine, Fairly masculine, Androgynous, Fairly feminine, Very feminine); Gender: female]

Example ‘normal’ distribution 5

[Figure: histogram of Exercise (mins/day) (x-axis, 0 to 250) against Frequency (y-axis, 0 to 60)]

Skewed Distributions & the Mode, Median & Mean

• +vely skewed: mode < median < mean

• Symmetrical (normal): mean = median = mode

• -vely skewed: mean < median < mode
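The ordering above can be seen in a small worked example. This is a minimal sketch with hypothetical scores; the second data set is simply the mirror image of the first. The ordering is a typical pattern for unimodal skewed data rather than a guarantee for every data set.

```python
import statistics

def centre(scores):
    """Return (mode, median, mean) for a list of scores."""
    return (statistics.mode(scores),
            statistics.median(scores),
            statistics.mean(scores))

# Hypothetical scores with a long tail to the right
positively_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image (16 - x): long tail to the left
negatively_skewed = [16 - x for x in positively_skewed]

pos_mode, pos_median, pos_mean = centre(positively_skewed)
neg_mode, neg_median, neg_mean = centre(negatively_skewed)

print(f"+ve skew: mode={pos_mode}, median={pos_median}, mean={pos_mean}")
print(f"-ve skew: mode={neg_mode}, median={neg_median}, mean={neg_mean}")
```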

Effects of skew on measures of central tendency

More on Graphing

(Visualising Data)

Edward Tufte

Graphs:
• Reveal data
• Communicate complex ideas with clarity, precision, and efficiency

Tufte's Guidelines 1

• Show the data
• Substance rather than method
• Avoid distortion
• Present many numbers in a small space
• Make large data sets coherent

Tufte's Guidelines 2

• Encourage eye to make comparisons
• Reveal data at several levels
• Purpose: Description, exploration, tabulation, decoration
• Closely integrated with statistical and verbal descriptions

Tufte’s Graphical Integrity 1

• Some lapses intentional, some not
• $\text{Lie Factor} = \dfrac{\text{size of effect in graph}}{\text{size of effect in data}}$
• Misleading uses of area
• Misleading uses of perspective
• Leaving out important context
• Lack of taste and aesthetics
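The Lie Factor can be computed directly. This is a minimal sketch in which "size of effect" is taken to be the relative change between two values, which is an assumption stated here for illustration; the function names and the numbers are hypothetical.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)

def lie_factor(graph_first, graph_second, data_first, data_second):
    """Size of effect shown in the graph divided by size of effect in the data."""
    return effect_size(graph_first, graph_second) / effect_size(data_first, data_second)

# Hypothetical example: the data rise from 10 to 15 (a 50% increase),
# but the bars are drawn 2 cm and 6 cm tall (a 200% increase).
factor = lie_factor(graph_first=2, graph_second=6, data_first=10, data_second=15)
print(f"Lie Factor = {factor:.1f}")
```

A Lie Factor close to 1 means the graphic represents the data faithfully; here the graph exaggerates the effect fourfold.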

Tufte's Graphical Integrity 2

• Trade-off between amount of information, simplicity, and accuracy

• “It is often hard to judge what users will find intuitive and how [a visualization] will support a particular task” (Tweedie et al.)

Chart scale


Cleveland’s Hierarchy

Volume

Food Aid Received by Developing Countries

[Figure: bar chart; y-axis: $ million in food aid (1988), 0 to 350; x-axis: Burkina Faso, Ethiopia, Mozambique, Kenya, Morocco, Bangladesh, India, Pakistan, Egypt]

Percentage of Doctors Devoted Solely to Family Practice in California 1964-1990

Distortive Variations in Scale


Restricted Scales


Example Graphs Depicting the Relationship between Two Variables (Bivariate)

People Histogram

Separate Graphs

Example Graphs Depicting the Relationship between Three Variables (Multivariate)

Clustered bar chart

19th vs. 20th century causes of death

Demographic distribution of age

Where partners first met

Line graph


Causes of Mortality

Bivariate Normality

Examples of More Complex Graphs

Sea Temperature


Inferential Statistical Analysis Decision Making Tree

Links

• Presenting Data – Statistics Glossary v1.1 - http://www.cas.lancs.ac.uk/glossary_v1.1/presdata.html

• A Periodic Table of Visualisation Methods - http://www.visual-literacy.org/periodic_table/periodic_table.html

• Gallery of Data Visualization

• Univariate Data Analysis – The Best & Worst of Statistical Graphs - http://www.csulb.edu/~msaintg/ppa696/696uni.htm

• Pitfalls of Data Analysis – http://www.vims.edu/~david/pitfalls/pitfalls.htm

• Statistics for the Life Sciences – http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/Handouts.html