Variability Introduction to Statistics Chapter 4 Jan 22, 2009 Class #4.


Page 1

Variability

Introduction to Statistics, Chapter 4

Jan 22, 2009, Class #4

Page 2

Describing Variability

Variability describes, as an exact quantitative measure, how spread out or clustered together the scores are.

Variability is usually defined in terms of distance:

• How far apart scores are from each other
• How far apart scores are from the mean
• How representative a score is of the data set as a whole

Page 3

Describing Variability: the Range

Simplest and most obvious way of describing variability

Range = Highest - Lowest (real limits)

The range only takes into account the two extreme scores and ignores any values in between. To counter this, the distribution is divided into quarters (quartiles): Q1 = 25%, Q2 = 50%, Q3 = 75%

The Interquartile range: the distance between the first and third quartiles (Q3 - Q1)

The Semi-Interquartile range: one half of the Interquartile range
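A minimal Python sketch of the range and the semi-interquartile range. It assumes scores are measured in whole units, so that each real limit lies half a unit beyond the extreme score; the `unit` parameter, the function names, and the sample scores are hypothetical and not from the slides.

```python
def score_range(scores, unit=1.0):
    """Range from real limits (assumption: limits sit half a unit past each extreme)."""
    upper_real_limit = max(scores) + unit / 2
    lower_real_limit = min(scores) - unit / 2
    return upper_real_limit - lower_real_limit


def semi_interquartile_range(q1, q3):
    """Half of the interquartile range (Q3 - Q1)."""
    return (q3 - q1) / 2


scores = [3, 7, 8, 10, 12]      # hypothetical whole-number scores
print(score_range(scores))      # 12.5 - 2.5 = 10.0
```

Only the two extreme scores enter `score_range`, which is why the range ignores everything in between.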

Page 4

The most common percentiles are quartiles. Quartiles divide data sets into fourths or four equal parts.

• The 1st quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. Therefore, the 1st quartile is equivalent to the 25th percentile.

• The 2nd quartile divides the bottom 50% of the data from the top 50% of the data, so that the 2nd quartile is equivalent to the 50th percentile, which is equivalent to the median.

• The 3rd quartile divides the bottom 75% of the data from the top 25% of the data, so that the 3rd quartile is equivalent to the 75th percentile.

Page 5

Interquartile range (IQR)

• The interquartile range (IQR) is the distance between the 75th percentile and the 25th percentile
• The IQR is essentially the range of the middle 50% of the data
• Because it uses the middle 50%, the IQR is not affected by outliers (extreme values)

Page 6

Interquartile range (IQR)

Example: Compute the interquartile range for the sorted data 18, 33, 58, 67, 73, 93, 147

The 25th and 75th percentiles are the .25*(7+1) = 2nd and .75*(7+1) = 6th observations, respectively.

IQR = 93 - 33 = 60.
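The example above can be sketched in Python using the same p*(n+1) position rule. The linear interpolation for non-whole positions is an assumption added for completeness (the slide only covers whole positions), and the function name is hypothetical.

```python
def percentile_value(sorted_data, p):
    """Value at 1-based position p*(n+1) in sorted data.

    Assumption: when the position is not a whole number, interpolate
    linearly between the two neighbouring observations.
    """
    n = len(sorted_data)
    position = p * (n + 1)
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part of the position
    if lower < 1:                    # position falls before the first score
        return sorted_data[0]
    if lower >= n:                   # position falls at or past the last score
        return sorted_data[-1]
    below = sorted_data[lower - 1]   # convert 1-based position to 0-based index
    above = sorted_data[lower]
    return below + fraction * (above - below)


data = [18, 33, 58, 67, 73, 93, 147]
q1 = percentile_value(data, 0.25)    # 2nd observation
q3 = percentile_value(data, 0.75)    # 6th observation
iqr = q3 - q1
print(q1, q3, iqr)                   # 33.0 93.0 60.0
```

Replacing 147 with a far larger value leaves `iqr` unchanged, which illustrates why the IQR is not affected by outliers.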

Page 7

Describing Variability: Deviation in a Population

A more sophisticated measure of variability is one that shows how scores cluster around the mean.

Deviation is the distance of a score from the mean:

$X - \mu$, e.g. 11 - 6.35 = 4.65, 3 - 6.35 = -3.35

A measure representative of the variability of all the scores would be the mean of the deviation scores:

$$\frac{\sum (X - \mu)}{N}$$

Add all the deviations and divide by N. However, the deviation scores add up to zero (as the mean serves as the balance point for the scores).
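A short Python check of why the mean deviation is not a usable measure. It borrows the scores [3, 4, 5, 6, 7] from a later slide; the variable names are illustrative only.

```python
scores = [3, 4, 5, 6, 7]
mean = sum(scores) / len(scores)                 # 5.0

# Deviation of each score from the mean (X - mean)
deviations = [x - mean for x in scores]

# The positive and negative deviations cancel, so their mean is zero
mean_deviation = sum(deviations) / len(scores)

print(deviations)       # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(mean_deviation)   # 0.0
```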

Page 8

Describing Variability: Variance in a Population

To remove the +/- signs we simply square each deviation before finding the average. This is called the Variance:

$$\frac{\sum (X - \mu)^2}{N} = \frac{106.55}{20} = 5.33$$

The numerator is referred to as the Sum of Squares (SS), as it is the sum of the squared deviations around the mean value.

SS is a basic component of variability: the sum of squared deviation scores

Page 9

Variability: Variance in a Population

let X = [3, 4, 5, 6, 7]

• Mean = 5
• $(X - \text{Mean}) = [-2, -1, 0, 1, 2]$: subtract Mean from each number in X
• $(X - \text{Mean})^2 = [4, 1, 0, 1, 4]$: squared deviations from the mean
• $\sum (X - \text{Mean})^2 = 10$: sum of squared deviations from the mean (SS)
• $\sum (X - \text{Mean})^2 / N = 10/5 = 2$: average squared deviation from the mean

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Page 10

Variability: Variance in a Population

let X = [1, 3, 5, 7, 9]

• Mean = 5
• $(X - \text{Mean}) = [-4, -2, 0, 2, 4]$: subtract Mean from each number in X
• $(X - \text{Mean})^2 = [16, 4, 0, 4, 16]$: squared deviations from the mean
• $\sum (X - \text{Mean})^2 = 40$: sum of squared deviations from the mean (SS)
• $\sum (X - \text{Mean})^2 / N = 40/5 = 8$: average squared deviation from the mean

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
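The two worked examples can be reproduced with a small Python sketch. The function names are hypothetical; the steps follow the slides: subtract the mean, square, sum to get SS, then divide by N.

```python
def sum_of_squares(scores):
    """SS: sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def population_variance(scores):
    """Population variance: SS divided by N."""
    return sum_of_squares(scores) / len(scores)


print(sum_of_squares([3, 4, 5, 6, 7]), population_variance([3, 4, 5, 6, 7]))  # 10.0 2.0
print(sum_of_squares([1, 3, 5, 7, 9]), population_variance([1, 3, 5, 7, 9]))  # 40.0 8.0
```

Both data sets have the same mean (5), but the more spread-out scores give a variance four times as large.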

Page 11

Variability: Variance in a Population

Variance can be calculated as the sum of squares (SS) divided by N

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Page 12

Variability: Variance in a Sample

Variance in a sample:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

• n is the number of scores
• SS is the Sum of Squared Deviations From the Mean: $SS = \sum (X - \bar{X})^2$

So, variance ($S^2$) is the average squared deviation from the mean, with SS divided by n - 1 rather than n

Page 13

Describing Variability: Population and Sample Variance

Population variance is designated by $\sigma^2$:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{SS}{N}$$

Sample variance is designated by $s^2$.

Samples are less variable than populations: they therefore give biased estimates of population variability.

Degrees of Freedom (df): the number of independent (free to vary) scores. In a sample, the sample mean must be known before the variance can be calculated, therefore the final score is dependent on earlier scores: df = n - 1

$$s^2 = \frac{\sum (X - M)^2}{n - 1} = \frac{SS}{n - 1} = \frac{106.55}{20 - 1} = 5.61$$
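A brief Python sketch comparing the two denominators, using the SS = 106.55 and n = 20 figures from the slides. The function name and the `sample` flag are hypothetical.

```python
def variance_from_ss(ss, n, sample=False):
    """Divide SS by n - 1 (df) for a sample, or by n for a population."""
    denominator = n - 1 if sample else n
    return ss / denominator


population_var = variance_from_ss(106.55, 20)              # SS / N
sample_var = variance_from_ss(106.55, 20, sample=True)     # SS / (n - 1)

print(f"{population_var:.4f}")   # 5.3275, which the slides round to 5.33
print(f"{sample_var:.4f}")       # 5.6079, which the slides round to 5.61
```

Dividing by the smaller number (df = n - 1) makes the sample variance slightly larger, which offsets the tendency of samples to be less variable than the population.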

Page 14

Describing Variability: the Standard Deviation

Variance is a measure based on squared distances. In order to get around this, we can take the square root of the variance, which gives us the standard deviation.

Population ($\sigma$) and Sample ($s$) standard deviation:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

$$s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}$$
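Both standard deviations can be sketched in Python as the square root of the corresponding variance. The function name and `sample` flag are hypothetical; the scores reuse the [1, 3, 5, 7, 9] example from an earlier slide.

```python
import math


def standard_deviation(scores, sample=False):
    """Square root of the variance: SS / N (population) or SS / (n - 1) (sample)."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)    # sum of squared deviations
    denominator = n - 1 if sample else n
    return math.sqrt(ss / denominator)


scores = [1, 3, 5, 7, 9]
population_sd = standard_deviation(scores)               # sqrt(40 / 5) = sqrt(8)
sample_sd = standard_deviation(scores, sample=True)      # sqrt(40 / 4) = sqrt(10)
print(f"{population_sd:.2f} {sample_sd:.2f}")            # 2.83 3.16
```

Taking the square root returns the measure to the original units of the scores rather than squared units.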

Page 15

Variability: Standard Deviation of a Sample

The square root of Variance is called the Standard Deviation

Variance:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Standard Deviation:

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Page 16

Variability: Standard Deviation

“The Standard Deviation tells us approximately how far the scores vary from the mean on average”

It is approximately the average deviation of scores from the mean

Page 17

The Standard Deviation and the Normal Distribution

There are known percentages of scores above or below any given point on a normal curve:

• 34% of scores lie between the mean and 1 SD above the mean, and another 34% between the mean and 1 SD below it
• An additional 14% of scores lie between 1 and 2 SDs above the mean, and another 14% between 1 and 2 SDs below it
• Thus, using these rounded figures, about 96% of all scores are within 2 SDs of the mean (34% + 34% + 14% + 14% = 96%)

Note: 34% and 14% figures can be useful to remember
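The rounded 34% and 14% figures can be checked against the standard normal curve with Python's standard library. This is a sketch for verification only; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1

mean_to_1sd = z.cdf(1) - z.cdf(0)     # area between the mean and 1 SD above
one_to_2sd = z.cdf(2) - z.cdf(1)      # area between 1 and 2 SDs above
within_2sd = z.cdf(2) - z.cdf(-2)     # area within 2 SDs on both sides

print(f"{mean_to_1sd:.4f}")   # 0.3413
print(f"{one_to_2sd:.4f}")    # 0.1359
print(f"{within_2sd:.4f}")    # 0.9545
```

The unrounded total within 2 SDs is closer to 95.4%; the 96% on the slide comes from summing the rounded 34% and 14% figures.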

[Figure: normal curve; vertical axis: Probability Density]

Page 18

Describing Variability

The standard deviation is the most common measure of variability, but the others can be used. A good measure of variability must:

• Be stable and reliable: not be greatly affected by little details in the data
  • Extreme scores
  • Multiple sampling from the same population
  • Open-ended distributions

Both the variance and SD are related to other statistical techniques

Page 19

SS Computational Formula

Note this formula on page 93. In later chapters, we will be using this alternate SS formula.
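The slide does not reproduce the formula. Assuming it refers to the standard computational form, SS = ΣX² - (ΣX)²/N, the Python sketch below checks that it agrees with the definitional sum of squared deviations on the earlier example data. The function names are hypothetical.

```python
def ss_definitional(scores):
    """SS as the sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def ss_computational(scores):
    """SS as sum(X^2) - (sum(X))^2 / N (assumed to be the formula the slide cites)."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


for scores in ([3, 4, 5, 6, 7], [1, 3, 5, 7, 9]):
    print(ss_definitional(scores), ss_computational(scores))
# 10.0 10.0
# 40.0 40.0
```

The computational form avoids calculating each deviation, which is convenient when the mean is not a whole number.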

Page 20

Credits

http://www.le.ac.uk/pc/sk219/introtostats1.ppt#259,4,Plotting Data: describing spread of data

http://math.usask.ca/~miket/Sullivan_PP/Chapter_3/sec3_4.ppt#24