Post on 18-Jan-2018
Variability
Introduction to Statistics, Chapter 4
Jan 22, 2009, Class #4
Describing Variability
• Describes, as an exact quantitative measure, how spread out or clustered together the scores are
• Variability is usually defined in terms of distance:
  • How far apart scores are from each other
  • How far apart scores are from the mean
  • How representative a score is of the data set as a whole
Describing Variability: the Range
• Simplest and most obvious way of describing variability
• Range = Highest - Lowest (using real limits)
• The range takes into account only the two extreme scores and ignores any values in between. To counter this, the distribution is divided into quarters (quartiles): Q1 = 25%, Q2 = 50%, Q3 = 75%
• The Interquartile range: the distance between the first and third quartiles (Q3 - Q1)
• The Semi-Interquartile range: one half of the Interquartile range
The most common percentiles are quartiles. Quartiles divide data sets into fourths or four equal parts.
• The 1st quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. Therefore, the 1st quartile is equivalent to the 25th percentile.
• The 2nd quartile divides the bottom 50% of the data from the top 50% of the data, so that the 2nd quartile is equivalent to the 50th percentile, which is equivalent to the median.
• The 3rd quartile divides the bottom 75% of the data from the top 25% of the data, so that the 3rd quartile is equivalent to the 75th percentile.
Interquartile range (IQR)
• The interquartile range (IQR) is the distance between the 75th percentile and the 25th percentile
• The IQR is essentially the range of the middle 50% of the data
• Because it uses the middle 50%, the IQR is not affected by outliers (extreme values)
Interquartile range (IQR)
Example: Compute the interquartile range for the sorted data 18, 33, 58, 67, 73, 93, 147
• The 25th and 75th percentiles are the .25*(7+1) = 2nd and .75*(7+1) = 6th observations, respectively
• IQR = 93 - 33 = 60
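The position rule used in this example can be checked with a short Python sketch. It assumes that the "exclusive" method of `statistics.quantiles` (Python 3.8+) matches the p*(n+1) rule on the slide, which holds for this data set. The simple range here is computed without real limits, and the variable names are illustrative only.

```python
import statistics

# Sorted scores from the IQR example.
scores = [18, 33, 58, 67, 73, 93, 147]

# method="exclusive" places the p-th quantile at position p * (n + 1),
# i.e. .25 * 8 = 2nd and .75 * 8 = 6th observation for these 7 scores.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")

value_range = max(scores) - min(scores)  # simple range (real limits not applied)
iqr = q3 - q1                            # interquartile range
semi_iqr = iqr / 2                       # semi-interquartile range

print("Q1, Q2, Q3:", q1, q2, q3)
print("range:", value_range, "IQR:", iqr, "semi-IQR:", semi_iqr)
```

When p*(n+1) is not a whole number, this method interpolates between the two neighbouring observations, so other textbooks or software may report slightly different quartiles.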
Describing Variability: Deviation in a Population
• A more sophisticated measure of variability is one that shows how scores cluster around the mean
• Deviation is the distance of a score from the mean: $X - \mu$, e.g. 11 - 6.35 = 4.65, 3 - 6.35 = -3.35
• A measure representative of the variability of all the scores would be the mean of the deviation scores: add all the deviations and divide by N, $\frac{\sum (X - \mu)}{N}$
• However, the deviation scores always add up to zero (the mean serves as the balance point for the scores), so this average is always zero
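A minimal sketch of why the mean deviation is not a usable measure. The scores are hypothetical and chosen only for illustration.

```python
# Hypothetical scores, for illustration only.
scores = [3, 4, 5, 6, 7]

mean = sum(scores) / len(scores)

# Deviation of each score from the mean (X - mean).
deviations = [x - mean for x in scores]

# Positive and negative deviations cancel, so their sum (and mean) is zero.
mean_deviation = sum(deviations) / len(scores)

print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
print("mean deviation:", mean_deviation)
```

With data whose mean is not exactly representable in floating point, the sum may come out as a tiny nonzero number rather than exactly zero.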
Describing Variability: Variance in a Population
• To remove the +/- signs, we simply square each deviation before finding the average. This is called the Variance:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{106.55}{20} = 5.33$
• The numerator is referred to as the Sum of Squares (SS), as it is the sum of the squared deviations around the mean value
• SS is a basic component of variability: the sum of squared deviation scores
Variability: Variance in a Population
let X = [3, 4, 5, 6, 7]
Mean = 5
(X - Mean) = [-2, -1, 0, 1, 2]   (subtract Mean from each number in X)
(X - Mean)² = [4, 1, 0, 1, 4]   (squared deviations from the mean)
Σ(X - Mean)² = 10   (sum of squared deviations from the mean, SS)
Σ(X - Mean)² / N = 10/5 = 2   (average squared deviation from the mean)
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Variability: Variance in a Population
let X = [1, 3, 5, 7, 9]
Mean = 5
(X - Mean) = [-4, -2, 0, 2, 4]   (subtract Mean from each number in X)
(X - Mean)² = [16, 4, 0, 4, 16]   (squared deviations from the mean)
Σ(X - Mean)² = 40   (sum of squared deviations from the mean, SS)
Σ(X - Mean)² / N = 40/5 = 8   (average squared deviation from the mean)
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Variability: Variance in a Population
• The population variance can be calculated as the sum of squares (SS) divided by N
$\sigma^2 = \frac{SS}{N} = \frac{\sum (X - \mu)^2}{N}$
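The two worked examples can be reproduced with a short Python sketch. The helper name `population_variance` is hypothetical; `statistics.pvariance` from the standard library is used only as a cross-check.

```python
import statistics


def population_variance(scores):
    """Return (SS, variance) treating the scores as a whole population."""
    n = len(scores)
    mean = sum(scores) / n
    # Sum of squared deviations from the mean (SS).
    ss = sum((x - mean) ** 2 for x in scores)
    # Population variance: SS divided by N.
    return ss, ss / n


for scores in ([3, 4, 5, 6, 7], [1, 3, 5, 7, 9]):
    ss, variance = population_variance(scores)
    print(scores, "SS =", ss, "variance =", variance,
          "pvariance =", statistics.pvariance(scores))
```

Both sets have the same mean (5), but the second is more spread out, which shows up as a larger SS and a larger variance.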
Variability: Variance in a Sample
• n is the number of scores; the divisor is n - 1
• SS is the Sum of Squared Deviations from the Mean: $SS = \sum (X - \bar{X})^2$
• So, the sample variance ($S^2$) is approximately the average squared deviation from the mean:
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Describing Variability: Population and Sample Variance
• Population variance is designated by $\sigma^2$: $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{SS}{N}$
• Sample variance is designated by $s^2$
• Samples tend to be less variable than the populations they come from, so dividing SS by n would give a biased (too small) estimate of population variability
• Degrees of Freedom (df): the number of independent (free to vary) scores. In a sample, the sample mean must be known before the variance can be calculated, therefore the final score is dependent on the earlier scores: df = n - 1
$s^2 = \frac{\sum (X - M)^2}{n - 1} = \frac{SS}{n - 1} = \frac{106.55}{20 - 1} = 5.61$
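The effect of dividing by n - 1 instead of n can be sketched as follows. The SS and n values are the ones quoted in the slides; the small list of scores is a hypothetical sample used to cross-check against the standard library.

```python
import statistics

# Values quoted in the slides.
ss = 106.55   # sum of squared deviations
n = 20        # number of scores

divide_by_n = ss / n          # population formula: SS / N
divide_by_df = ss / (n - 1)   # sample formula: SS / (n - 1), df = n - 1

print("SS / n      =", round(divide_by_n, 2))
print("SS / (n - 1) =", round(divide_by_df, 2))

# Hypothetical sample, used only to compare the two library functions.
scores = [3, 4, 5, 6, 7]
pop_var = statistics.pvariance(scores)    # divides SS by n
sample_var = statistics.variance(scores)  # divides SS by n - 1

print("pvariance:", pop_var, "variance:", sample_var)
```

Dividing by the smaller number n - 1 always gives the larger result, which compensates for the tendency of samples to underestimate population variability.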
Describing Variability: the Standard Deviation
• Variance is a measure based on squared distances, so it is expressed in squared units
• To get around this, we can take the square root of the variance, which gives us the standard deviation
• Population ($\sigma$) and sample ($s$) standard deviation:
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
$s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}$
Variability: Standard Deviation of a Sample
• The square root of the Variance is called the Standard Deviation
Variance: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Standard Deviation: $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
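A brief sketch of the relationship between variance and standard deviation, using hypothetical scores. `statistics.pstdev` and `statistics.stdev` are the standard-library population and sample versions and serve here as a cross-check.

```python
import math
import statistics

# Hypothetical scores, for illustration only.
scores = [3, 4, 5, 6, 7]
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations from the mean (SS).
ss = sum((x - mean) ** 2 for x in scores)

# Standard deviation is the square root of the variance.
pop_sd = math.sqrt(ss / n)           # population: sqrt(SS / N)
sample_sd = math.sqrt(ss / (n - 1))  # sample: sqrt(SS / (n - 1))

print("population SD:", pop_sd, "library:", statistics.pstdev(scores))
print("sample SD:    ", sample_sd, "library:", statistics.stdev(scores))
```

Unlike the variance, both results are in the same units as the original scores.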
Variability: Standard Deviation
• “The Standard Deviation tells us approximately how far the scores vary from the mean on average”
• It is approximately the average deviation of scores from the mean
The Standard Deviation and the Normal Distribution
• There are known percentages of scores above or below any given point on a normal curve
• About 34% of scores fall between the mean and 1 SD above (or below) the mean
• An additional 14% of scores fall between 1 and 2 SDs above (or below) the mean
• Thus, about 96% of all scores are within 2 SDs of the mean (34% + 34% + 14% + 14% = 96%); with unrounded figures the total is closer to 95%
• Note: the 34% and 14% figures can be useful to remember
[Figure: axis label "Probability Density"]
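The 34% and 14% figures can be checked against the standard normal distribution. This is a small sketch using `statistics.NormalDist` (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Proportion of scores between the mean and 1 SD above it.
mean_to_1sd = z.cdf(1) - z.cdf(0)

# Additional proportion between 1 and 2 SDs above the mean.
one_to_2sd = z.cdf(2) - z.cdf(1)

# Proportion within 2 SDs of the mean, on both sides.
within_2sd = z.cdf(2) - z.cdf(-2)

print(f"mean to 1 SD: {mean_to_1sd:.2%}")
print(f"1 SD to 2 SD: {one_to_2sd:.2%}")
print(f"within 2 SDs: {within_2sd:.2%}")
```

The rounded 34% and 14% values are easy to remember; summing them on both sides slightly overstates the proportion within 2 SDs.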
Describing Variability
• The standard deviation is the most common measure of variability, but the others can be used
• A good measure of variability must be stable and reliable, i.e. not greatly affected by little details in the data such as:
  • Extreme scores
  • Multiple sampling from the same population
  • Open-ended distributions
• Both the variance and SD are related to other statistical techniques
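As an illustration of the "extreme scores" criterion only, the sketch below compares how the range, the IQR and the sample standard deviation react when one score becomes extreme. The first data set is the one from the IQR example; the second, with 147 replaced by 1470, is hypothetical, and the helper name `spread` is an assumption.

```python
import statistics


def spread(scores):
    """Return the range, IQR and sample SD for a list of scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="exclusive")
    return {
        "range": max(scores) - min(scores),
        "iqr": q3 - q1,
        "sd": statistics.stdev(scores),
    }


original = [18, 33, 58, 67, 73, 93, 147]
with_extreme = [18, 33, 58, 67, 73, 93, 1470]  # hypothetical extreme score

print("original:    ", spread(original))
print("with extreme:", spread(with_extreme))
```

In this example the IQR is unchanged, while the range and the standard deviation both grow substantially.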
SS Computational Formula
• Note this formula on page 93
• In later chapters, we will be using this alternate SS formula
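The formula on page 93 is not reproduced in these slides. Assuming it is the usual computational form, SS = ΣX² - (ΣX)²/N, the sketch below checks that it gives the same result as the definitional form Σ(X - Mean)². The function names are hypothetical.

```python
def ss_definitional(scores):
    """SS as the sum of squared deviations from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores)


def ss_computational(scores):
    """SS from raw scores: sum(X^2) - (sum(X))^2 / N (assumed form)."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


for scores in ([3, 4, 5, 6, 7], [1, 3, 5, 7, 9]):
    print(scores, ss_definitional(scores), ss_computational(scores))
```

The computational form avoids calculating each deviation, which is convenient when the mean is not a whole number.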
Credits
http://www.le.ac.uk/pc/sk219/introtostats1.ppt#259,4,Plotting Data: describing spread of data
http://math.usask.ca/~miket/Sullivan_PP/Chapter_3/sec3_4.ppt#24