Week 4 Measures of Spread



Course Learning Outcomes covered today: Calculate descriptive statistics summarizing the centre and spread of a set of data.

Class agenda:

Recap of Last Week

Measures of Spread
o Range
o Variance & Standard Deviation
o Coefficient of Variation

Measures of Relative Standing
o Percentiles
o Boxplots

Recap of Last Week

On your phone, tablet, or computer, open a web browser and go to menti.com. You will still need to be able to see my shared screen, so if you’re using your computer, make sure you can see both my screen and the menti.com webpage. When I open the Mentimeter quiz, you will see a code on my shared screen. Enter this code to gain access to the quiz lobby. Like before, you can enter any screen name you’d like displayed on the leaderboard, and the only prize for ‘winning’ is pride.

Today, we will be continuing with the data/example we were working with last week. As a reminder, we had calculated the various columns summarized in the table below.

Measures of Spread

Measures of spread (also known as measures of variation or measures of dispersion) provide a quantitative value that, loosely speaking, measures how spread out the data are.

Range

The most basic measure of variation is the range. The range is calculated by subtracting the minimum value from the maximum value:

range = maximum − minimum

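Variance and Standard Deviation

The standard deviation plays the same role for spread that the mean plays for centre, and it is the most commonly used measure of spread. The formulas for the standard deviation, which are listed on your formula sheet, are

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} \quad \text{(population)}$$

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \quad \text{(sample)}$$

where μ and N are the population mean and size, and x̄ and n are the sample mean and size.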

The variance is the more foundational measure of spread. It is (roughly) the average squared deviation from the mean. Essentially, it is the squared version of the standard deviation (or the version that is not square rooted). It also appears on your formula sheet, but is simply shown as σ² or s² for the population variance and the sample variance respectively. We will focus more on the standard deviation since it is more commonly used. The main reason for this is that the standard deviation is in the same units as the original data, whereas the variance works with squared units. (The square root in the standard deviation formula counteracts the squaring that is done during the calculation.)

The standard deviation is a measure of how much the data differs from the mean. It can be thought of roughly as the average difference between the data values and the mean. While the formula appears quite complicated, the process can be simplified a bit by using the following table:

xᵢ      x̄      (xᵢ − x̄)      (xᵢ − x̄)²

Note: There is an alternate version of this formula on the formula sheet. You are welcome to use this formula if you prefer.
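The formula sheet itself is not reproduced in these notes, so the exact alternate version is not known here. A common computational shortcut for the sample standard deviation, assumed (not confirmed) to be the one meant, avoids computing each deviation from the mean:

```latex
% Assumed "shortcut" form of the sample standard deviation.
% It is algebraically equivalent to the definitional form that uses (x_i - \bar{x})^2.
s = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1}}
```

Both versions give the same answer; the shortcut only needs the sum of the values and the sum of their squares.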

Key properties of the standard deviation:

It is never negative

It is zero only if all of the data points are exactly the same (which rarely happens in practice)

A larger standard deviation indicates more variation (spread) amongst the data points

It is influenced by outliers
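These properties can be checked numerically. The short sketch below uses Python's standard `statistics` module on a few small data sets that are invented purely for illustration (they are not course data):

```python
import statistics

# Hypothetical data sets, made up only to illustrate the properties above.
identical = [5, 5, 5, 5]        # every value the same
tight = [4, 5, 5, 6]            # values close to the mean
wide = [1, 3, 7, 9]             # same mean (5), values further apart
with_outlier = tight + [50]     # the tight data plus one extreme value

results = {}
for name, values in [("identical", identical), ("tight", tight),
                     ("wide", wide), ("with_outlier", with_outlier)]:
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    results[name] = statistics.stdev(values)
    print(f"{name:>12}: s = {results[name]:.2f}")
```

The identical data give s = 0, the wider data give a larger s than the tight data, and adding a single outlier inflates s considerably.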

Coefficient of Variation

Standardization is a common idea when there is a desire to compare data sets. We saw a similar idea quite recently with the concept of relative frequency. The coefficient of variation is a version of the standard deviation that is standardized relative to the mean. The formula below is provided in your formula sheet.

$$CV = \frac{s}{\bar{x}} \times 100\%$$

(For a population, s and x̄ are replaced by σ and μ.)

The coefficient of variation portrays the standard deviation as a percentage of the mean.
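As a rough illustration of why this standardization helps, the sketch below compares two data sets with the same standard deviation but different means. The helper name `coefficient_of_variation` and both data sets are hypothetical, invented for this example:

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements (not course data).
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 60, 70, 80]

s_heights = statistics.stdev(heights_cm)
s_weights = statistics.stdev(weights_kg)
cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"heights: s = {s_heights:.2f}, CV = {cv_heights:.1f}%")
print(f"weights: s = {s_weights:.2f}, CV = {cv_weights:.1f}%")
```

Both data sets have the same standard deviation, yet the weights vary much more relative to their mean, which is what the coefficient of variation captures.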

Example: In a fifth grade class, a teacher has noticed that the heights of the girls in her class seem highly variable. To explore this, she measured the height of each of the 8 girls in her class. The following data are the heights in centimetres for a sample of n = 8 fifth grade girls:

135 147 139 126 141 153 142 149

Calculate the range, variance, standard deviation, and coefficient of variation.

xᵢ      x̄      (xᵢ − x̄)      (xᵢ − x̄)²
135
147
139
126
141
153
142
149
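One way to check the hand calculation is to fill in the table programmatically. The sketch below follows the table column by column and uses the sample formulas (dividing by n − 1), on the assumption that the 8 girls are treated as a sample, as the example states:

```python
import math

heights = [135, 147, 139, 126, 141, 153, 142, 149]  # cm, from the example
n = len(heights)
mean = sum(heights) / n

# Print one row per data value, matching the columns of the table.
print(f"{'x_i':>5} {'mean':>7} {'x_i - mean':>11} {'(x_i - mean)^2':>15}")
sum_sq = 0.0
for x in heights:
    dev = x - mean
    sum_sq += dev ** 2
    print(f"{x:>5} {mean:>7.1f} {dev:>11.1f} {dev ** 2:>15.2f}")

data_range = max(heights) - min(heights)
variance = sum_sq / (n - 1)      # sample variance
std_dev = math.sqrt(variance)    # sample standard deviation
cv = std_dev / mean * 100        # coefficient of variation, in percent

print(f"range = {data_range}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
print(f"coefficient of variation = {cv:.2f}%")
```

Under these assumptions the mean is 141.5 cm, the squared deviations sum to 508, and the results are a range of 27 cm, a variance of about 72.57, a standard deviation of about 8.52 cm, and a coefficient of variation of about 6.02%.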

Measures of Relative Standing

Measures of relative standing depict a data point's position amongst other data points within the same data set.

Percentiles

A percentile describes a data point by the percentage of values that it is greater than or equal to. For example, the 80th percentile is the data value that is greater than or equal to 80% of the data values. Consequently, it is also less than the remaining 20% of the data values.

To find the pth percentile value:

Order the data set in ascending order

Calculate the locator variable

Use the locator variable to determine the percentile value

o The whole part determines the location of the starting point
o The decimal part determines the percentage of the distance we move towards the next point
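The steps above can be sketched in code. The locator formula is not reproduced in these notes, so the sketch assumes one common convention, L = (p / 100) × (n + 1); if the formula sheet uses a different locator, only that one line changes. The function name `percentile` and the sample data are hypothetical:

```python
def percentile(values, p):
    """Return the p-th percentile by interpolating between ordered values."""
    data = sorted(values)                  # step 1: ascending order
    n = len(data)
    # ASSUMPTION: locator L = (p / 100) * (n + 1); the course formula may differ.
    locator = p * (n + 1) / 100            # step 2: locator variable
    whole = int(locator)                   # position of the starting point (1-based)
    fraction = locator - whole             # share of the distance to the next point
    if whole < 1:                          # locator falls before the first value
        return data[0]
    if whole >= n:                         # locator falls at or past the last value
        return data[-1]
    start = data[whole - 1]
    next_point = data[whole]
    return start + fraction * (next_point - start)   # step 3

# Hypothetical data, invented for illustration only.
sample = [2, 4, 6, 8, 10, 12, 14]
p50 = percentile(sample, 50)   # locator 4.0: exactly the 4th value
p40 = percentile(sample, 40)   # locator 3.2: 20% of the way from the 3rd to the 4th value
print(p50, round(p40, 2))
```

With this locator the 50th percentile lands on the middle value, which agrees with the median.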

The process is somewhat similar to that of the median. In fact, the median is just the 50th percentile.

Example: Use the IQ data below to find the following:

1. The 30th percentile
2. The value that has 10% of the data above it

(data is sorted for convenience)

50 56 70 72 73 74 75 76 76 76 76 76 77 77 78

80 80 80 84 85 85 85 85 86 86 86 86 87 87 88

88 88 89 89 89 91 92 93 94 94 94 95 96 96 96

96 96 96 96 97 97 98 99 99 99 99 100 101 101 102

104 104 105 105 106 107 107 107 107 108 111 115 115 118 120

125 128 141
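A hand calculation for this example can be cross-checked with Python's standard library. This is only a sketch: `statistics.quantiles` with `method="exclusive"` places the pth percentile at position p(n + 1)/100, which is assumed (not confirmed) to match the locator on the formula sheet. The value with 10% of the data above it is read as the 90th percentile.

```python
import statistics

# IQ data from the example (n = 78, already sorted).
iq = [
    50, 56, 70, 72, 73, 74, 75, 76, 76, 76, 76, 76, 77, 77, 78,
    80, 80, 80, 84, 85, 85, 85, 85, 86, 86, 86, 86, 87, 87, 88,
    88, 88, 89, 89, 89, 91, 92, 93, 94, 94, 94, 95, 96, 96, 96,
    96, 96, 96, 96, 97, 97, 98, 99, 99, 99, 99, 100, 101, 101, 102,
    104, 104, 105, 105, 106, 107, 107, 107, 107, 108, 111, 115, 115, 118, 120,
    125, 128, 141,
]

# n=100 returns the 99 cut points P1..P99, so P30 is at index 29 and P90 at index 89.
cut_points = statistics.quantiles(iq, n=100, method="exclusive")
p30 = cut_points[29]
p90 = cut_points[89]

print(f"n = {len(iq)}")
print(f"30th percentile = {p30}")
print(f"90th percentile = {p90}")
```

Under this convention the locators are 23.7 and 71.1, giving a 30th percentile of 85.7 and a 90th percentile of 111.4. A different locator formula would shift these values slightly.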

Boxplots

The boxplot is a graphical representation meant to display the spread of the data. A boxplot uses quartiles:

The first quartile, Q1 = P25

The second quartile, Q2 = P50 = median

The third quartile, Q3 = P75

There are two lesser-known quartiles:

Q0 = P0 = minimum value

Q4 = P100 = maximum value

A boxplot, sometimes called a box-and-whisker plot, can be drawn horizontally or vertically. It consists of a box and a line extending in each direction beyond the box; these lines are called 'whiskers'.

To construct a boxplot:

Draw a box that extends from Q1 to Q3

Bisect the box with a line drawn at Q2 (the median)

Draw the whiskers

o Maximum whisker length is determined as 1.5 × IQR = 1.5 × (Q3 − Q1)
o Extend the lower whisker from Q1 until it reaches the maximum whisker length or the minimum data value, whichever is shorter
o Extend the upper whisker from Q3 until it reaches the maximum whisker length or the maximum data value, whichever is shorter
o If any data points exist past the end of a whisker, mark these as outlier points

Example: Create a boxplot using the IQ data above.

50 56 70 72 73 74 75 76 76 76 76 76 77 77 78

80 80 80 84 85 85 85 85 86 86 86 86 87 87 88

88 88 89 89 89 91 92 93 94 94 94 95 96 96 96

96 96 96 96 97 97 98 99 99 99 99 100 101 101 102

104 104 105 105 106 107 107 107 107 108 111 115 115 118 120

125 128 141
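The numbers needed to draw this boxplot can be sketched as follows. The quartiles come from `statistics.quantiles` with `method="exclusive"`, which corresponds to a locator of p(n + 1)/100; this is an assumption about the course's locator formula, so the quartiles may differ slightly from a hand calculation that uses another convention. The whisker rule follows the construction steps described above.

```python
import statistics

# IQ data from the example (n = 78, already sorted).
iq = [
    50, 56, 70, 72, 73, 74, 75, 76, 76, 76, 76, 76, 77, 77, 78,
    80, 80, 80, 84, 85, 85, 85, 85, 86, 86, 86, 86, 87, 87, 88,
    88, 88, 89, 89, 89, 91, 92, 93, 94, 94, 94, 95, 96, 96, 96,
    96, 96, 96, 96, 97, 97, 98, 99, 99, 99, 99, 100, 101, 101, 102,
    104, 104, 105, 105, 106, 107, 107, 107, 107, 108, 111, 115, 115, 118, 120,
    125, 128, 141,
]

# Box: Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(iq, n=4, method="exclusive")
iqr = q3 - q1
max_whisker = 1.5 * iqr

# Each whisker stops at the maximum whisker length or at the extreme
# data value, whichever gives the shorter whisker.
lower_end = max(min(iq), q1 - max_whisker)
upper_end = min(max(iq), q3 + max_whisker)

# Any points beyond the whisker ends are marked as outliers.
outliers = [x for x in iq if x < lower_end or x > upper_end]

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"whiskers from {lower_end} to {upper_end}")
print(f"outliers: {outliers}")
```

Under these assumptions Q1 = 84.75, Q2 = 94 and Q3 = 101.25, so the IQR is 16.5 and the maximum whisker length is 24.75. The whiskers end at 60 and 126, and the values 50, 56, 128 and 141 are marked as outliers.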