SOCS: Shape, Outliers, Center, Spread

Transcript of SOCS: Shape, Outliers, Center, Spread

  • Slide 1
  • SOCS: Shape, Outliers, Center, Spread
  • Slide 2
  • Slide 3
  • We can use graphs to look at the shape of a quantitative variable's distribution. One example is the bell-shaped (normal) distribution, which appears often in nature. It is symmetric, and its mean, median, and mode are roughly equal.
  • Slide 4
  • Scores from an easy exam are skewed left: the distribution is non-symmetric, with Mean < Median < Mode. Scores from a hard exam are skewed right: the distribution is non-symmetric, with Mean > Median > Mode.
  • Slide 5
  • Shape can also be described by the number of peaks (modes).
  • Slide 6
  • Slide 7
  • An outlier is an extreme value of the data (extremely high or extremely low). It is an observation that differs significantly from the rest of the data. There may be more than one outlier in a set of data.
  • Slide 8
  • Possible Reasons for Outliers: 1. An error was made while taking the measurement or entering it into the computer. 2. The individual belongs to a different group than the bulk of individuals measured. 3. The outlier is a legitimate, though extreme data value.
  • Slide 9
  • We can identify a value as an outlier if it is less than $Q_1 - 1.5 \times \mathrm{IQR}$ or greater than $Q_3 + 1.5 \times \mathrm{IQR}$.
  • Slide 10
  • Make a box-and-whisker plot of the data and identify any outliers: 10, 12, 11, 15, 11, 14, 13, 17, 12, 22, 14, 11
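The outlier rule can be checked on this data set with a short Python sketch. It assumes the median-of-halves convention for quartiles (Q1 is the median of the lower half, Q3 the median of the upper half, with the middle value left out when the count is odd). The helper names `quartiles` and `find_outliers` are illustrative, not from the slides.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumes at least two values.
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]               # values below the median position
    upper = xs[len(xs) - half:]     # values above the median position
    return median(lower), median(xs), median(upper)

def find_outliers(values):
    """Return the values outside the 1.5 * IQR fences."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [10, 12, 11, 15, 11, 14, 13, 17, 12, 22, 14, 11]
q1, med, q3 = quartiles(data)
outliers = find_outliers(data)
print("Q1:", q1, "median:", med, "Q3:", q3, "outliers:", outliers)
```

Under this convention Q1 = 11 and Q3 = 14.5, so the fences are 5.75 and 19.75 and only the value 22 is flagged. Software that uses a different quartile convention may report slightly different fences.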
  • Slide 11
  • Gasoline prices: Australia: $2.20, Canada: $2.02, Germany: $4.58, Mexico: $2.09, United States: $1.59, Japan: $3.47, Taiwan: $2.16. a. Make a box-and-whisker plot for the gasoline prices. b. Which countries, if any, had gasoline prices that can be considered outliers?
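As a rough check of part b, the sketch below applies the 1.5 × IQR rule to the gasoline prices and reports the flagged countries by name. It assumes the median-of-halves convention for quartiles (the middle value is excluded from both halves because there are seven prices). The variable names are illustrative.

```python
from statistics import median

prices = {
    "Australia": 2.20, "Canada": 2.02, "Germany": 4.58, "Mexico": 2.09,
    "United States": 1.59, "Japan": 3.47, "Taiwan": 2.16,
}

xs = sorted(prices.values())
half = len(xs) // 2
q1 = median(xs[:half])               # median of the three lowest prices
q3 = median(xs[len(xs) - half:])     # median of the three highest prices
iqr = q3 - q1

low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Countries whose price falls outside the fences.
outlier_countries = [
    country for country, price in prices.items()
    if price < low_fence or price > high_fence
]
print("fences:", low_fence, high_fence, "outliers:", outlier_countries)
```

With these assumptions Q1 = 2.02 and Q3 = 3.47, the upper fence is about 5.65, and no country is flagged, not even Germany at $4.58. Other quartile conventions can give different quartiles and may change that verdict.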
  • Slide 12
  • Measures of center: mean, median, mode. Mean: the average of the data set. Median: the middle value of a data set arranged from smallest to largest. Mode: the data value that occurs most often; it is a common measure of center for categorical data.
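The three measures can be computed with Python's standard `statistics` module. This sketch reuses the twelve values from the box-plot exercise earlier in this transcript; the categorical list `travel` is made-up data added only to show the mode of non-numeric values.

```python
import statistics

data = [10, 12, 11, 15, 11, 14, 13, 17, 12, 22, 14, 11]

mean_value = statistics.mean(data)      # sum of the values divided by the count
median_value = statistics.median(data)  # middle of the sorted values
mode_value = statistics.mode(data)      # most frequent value

# Hypothetical categorical data: the mode is the only one of the
# three measures that makes sense here.
travel = ["bus", "car", "bus", "walk"]
travel_mode = statistics.mode(travel)

print(mean_value, median_value, mode_value, travel_mode)
```

For this data the mean is 13.5, the median is 12.5, and the mode is 11.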
  • Slide 13
  • Slide 14
  • When describing data, you must decide which number is the most appropriate description of the center. Mean and median applet: http://bcs.whfreeman.com/tps3e/content/cat_020/applets/meanmedian.html. Use the mean for symmetric data and the median for skewed data or data with outliers.
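One way to see why the median is preferred for skewed data is to drop the most extreme value and watch how far each measure moves. The sketch below does this with the gasoline prices from the earlier slide, removing the highest price (Germany, $4.58). It is an illustration, not a formal test.

```python
import statistics

prices = [2.20, 2.02, 4.58, 2.09, 1.59, 3.47, 2.16]
without_highest = [p for p in prices if p != max(prices)]

mean_all = statistics.mean(prices)
median_all = statistics.median(prices)
mean_trimmed = statistics.mean(without_highest)
median_trimmed = statistics.median(without_highest)

# How far each measure moves when the extreme value is removed.
mean_shift = abs(mean_all - mean_trimmed)
median_shift = abs(median_all - median_trimmed)

print(f"mean:   {mean_all:.3f} -> {mean_trimmed:.3f}")
print(f"median: {median_all:.3f} -> {median_trimmed:.3f}")
```

Here the mean drops from about 2.59 to about 2.26, while the median moves only from 2.16 to about 2.13, so the median is the steadier description of a typical price.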
  • Slide 15
  • Slide 16
  • Measures of spread: range, interquartile range, mean absolute deviation, variance and standard deviation. Range: the maximum value minus the minimum value (the spread of all the data). Interquartile range (IQR): shows the middle 50% of the data; $\mathrm{IQR} = Q_3 - Q_1$; not affected as much by outliers; use when the measure of center is the median. Mean absolute deviation: the average distance between each data value and the mean; use when the measure of center is the mean. Variance and standard deviation: a measure of the average deviation of all observations from the mean.
  • Slide 17
  • Complete a five-number summary and a box-and-whisker plot for the following data. Number of hours spent on the internet per week: 12, 4, 16, 18, 1, 6, 10, 8
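The five-number summary for this exercise can be sketched in Python as follows. The quartiles assume the median-of-halves convention, and the function name `five_number_summary` is illustrative. Drawing the plot itself is left out.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); assumes at least two values."""
    xs = sorted(values)
    half = len(xs) // 2
    q1 = median(xs[:half])              # median of the lower half
    q3 = median(xs[len(xs) - half:])    # median of the upper half
    return xs[0], q1, median(xs), q3, xs[-1]

hours = [12, 4, 16, 18, 1, 6, 10, 8]
summary = five_number_summary(hours)
minimum, q1, med, q3, maximum = summary

data_range = maximum - minimum   # spread of all the data
iqr = q3 - q1                    # spread of the middle 50%

print(summary, "range:", data_range, "IQR:", iqr)
```

For the hours data this gives a minimum of 1, Q1 = 5, median = 9, Q3 = 14, and a maximum of 18, so the range is 17 and the IQR is 9. The box runs from 5 to 14 with the median line at 9.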
  • Slide 18
  • To calculate the mean absolute deviation: 1. Calculate the mean of the data set. 2. Find the distance between each data value and the mean, that is, the absolute value of the difference between each data value and the mean. 3. Find the average of those distances.
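Written as a formula, these steps can be summarized as below. The symbols are assumptions of this sketch rather than notation from the slides: $n$ is the number of data values, $x_i$ is the $i$-th value, and $\bar{x}$ is the mean.

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - \bar{x} \right|
\]
```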
  • Slide 19
  • Find the mean absolute deviation of the following data set: 52, 48, 60, 55, 59, 54, 58, 62
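A minimal Python sketch of the calculation for this data set, following the three steps on the previous slide. The function name `mean_absolute_deviation` is illustrative.

```python
def mean_absolute_deviation(values):
    """Average distance between each value and the mean."""
    mean = sum(values) / len(values)               # step 1: the mean
    distances = [abs(x - mean) for x in values]    # step 2: distance to the mean
    return sum(distances) / len(distances)         # step 3: average distance

data = [52, 48, 60, 55, 59, 54, 58, 62]
mean = sum(data) / len(data)
mad = mean_absolute_deviation(data)
print("mean:", mean, "MAD:", mad)
```

The mean is 56, the distances are 4, 8, 4, 1, 3, 2, 2, and 6, and their average is 3.75.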
  • Slide 20
  • One measure of spread is the standard deviation: a measure of the average deviation of all observations from the mean. The symbol for standard deviation is σ (the Greek letter sigma).
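The transcript does not preserve the formula itself, so the following is the standard population form, given here as an assumption. In it, $N$ is the number of observations, $x_i$ is the $i$-th observation, and $\mu$ is the mean. The variance is $\sigma^2$. When the data are a sample, many courses divide by $N - 1$ instead of $N$ and write $s$ rather than $\sigma$.

```latex
\[
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( x_i - \mu \right)^2}
\]
```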
  • Slide 21
  • Slide 22
  • Calculate the standard deviation of the following test scores: 15, 20, 21, 20, 36, 15, 25, 15
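The test scores can be checked with Python's `statistics` module. The slides use the symbol σ, which suggests the population form (divide by the number of scores), but that is an assumption. The sketch prints the sample form (divide by one less than the number of scores) as well, so either convention can be compared.

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

mean_score = statistics.mean(scores)

# Population standard deviation: divides the squared deviations by n.
population_sd = statistics.pstdev(scores)

# Sample standard deviation: divides the squared deviations by n - 1.
sample_sd = statistics.stdev(scores)

print(f"mean = {mean_score}")
print(f"population SD = {population_sd:.4f}")
print(f"sample SD = {sample_sd:.4f}")
```

The mean is 20.875 and the squared deviations sum to 350.875, giving a population standard deviation of about 6.62 and a sample standard deviation of about 7.08.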
  • Slide 23
  • The choice of a measure of spread depends on the shape of the data's distribution. If the data are symmetric, with no serious outliers, use the range and standard deviation. If the data are skewed, have serious outliers, or both, use the IQR.
  • Slide 24
  • Quantitative data: through graphs. Categorical data: through two-way frequency tables.
  • Slide 25
  • Multiple bar graphs; multiple box-and-whisker plots
  • Slide 26
  • Two-way frequency tables examine the relationship between two categorical variables.
  • Slide 27
  • Slide 28
  • Relative frequency is the ratio of the value of a subtotal to the value of the total.
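To illustrate the definition, the sketch below computes relative frequencies from a small two-way table. The counts, the row labels ("Grade 9", "Grade 10"), and the column labels ("Yes", "No") are all hypothetical, since the table from the slides is not preserved in this transcript.

```python
# Hypothetical two-way frequency table: rows are one categorical
# variable, columns are the other, entries are counts.
counts = {
    "Grade 9":  {"Yes": 12, "No": 8},
    "Grade 10": {"Yes": 15, "No": 5},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Relative frequency of each cell: cell count / grand total.
cell_relative = {
    row_label: {col: n / grand_total for col, n in row.items()}
    for row_label, row in counts.items()
}

# Relative frequency of each row subtotal: row total / grand total.
row_relative = {
    row_label: sum(row.values()) / grand_total
    for row_label, row in counts.items()
}

print(grand_total, cell_relative, row_relative)
```

With these made-up counts the grand total is 40, so the "Grade 9, Yes" cell has a relative frequency of 12/40 = 0.3 and each row subtotal has a relative frequency of 20/40 = 0.5.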
  • Slide 29
  • Create a two-way frequency table for the following problem.
  • Slide 30
  • Slide 31
  • Slide 32