GTECH 201 Lecture 12

GTECH 201 Lecture 12 Intro to Descriptive Statistics

GTECH 201 Lecture 12: Intro to Descriptive Statistics. Topics for today: measures of central tendency (mean, median, mode; sample and population mean; weighted means; selecting appropriate measures of central tendency) and measures of dispersion (variance, standard deviation).

Transcript of GTECH 201 Lecture 12

Page 1: GTECH 201 Lecture 12

GTECH 201 Lecture 12

Intro to Descriptive Statistics

Page 2: GTECH 201 Lecture 12

Topics for Today

Measures of Central Tendency
- Mean, Median, Mode
- Sample and Population Mean
- Weighted Means
- Selecting Appropriate Measures of Central Tendency

Measures of Dispersion
- Variance
- Standard Deviation

Page 3: GTECH 201 Lecture 12

Descriptive vs. Inferential

Descriptive Statistics
- Methods for organizing and summarizing information

Inferential Statistics
- Methods for drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population

Page 4: GTECH 201 Lecture 12

Looking at This Data Set…

Student Performance in Class Tests

#    ID    Test 1  Test 2  Test 3  Test 4
1    2463  B+      A       95      10
2    4140  A-      A       90      9.5
3    1210  D       F       0       0
4    O649  D+      B+      80.5    9
5    2925  B       ?       86      8.5
6    4194  A-      A       86.5    9
7    4266  B+      F       90.5    8.5
8    2517  A-      A       83.5    10

Page 5: GTECH 201 Lecture 12

Overview
- Mean
- Median
- Mode
- Sample and Population Mean
- Weighted Means
- Selecting Appropriate Measures of Central Tendency
- Applying these measures

Page 6: GTECH 201 Lecture 12

Mean

The mean of a set of n observations is the arithmetic average

The mean of n observations $x_1, x_2, x_3, \ldots, x_n$ is

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

In Excel, =AVERAGE(insert range)
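As a rough illustration of the arithmetic average outside Excel, here is a minimal Python sketch. The helper name `mean` and the score list are assumptions for illustration; the scores are the Test 3 column as it appears to read on the data set slide.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if len(values) == 0:
        raise ValueError("mean() needs at least one observation")
    return sum(values) / len(values)


# Test 3 scores, assumed to read as follows on the data set slide.
test3_scores = [95, 90, 0, 80.5, 86, 86.5, 90.5, 83.5]
test3_mean = mean(test3_scores)
print(test3_mean)
```

Note how the single score of 0 pulls the average well below most of the individual scores.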

Page 7: GTECH 201 Lecture 12

Median

The data value that is exactly in the middle of an ordered list if the number of pieces of data is odd

The mean of the two middle pieces of data in an ordered list if the number of pieces of data is even

The median is a typical value; it is the midpoint of observations when they are arranged in an ascending or descending order
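The odd and even cases described above can be sketched in Python as follows. The function name and the two small example lists are hypothetical, chosen only to show each case.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median() needs at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 2, 11])       # ordered: 2 7 11
even_case = median([11, 2, 7, 4])   # ordered: 2 4 7 11
print(odd_case, even_case)
```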

Page 8: GTECH 201 Lecture 12

Mode
- The most frequent data value; i.e., any value having the highest frequency among the observations
- In Excel, you use the functions
  =MEDIAN(insert range)
  =MODE(insert range)
- Unimodal, Bimodal, Multimodal data sets
- Outliers
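A data set can have more than one mode. One way to see this, assuming Python 3.8 or later, is the standard library's `statistics.multimode`, which returns every value tied for the highest frequency. The second list below is a made-up bimodal example.

```python
from statistics import multimode  # available in Python 3.8+

# The twelve observations used later in the lecture: one clear mode.
observations = [2, 4, 7, 11, 11, 11, 11, 14, 16, 16, 24, 29]
unimodal = multimode(observations)

# Hypothetical data in which two values tie for the highest frequency.
bimodal = multimode([1, 1, 2, 3, 3])

print(unimodal, bimodal)
```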

Page 9: GTECH 201 Lecture 12

Sample and Population Means

Mean of a data set
- Population mean if data set includes entire population

\[ \mu = \frac{\sum X_i}{N} \]

- Sample mean if data set is only a sample of the population

\[ \bar{x} = \frac{\sum x_i}{n} \]

Page 10: GTECH 201 Lecture 12

Weighted Means

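To calculate the mean when your information is available only in the form of summary data

Class Interval   Freq
25 – 29.9        4
30 – 34.9        5
35 – 39.9        12

\[ \bar{x} = \frac{\sum x_j f_j}{n} \]

where $x_j$ is the midpoint of class $j$, $f_j$ is its frequency, and $n = \sum f_j$ is the total number of observations.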
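A sketch of the weighted mean for the frequency table on this slide. It assumes each class is represented by the midpoint of its stated limits, (lower + upper) / 2; a course that uses class boundaries such as 25 to 30 instead would get slightly different midpoints. The function name `grouped_mean` is hypothetical.

```python
# (lower limit, upper limit, frequency) for each class interval on the slide
classes = [(25, 29.9, 4), (30, 34.9, 5), (35, 39.9, 12)]


def grouped_mean(classes):
    """Weighted mean of class midpoints, weighted by class frequency."""
    total_freq = sum(freq for _, _, freq in classes)
    # Assumption: the midpoint of the stated limits stands in for every value in the class.
    weighted_sum = sum(((lower + upper) / 2) * freq for lower, upper, freq in classes)
    return weighted_sum / total_freq


estimate = grouped_mean(classes)
print(round(estimate, 2))
```

The result is only an estimate of the true mean, because the individual values inside each class are unknown.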

Page 11: GTECH 201 Lecture 12

Skewed Distributions

Page 12: GTECH 201 Lecture 12

Skewed Distributions
- When there is one mode and the distribution is symmetric
  - mean, median, mode are the same
- Positive skew
  - mean moves towards the positive tail
  - median also pulls towards the positive tail
- Negative skew
  - mean moves towards the negative tail
  - median also moves towards the negative tail
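To make the ordering concrete, here is a small sketch with made-up data that has a long positive tail. The values are hypothetical and serve only to show the mean being pulled furthest towards the tail, with the median in between.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data: most values are small, a few are large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

skew_mode = mode(right_skewed)
skew_median = median(right_skewed)
skew_mean = mean(right_skewed)
print(skew_mode, skew_median, skew_mean)

# Mirroring the data gives a negative skew, and the ordering reverses.
left_skewed = [-x for x in right_skewed]
print(mode(left_skewed), median(left_skewed), mean(left_skewed))
```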

Page 13: GTECH 201 Lecture 12

Selecting Appropriate Measures

Mean
- affected by extreme values
- includes all observations, therefore comprehensive (useful for interval/ratio data)

Median
- not affected by the number of observations
- reveals typical situations (used for ordinal data)

Mode
- useful for nominal variables

Page 14: GTECH 201 Lecture 12

Other Useful Calculations

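In addition to the sum of the data, $\sum x$, we need to be able to calculate:

\[ \sum (x - \bar{x}); \quad \sum (x - \bar{x})^2; \quad \sum x^2 \]

\[ \sum x^2 \quad \text{and} \quad \left(\sum x\right)^2 \]

\[ \sum xy \quad \text{and} \quad \sum x \sum y \]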
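These sums are easy to confuse, so here is a short sketch, under the assumption that the slide is contrasting the sum of squares with the square of the sum, and the sum of products with the product of sums. The heights are the five values from the worked example later in the lecture; the `xs` and `ys` lists are made up.

```python
heights = [67, 72, 76, 76, 84]

sum_x = sum(heights)                          # sum of the data
sum_of_squares = sum(x * x for x in heights)  # square each value, then add
square_of_sum = sum_x ** 2                    # add the values, then square

# Hypothetical paired data for the cross-product sums.
xs = [1, 2, 3]
ys = [4, 5, 6]
sum_xy = sum(x * y for x, y in zip(xs, ys))   # multiply pairs, then add
product_of_sums = sum(xs) * sum(ys)           # add each list, then multiply

print(sum_x, sum_of_squares, square_of_sum)
print(sum_xy, product_of_sums)
```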

Page 15: GTECH 201 Lecture 12

Variability or Spread
- Mean and the median - limits
- Range – coarse measure of variability
- Percentiles
  - kth percentile is the point at which k percent of the numbers fall below it and the rest fall above it
  - 25th percentile (lower quartile)
  - 50th percentile (median)
  - 75th percentile (upper quartile)
  - Interquartile range (difference between the 25th percentile value and the 75th percentile value)

Page 16: GTECH 201 Lecture 12

Describing the Spread

A five number summary
- Median
- Quartiles
- Extremes

Variance and Standard Deviation
- Measures spread about the mean
- Standard deviation cannot be discussed without the mean

Page 17: GTECH 201 Lecture 12

Calculating Percentiles

In the list of twelve observations

2 4 7 11 11 11 11 14 16 16 24 29

compute the median, 25th and 75th percentiles.

\[ \text{Median} = \frac{11 + 11}{2} = 11 \]

The lower quartile is the median of the 6 observations in the lower half of the ordered list (2 4 7 11 11 11):

\[ \frac{7 + 11}{2} = 9 \]

The upper quartile is the median of the 6 observations in the upper half of the ordered list (11 14 16 16 24 29):

\[ \frac{16 + 16}{2} = 16 \]

Page 18: GTECH 201 Lecture 12

Five Number Summary

- Median = 11
- Lower Quartile = 9
- Upper Quartile = 16
- Extremes are 2 and 29
- Can compute the range = 27
- In a symmetric distribution, the lower and upper quartiles are equally distant from the median
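The five number summary for the twelve observations can be reproduced with a short sketch that takes each quartile as the median of one half of the ordered list. Function names are hypothetical. For an odd number of observations the sketch leaves the middle value out of both halves, which is one common convention and an assumption here; the lecture only shows the even case, and spreadsheet percentile functions may use other interpolation rules.

```python
def median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (
        ordered[0],
        median_of_sorted(lower_half),
        median_of_sorted(ordered),
        median_of_sorted(upper_half),
        ordered[-1],
    )


observations = [2, 4, 7, 11, 11, 11, 11, 14, 16, 16, 24, 29]
summary = five_number_summary(observations)
data_range = summary[4] - summary[0]
iqr = summary[3] - summary[1]
print(summary, data_range, iqr)
```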

Page 19: GTECH 201 Lecture 12

Variance
- Is the mean of the squares of the deviations of the observations from their mean

Population variance

\[ \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} \]

Sample variance

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]

Page 20: GTECH 201 Lecture 12

Example

The heights, in inches, of the five starting players on a men's college basketball team are:

67 72 76 76 84

Compute the mean and standard deviation.

$x$     $x - \bar{x}$    $(x - \bar{x})^2$
67      -8               64
72      -3               9
76      1                1
76      1                1
84      9                81
375     0                156      (column sums)

\[ \bar{x} = \frac{\sum x}{n} = \frac{375}{5} = 75 \]

Page 21: GTECH 201 Lecture 12

Standard Deviation
- Standard deviation is the positive square root of the variance

Variance in our basketball example:

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{156}{4} = 39 \]

Page 22: GTECH 201 Lecture 12

Formulas – Standard Deviation

Standard deviation of a sample

\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \]

Standard deviation of a population

\[ \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} \]

Page 23: GTECH 201 Lecture 12

Example (Continued)

\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{39} \approx 6.24 \]
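The basketball example can be checked step by step with a short sketch that follows the definitional formulas directly. Variable names are illustrative. The last line treats the five players as a whole population purely for contrast, which is an assumption not made on the slides.

```python
import math

heights = [67, 72, 76, 76, 84]  # inches, from the example

n = len(heights)
x_bar = sum(heights) / n                              # mean
squared_deviations = [(x - x_bar) ** 2 for x in heights]
sum_sq_dev = sum(squared_deviations)                  # sum of squared deviations

sample_variance = sum_sq_dev / (n - 1)                # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

# For contrast only: dividing by N treats the data as the entire population.
population_variance = sum_sq_dev / n

print(x_bar, sum_sq_dev, sample_variance, round(sample_sd, 2), population_variance)
```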

Page 24: GTECH 201 Lecture 12

Short Cut – Simpler Formula

Standard Deviation of a sample

\[ s = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}} \]

- $\sum x^2$: sum of the squares of data values, i.e., you square each data value and then sum those squared values
- $\left(\sum x\right)^2$: square of the sum of data values, i.e., you sum all the data values and then square that sum
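A brief derivation sketch of why the short cut agrees with the definitional formula. It uses only $\bar{x} = \sum x / n$ and expands the squared deviations:

```latex
\begin{align*}
\sum (x - \bar{x})^2
  &= \sum x^2 - 2\bar{x} \sum x + n\bar{x}^2 \\
  &= \sum x^2 - \frac{2\left(\sum x\right)^2}{n} + \frac{\left(\sum x\right)^2}{n} \\
  &= \sum x^2 - \frac{\left(\sum x\right)^2}{n}
   = \frac{n \sum x^2 - \left(\sum x\right)^2}{n}
\end{align*}
```

Dividing both sides by $n - 1$ and taking the positive square root gives the short cut formula for $s$.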

Page 25: GTECH 201 Lecture 12

Example (using the short cut)

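$x$     $x^2$
67      4489
72      5184
76      5776
76      5776
84      7056
375     28281     (column sums)

\[ \left(\sum x\right)^2 = 375^2 = 140625 \]

\[ s = \sqrt{\frac{5(28281) - 375^2}{5(4)}} = \sqrt{\frac{780}{20}} = \sqrt{39} \approx 6.24 \]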
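A sketch of the short cut formula in Python, compared against the standard library's `statistics.stdev`, which computes the sample standard deviation. The function name `shortcut_sd` is hypothetical.

```python
import math
from statistics import stdev


def shortcut_sd(values):
    """Sample standard deviation from the sum of x and the sum of x squared only."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


heights = [67, 72, 76, 76, 84]
s_shortcut = shortcut_sd(heights)
s_reference = stdev(heights)
print(round(s_shortcut, 2), round(s_reference, 2))
```

The short cut is handy for hand calculation. With floating-point data whose values are large relative to their spread, subtracting two large, nearly equal numbers can lose precision, so the definitional formula is usually the safer choice in software.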

Page 26: GTECH 201 Lecture 12

Interpreting Std. Deviation

- $s$ and $s^2$ will be small when all the data are close together
- The deviations from the mean
  - Will be both positive and negative
  - Sum will always be 0
- $s$ is always 0 or a positive number
- $s = 0$ means no spread; as the value of $s$ increases, the spread of the data increases
- The units of $s$ are the same as the original observations
- $s$ is heavily influenced by outliers
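Two of these properties can be checked numerically with a small sketch: the deviations from the mean sum to zero, and a single extreme value inflates $s$. The second data set is hypothetical; it replaces the tallest player (84) with a made-up height of 100.

```python
from statistics import mean, stdev

heights = [67, 72, 76, 76, 84]
x_bar = mean(heights)

# Deviations are both negative and positive, and they cancel out.
deviations = [x - x_bar for x in heights]
deviation_sum = sum(deviations)

# Hypothetical outlier: one value changed from 84 to 100.
with_outlier = [67, 72, 76, 76, 100]
s_original = stdev(heights)
s_outlier = stdev(with_outlier)

print(deviations, deviation_sum)
print(round(s_original, 2), round(s_outlier, 2))
```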

Page 27: GTECH 201 Lecture 12

Coefficient of Variation

CV is the standard deviation described as a percent of the mean

\[ CV = \frac{s}{\bar{x}} \times 100 \]

CV is useful when comparing different sets of data where sample size and standard deviation are different
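A minimal sketch of the coefficient of variation, applied to the basketball heights from the earlier example. The function name is hypothetical, and the sketch assumes the mean is not zero, since the CV is undefined in that case.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percent of the mean."""
    x_bar = mean(values)
    if x_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(values) / x_bar


heights = [67, 72, 76, 76, 84]
cv_heights = coefficient_of_variation(heights)
print(round(cv_heights, 2))
```

Because the CV is a unit-free percentage, it allows the spread of data sets measured in different units or on different scales to be compared.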