E370 Statistical Analysis for Bus & Econ Chapter 3: Summary Statistics.


Objectives:

• Be able to import data into Excel.
• Be able to generate useful summary statistics from the data using Excel. (Data Crunch)
• Be able to interpret summary statistics and use them to describe a dataset. (Value Added)

Why do we care?

These days the problem is not a lack of data, but an overwhelming amount of data. We need to extract information from the data sets we have. To do that we:

• condense data into tables and graphs
• summarize data into descriptive statistics

Overview:

Summary of a dataset (3 dimensions):

• Center: Where are the data values concentrated? What seem to be typical or middle data values?
• Dispersion: The scattering or spread of data around its center. How much variation is there in the data?
• Shape: Are the data values distributed symmetrically? Skewed?

Measures of Center:

Mean
  Formula: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
  Excel command: =AVERAGE(Array)
  Pros: uses all the information
  Cons: sensitive to extreme values

Median
  Formula: middle value in the sorted array
  Excel command: =MEDIAN(Array)
  Pros: robust to extreme values
  Cons: does not use all the information

Mode
  Formula: most frequently occurring data value
  Excel command: =MODE.MULT(Array)
  Pros: the only measure of center for nominal data
  Cons: unreliable
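The pros and cons above can be checked on a small example. The following is a minimal sketch in Python rather than Excel; the salary figures are hypothetical, and `statistics.multimode` is used only as the closest standard-library analogue of =MODE.MULT(Array), not an exact equivalent.

```python
import statistics

# Hypothetical data: weekly salaries (in dollars) for seven employees.
salaries = [500, 520, 520, 540, 560, 580, 630]

mean = statistics.mean(salaries)        # analogue of =AVERAGE(Array)
median = statistics.median(salaries)    # analogue of =MEDIAN(Array)
modes = statistics.multimode(salaries)  # analogue of =MODE.MULT(Array)

# Replace the largest salary with an extreme value and recompute.
with_outlier = salaries[:-1] + [6000]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print("original:     mean =", mean, " median =", median, " mode(s) =", modes)
print("with outlier: mean =", round(mean_out, 2), " median =", median_out)
```

In this sketch the single extreme value pulls the mean far above every other salary, while the median does not move at all, which is what "sensitive" and "robust" mean in the list above.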

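Measures of Dispersion:

Range
  Formula: $X_{\max} - X_{\min}$
  Excel command: =MAX(Array)-MIN(Array)

Variance
  Population: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$
  Excel command: =VAR.P(Array)
  Sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
  Excel command: =VAR.S(Array)

Standard Deviation
  Population: $\sigma = \sqrt{\sigma^2}$
  Excel command: =STDEV.P(Array)
  Sample: $s = \sqrt{s^2}$
  Excel command: =STDEV.S(Array)

Coefficient of Variation (CV)
  Population: $CV = \frac{\sigma}{\mu} \times 100$
  Excel command: =STDEV.P(Array)/AVERAGE(Array)*100
  Sample: $CV = \frac{s}{\bar{X}} \times 100$
  Excel command: =STDEV.S(Array)/AVERAGE(Array)*100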

Measures of Dispersion:

1. Range: the distance between the largest value and the smallest value in the dataset.

2. Variance: the average squared distance of observations from their mean. Its “squared” units are difficult to interpret.

3. Standard Deviation: a type of average distance of observations from their mean. (Calculated by taking square root of the variance.)

4. Coefficient of Variation (CV): a measure of “relative” dispersion (unit-free). It is useful for comparing the dispersion of variables measured in different units or with different means.
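The four measures above can be computed side by side. This is a minimal sketch in Python, assuming a small hypothetical dataset; the comments name the Excel command each line is meant to mirror. The last step rescales the data (a hypothetical inches-to-centimeters conversion) to illustrate why the CV is called unit-free.

```python
import statistics

# Hypothetical data: eight measurements, in inches.
inches = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(inches) - min(inches)   # =MAX(Array)-MIN(Array)
var_p = statistics.pvariance(inches)     # =VAR.P(Array), divides by N
var_s = statistics.variance(inches)      # =VAR.S(Array), divides by n-1
sd_p = statistics.pstdev(inches)         # =STDEV.P(Array)
sd_s = statistics.stdev(inches)          # =STDEV.S(Array)
cv_s = sd_s / statistics.mean(inches) * 100  # =STDEV.S(Array)/AVERAGE(Array)*100

# Same measurements in centimeters: the standard deviation changes,
# but the coefficient of variation does not.
cm = [x * 2.54 for x in inches]
sd_s_cm = statistics.stdev(cm)
cv_s_cm = sd_s_cm / statistics.mean(cm) * 100

print("range:", data_range)
print("population variance:", var_p, " sample variance:", round(var_s, 4))
print("population sd:", sd_p, " sample sd:", round(sd_s, 4))
print("sample CV (inches):", round(cv_s, 2), " sample CV (cm):", round(cv_s_cm, 2))
```

The sample variance is larger than the population variance for the same data because it divides by n-1 instead of N.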

Shape of Distribution:

By comparing the three measures of center:
a. Mean > Median (> Mode): positively or right-skewed
b. Mean = Median (= Mode): symmetric
c. Mean < Median (< Mode): negatively or left-skewed
The tail points in the direction of skewness.

Shape of Distribution (cont’d):

By using Pearson’s Second Skewness Coefficient:
a. Pearson’s Second Skewness Coefficient > 0: positively or right-skewed
b. Pearson’s Second Skewness Coefficient = 0: symmetric
c. Pearson’s Second Skewness Coefficient < 0: negatively or left-skewed

Pearson’s Second Skewness Coefficient = 3 * (mean - median) / standard deviation
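The rule above can be sketched in a few lines of Python. The function names and the three datasets are hypothetical, and the sketch assumes the sample standard deviation (the STDEV.S convention), since the formula does not say which version to use.

```python
import statistics

def pearson_second_skewness(data):
    """Return 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # assumption: sample standard deviation
    return 3 * (mean - median) / sd

def describe_shape(coefficient):
    """Map the sign of the coefficient to the shape labels used above."""
    if coefficient > 0:
        return "positively or right-skewed"
    if coefficient < 0:
        return "negatively or left-skewed"
    return "symmetric"

# Hypothetical datasets with a long right tail, no tail, and a long left tail.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]
symmetric = [1, 2, 3, 4, 5]
left_tailed = [1, 17, 18, 18, 19, 19, 19, 20]

for name, data in [("right_tailed", right_tailed),
                   ("symmetric", symmetric),
                   ("left_tailed", left_tailed)]:
    coefficient = pearson_second_skewness(data)
    print(name, round(coefficient, 3), describe_shape(coefficient))
```

With real data the coefficient is rarely exactly zero, so in practice a value close to zero is usually read as roughly symmetric.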

Summary Statistics

• The Standard Deviation and Sample Variance reported here are sample statistics (the same values as STDEV.S and VAR.S).
• The reported Skewness is NOT Pearson’s Second Skewness Coefficient.

Calories

Mean                  146.1111
Standard Error        4.012704
Median                145
Mode                  140
Standard Deviation    29.48723
Sample Variance       869.4969
Kurtosis              -0.68913
Skewness              -0.1484
Range                 110
Minimum               90
Maximum               200
Sum                   7890
Count                 54
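Several entries in the Calories output are tied together by simple relationships, and checking them is a quick way to read the table. The sketch below, in Python, copies the reported values and recomputes four of them; it also evaluates Pearson’s Second Skewness Coefficient from the reported mean, median, and standard deviation. The variable names are illustrative only.

```python
import math

# Values copied from the Calories output.
mean, median = 146.1111, 145
std_dev, sample_var = 29.48723, 869.4969
std_error, count = 4.012704, 54
minimum, maximum, total, data_range = 90, 200, 7890, 110

# Relationships among the reported statistics (allowing for rounding).
mean_ok = math.isclose(total / count, mean, rel_tol=1e-5)
variance_ok = math.isclose(std_dev ** 2, sample_var, rel_tol=1e-5)
std_error_ok = math.isclose(std_dev / math.sqrt(count), std_error, rel_tol=1e-5)
range_ok = (maximum - minimum == data_range)

# Pearson's second skewness coefficient from the reported values.
pearson2 = 3 * (mean - median) / std_dev

print("mean = sum / count:", mean_ok)
print("variance = sd squared:", variance_ok)
print("standard error = sd / sqrt(count):", std_error_ok)
print("range = max - min:", range_ok)
print("Pearson's second skewness coefficient:", round(pearson2, 4))
```

For these data the Pearson coefficient comes out slightly positive while the Skewness line in the output is slightly negative, which illustrates that the two are different measures; both are close to zero, so the distribution is roughly symmetric either way.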

Excel Commands:

Excel Command                                Output
=AVERAGE(Array)                              Mean of the data
=MEDIAN(Array)                               Median of the data (the average of the two middle values when the count is even)
=MODE.MULT(Array)                            Mode(s) of the data
=VAR.P(Array)                                Population variance
=STDEV.P(Array)                              Population standard deviation
=VAR.S(Array)                                Sample variance
=STDEV.S(Array)                              Sample standard deviation
=MAX(Array)                                  Largest number in the data
=MIN(Array)                                  Smallest number in the data
=MAX(Array)-MIN(Array)                       Range of the data
Data/Data Analysis/Descriptive Statistics    Table of selected descriptive statistics
