E370 Statistical Analysis for Bus & Econ Chapter 3: Summary Statistics.
Uploaded by mae-harrell
Objectives:
• Be able to import data into Excel.
• Be able to generate useful summary statistics from the data using Excel. (Data Crunch)
• Be able to interpret summary statistics and use them to describe a dataset. (Value Added)
Why do we care?
These days the problem is not a lack of data but an overwhelming amount of it. We need to extract information from the data sets we have. To do that, we:
• condense data into tables and graphs
• summarize data into descriptive statistics
Overview:
Summary of a dataset (3 dimensions)
• Center: Where are the data values concentrated? What seem to be typical or middle data values?
• Dispersion: The scattering or spread of data around its center. How much variation is there in the data?
• Shape: Are the data values distributed symmetrically? Skewed?
Measures of Center:
Mean
• Formula: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
• Excel Command: =AVERAGE(Array)
• Pros: uses all the information
• Cons: sensitive to extreme values
Median
• Formula: middle value in the sorted array
• Excel Command: =MEDIAN(Array)
• Pros: robust to extreme values
• Cons: does not use all the information
Mode
• Formula: most frequently occurring data value
• Excel Command: =MODE.MULT(Array)
• Pros: the only measure of center for nominal data
• Cons: unreliable (a dataset may have no mode or several)
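The pros and cons above can be seen on a small example. The sketch below uses Python's standard `statistics` module as a stand-in for the Excel commands; the data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical data, chosen only for illustration.
values = [10, 12, 12, 13, 15]
with_extreme = values + [100]  # same data plus one extreme value

# Counterpart of =AVERAGE(Array): moves a lot when the extreme value is added.
mean_before = statistics.mean(values)
mean_after = statistics.mean(with_extreme)

# Counterpart of =MEDIAN(Array): barely moves.
median_before = statistics.median(values)
median_after = statistics.median(with_extreme)

# Counterpart of =MODE.MULT(Array): returns every most-frequent value.
modes = statistics.multimode(values)

# The mode also works for nominal (category) data, where mean and median do not.
colors = ["red", "blue", "red", "green"]
color_modes = statistics.multimode(colors)

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("modes:", modes, color_modes)
```

Adding the single value 100 pulls the mean well away from the bulk of the data while the median shifts only slightly, which is what "sensitive" versus "robust" to extreme values means here.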
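Measures of Dispersion:
Range
• Formula: $X_{\max} - X_{\min}$
• Excel Command: =MAX(Array)-MIN(Array)
Variance
• Population: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$, Excel Command: =VAR.P(Array)
• Sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, Excel Command: =VAR.S(Array)
Standard Deviation
• Population: $\sigma = \sqrt{\sigma^2}$, Excel Command: =STDEV.P(Array)
• Sample: $s = \sqrt{s^2}$, Excel Command: =STDEV.S(Array)
Coefficient of Variation (CV)
• Population: $CV = \frac{\sigma}{\mu} \times 100$, Excel Command: =STDEV.P(Array)/AVERAGE(Array)*100
• Sample: $CV = \frac{s}{\bar{X}} \times 100$, Excel Command: =STDEV.S(Array)/AVERAGE(Array)*100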
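As a rough cross-check of the population and sample versions, the sketch below computes the same quantities with Python's `statistics` module. The data are hypothetical; the point is only that the population formulas divide by the number of values while the sample formulas divide by one less.

```python
import statistics

# Hypothetical data, chosen only for illustration (mean is 6).
data = [4.0, 8.0, 6.0, 5.0, 7.0]

data_range = max(data) - min(data)      # like =MAX(Array)-MIN(Array)

pop_var = statistics.pvariance(data)    # like =VAR.P(Array), divides by N
samp_var = statistics.variance(data)    # like =VAR.S(Array), divides by n-1

pop_sd = statistics.pstdev(data)        # like =STDEV.P(Array)
samp_sd = statistics.stdev(data)        # like =STDEV.S(Array)

print("range:", data_range)
print("variance (population, sample):", pop_var, samp_var)
print("std dev (population, sample):", pop_sd, samp_sd)
```

The squared deviations from the mean sum to 10 here, so the population variance is 10/5 and the sample variance is 10/4; the sample version is always the larger of the two.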
Measures of Dispersion:
1. Range: the distance between the largest value and the smallest value in the dataset.
2. Variance: the average squared distance of observations from their mean. Its “squared” units are difficult to interpret.
3. Standard Deviation: a type of average distance of observations from their mean. (Calculated by taking square root of the variance.)
4. Coefficient of Variation (CV): a measure of “relative” dispersion (unit-free). It is useful for comparing the dispersion of variables measured in different units or with different means.
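To illustrate what "unit-free" means for the CV, the sketch below computes it for the same hypothetical heights recorded in centimeters and in meters. The helper `coefficient_of_variation` is a name introduced here for illustration, and it uses the sample standard deviation.

```python
import statistics

def coefficient_of_variation(values):
    """Sample CV in percent: sample standard deviation / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical heights, the same three people in two different units.
heights_cm = [160.0, 170.0, 180.0]
heights_m = [1.6, 1.7, 1.8]

sd_cm = statistics.stdev(heights_cm)   # depends on the unit
sd_m = statistics.stdev(heights_m)     # 100 times smaller

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print("standard deviations:", sd_cm, sd_m)
print("CVs (percent):", cv_cm, cv_m)
```

The standard deviations differ by a factor of 100 because they carry the unit, while the two CVs agree (up to floating-point rounding), so the CV can be compared across variables measured on different scales.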
Shape of Distribution:
By comparing the three measures of center:
a. Mean > Median (> Mode): positively or right-skewed
b. Mean = Median (= Mode): symmetric
c. Mean < Median (< Mode): negatively or left-skewed
The tail points in the direction of skewness.
Shape of Distribution (cont’d):
By using Pearson’s Second Skewness Coefficient:
a. Pearson’s Second Skewness Coefficient > 0: positively or right-skewed
b. Pearson’s Second Skewness Coefficient = 0: symmetric
c. Pearson’s Second Skewness Coefficient < 0: negatively or left-skewed
Pearson’s Second Skewness Coefficient = 3*(mean - median)/standard deviation
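The formula above can be sketched in a few lines of Python. The function name `pearson_second_skewness` and the three datasets are hypothetical, and the sample standard deviation is used as an assumption, since the slide does not say which version is intended.

```python
import statistics

def pearson_second_skewness(values):
    """3 * (mean - median) / standard deviation (sample version, by assumption)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)

# Hypothetical datasets, chosen only for illustration.
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]      # one large value pulls the mean up
left_tail = [-v for v in right_tail]        # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]                 # mean equals median

coef_right = pearson_second_skewness(right_tail)
coef_left = pearson_second_skewness(left_tail)
coef_sym = pearson_second_skewness(symmetric)

print(coef_right, coef_left, coef_sym)
```

The sign of the coefficient follows the comparison of mean and median: positive when the mean exceeds the median, negative when it falls below, and zero when they coincide.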
Summary Statistics
• The Standard Deviation and Sample Variance reported are sample statistics.
• Skewness is NOT Pearson’s skewness coefficient.
Calories
Mean 146.1111
Standard Error 4.012704
Median 145
Mode 140
Standard Deviation 29.48723
Sample Variance 869.4969
Kurtosis -0.68913
Skewness -0.1484
Range 110
Minimum 90
Maximum 200
Sum 7890
Count 54
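Several entries in this output can be recomputed from the others, which is a useful sanity check when reading such a table. The sketch below uses only the numbers reported above; the relationship Standard Error = Standard Deviation / sqrt(Count) is the usual standard error of the mean and is stated here as background rather than taken from the slides.

```python
import math

# Values copied from the Calories output above.
mean = 146.1111
std_error = 4.012704
median = 145
std_dev = 29.48723
sample_variance = 869.4969
minimum, maximum = 90, 200
total, count = 7890, 54

mean_check = total / count                    # should match Mean
range_check = maximum - minimum               # should match Range (110)
variance_check = std_dev ** 2                 # should match Sample Variance
std_error_check = std_dev / math.sqrt(count)  # should match Standard Error

# Pearson's Second Skewness Coefficient from the same table.
pearson_second = 3 * (mean - median) / std_dev

print(mean_check, range_check, variance_check, std_error_check)
print("Pearson's second coefficient:", pearson_second)
```

For this dataset Pearson's second coefficient comes out positive (the mean is above the median) while the reported Skewness is negative, which illustrates that the two are different measures and need not agree, even in sign.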
Excel Commands:
Excel Command: Output
AVERAGE(Array): Mean of the data
MEDIAN(Array): Approximate median of the data
MODE.MULT(Array): Mode(s) of the data
VAR.P(Array): Population variance
STDEV.P(Array): Population standard deviation
VAR.S(Array): Sample variance
STDEV.S(Array): Sample standard deviation
MAX(Array): Largest number in the data
MIN(Array): Smallest number in the data
MAX(Array)-MIN(Array): Range of the data
Data/Data Analysis/Descriptive Statistics: Table of selected descriptive statistics