Statistics for Decision Making STA 253
Statistics for Decision Making STA 253
Dr. Ginner W. Hudson, Covenant College
1.1 Examining Distributions - Intro
A statistical analysis starts with a set of data.
We construct a set of data by first deciding which cases (individuals) we want to study.
For each case or individual, we record information about characteristics that we call variables.
Constructing Our Data Set
Looking at data:
Individuals, cases, records: the WHO
Observation takes place
Variable (a characteristic of a case): the WHAT
Important terms
Individuals (cases, records): Objects described by the data. Ex: customers, cities, patients, cars
Variable: A characteristic of a case. Ex: profit, duration of a service call, number of customers, gender. Different cases can have different values for the variables. Some variables may be labels that distinguish the different cases.
Distribution of a variable: the values the variable takes and how often it takes them.
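As a minimal sketch of what "distribution" means in practice, the Python snippet below lists the values a categorical variable takes and how often it takes them. The data (card brands for ten customers) and the helper name `distribution` are invented for illustration and do not come from the course.

```python
from collections import Counter

# Hypothetical data: one value of the variable "card brand" per case (customer).
card_brand = ["Visa", "MasterCard", "Visa", "Amex", "Visa",
              "MasterCard", "Visa", "Amex", "MasterCard", "Visa"]

def distribution(values):
    """Return {value: (count, proportion)} for a list of observed values."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, count / n) for value, count in counts.items()}

dist = distribution(card_brand)
# Print one row per value: the value, its count, and its share of all cases.
for value, (count, share) in sorted(dist.items()):
    print(f"{value:<12}{count:>3}{share:>8.0%}")
```

Each row of the printed table answers the two questions in the definition: which value, and how often.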
To better understand a data set, ask:
Who? What cases (individuals) do the data describe? How many cases (individuals)?
Think of an assembly line with the WHO passing by on the conveyor belt and the variables of interest being observed.
What? How many variables? What is the exact definition of each variable? What is the unit of measurement for each variable?
Why? What is the purpose of the data? What questions are being asked? Are the variables suitable?
Types of variables
Quantitative Variable: Takes numerical values for which we can do arithmetic.
Ex: credit card balance, number of employees, time until customer is served, age
Discrete or continuous?
Categorical Variable: Places a case into one of several groups or categories.
Ex: gender, brand of credit card, own a home (yes/no)
Example: An iTunes playlist
Example: Grade book data for statistics course
Example: The FAA
The Federal Aviation Administration (FAA) monitors airlines for safety and customer service. For each flight the carrier must report the type of aircraft, number of passengers, whether or not the flights departed and arrived on schedule, and any mechanical problems.
Identify the WHO.
• The FAA
• The airline carriers
• The passengers
• The flights
• None of the above
Example: The common cold
Scientists at a major pharmaceutical firm conducted an experiment to study the effectiveness of an herbal compound to treat the common cold. They exposed volunteers to a cold virus, then gave them either the herbal compound or a useless sugar solution. Several days later they assessed each patient’s condition using a cold severity scale ranging from 0-5.
Identify the WHO.
• Scientists
• Volunteers
• The pharmaceutical firm
• The herbal compound
• None of the above
Displaying distributions with graphs
Ways to chart categorical data:
• Bar/column graphs (called Pareto charts when the bars are ordered from most to least frequent)
• Pie charts
Ways to chart quantitative data:
• Histograms
• Stemplots
• Time plots
Law firm example
A law firm studies the gender of their clients. They find 55% are males and 45% are females.
Cases:
Variable:
Distribution:
Values: Male, Female
How often: 55% and 45%, respectively
Are the data (the variable) categorical or quantitative?
Credit card example
A credit card company studies the spending behavior of its 21- to 25-year-old customers with a $1000 credit limit. It randomly selects 100 of them and records the following variables for each person. For each item, identify the type of variable.
• Average balance on their card over the last year
• Whether the customer has ever made late payments
• Which day of the week their card is used the most
• Customer’s age (in years)
Credit card example
For each item, give its possible values.
• Average balance on their card over the last year. Quantitative: $0.00 through $1000.00
• Whether the customer has ever made late payments. Categorical: Yes, No
• Which day of the week their card is used the most. Categorical: Sunday, Monday, Tuesday, …, Saturday
• Customer’s age (in years). Quantitative: 21, 22, 23, 24, 25 years
Displaying categorical data
Purpose: Summarize the data so the reader can grasp the distribution quickly.
Process: List the categories, then give either the count or the percent of cases that fall into each category.
Methods: Tables, pie charts, bar/column graphs, Pareto charts
Ways to chart categorical data
Because the variable is categorical, the data in the graph can be ordered any way we want (alphabetical, by increasing value, by year, by personal preference, etc.).
Bar graphs: Each category is represented by a bar.
Pie charts: The slices must represent the parts of one whole.
Automobile accidents per day of the week
Bar graph sorted by rank (Pareto chart): easy to analyze
Sorted chronologically: much less useful
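The contrast between the two orderings can be sketched in a few lines of Python. The accident counts below are invented for illustration (the slide's actual values are not in this transcript), and `pareto_order` and `text_bar_chart` are hypothetical helper names.

```python
# Hypothetical accident counts per day of the week (invented for illustration).
accidents = {"Sun": 20, "Mon": 31, "Tue": 26, "Wed": 24,
             "Thu": 28, "Fri": 42, "Sat": 35}

def pareto_order(counts):
    """Return (category, count) pairs sorted from largest to smallest count."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

def text_bar_chart(pairs, scale=2):
    """Render one line per category; each '#' stands for `scale` cases."""
    return [f"{name:<4}{'#' * (count // scale)} {count}" for name, count in pairs]

print("Chronological order:")
print("\n".join(text_bar_chart(accidents.items())))
print("Pareto order:")
print("\n".join(text_bar_chart(pareto_order(accidents))))
```

In the Pareto ordering the longest bars come first, so the categories that account for most cases can be read off the top of the chart.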
Ways to chart quantitative data
Histograms and stemplots: These are summary graphs for a single variable. They are very useful for understanding the pattern of variability in the data.
Line graphs (time plots): Use when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time.
Histograms
The range of values that a variable can take is divided into equal-size intervals.
The histogram shows the number of individual data points that fall in each interval.
Example: Histogram of the December 2004 unemployment rates in the 50 states and Puerto Rico.
How to create a histogram
It is an iterative process: try and try again.
What bin size should you use?
• Not so many bins that most have counts of 0 or 1
• Not so summarized that you lose all the information
• Not so detailed that it is no longer a summary
Rule of thumb: start with 5 to 10 bins.
Look at the distribution and refine your bins.
(There isn’t a unique or “perfect” solution.)
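The try-and-refine process can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed method: the call durations are invented, `histogram_counts` is a hypothetical helper, and the intervals are assumed to span the minimum to the maximum of the data (which must not all be equal).

```python
def histogram_counts(values, n_bins):
    """Count how many values fall in each of n_bins equal-width intervals.

    Intervals span min(values)..max(values); the maximum goes in the last bin.
    Assumes the values are not all identical (the width would be zero).
    """
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical service-call durations in minutes (invented for illustration).
durations = [2, 3, 3, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 10, 11, 12, 14, 17, 22]

# Try several bin counts and report how many bins hold only 0 or 1 cases.
for n_bins in (5, 10, 20):
    counts = histogram_counts(durations, n_bins)
    sparse = sum(1 for c in counts if c <= 1)
    print(f"{n_bins:>2} bins: {counts}  (bins with 0 or 1 cases: {sparse})")
```

Comparing the printed rows shows the trade-off: few bins hide detail, while many bins leave most intervals nearly empty.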
Interpreting histograms
When describing the distribution of a quantitative variable, we look for the overall pattern and for striking deviations from that pattern. We can describe the overall pattern of a histogram by its shape, center, and spread.
Histogram with a line connecting each column: too detailed
Histogram with a smoothed curve: highlights the overall pattern of the distribution
Common distribution patterns (shapes)
Symmetric: Left and right sides are mirror images of each other (or close)
Skewed left: Left side extends farther out than the right side
Skewed right: Right side extends farther out than the left side
Many shapes are bimodal or complex
Bimodal: Two peaks
Complex: First part symmetric; flat in the middle; increasing at the end
Outliers
An important kind of deviation is an outlier.
Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
Outliers
The overall pattern is fairly symmetrical except for two states that clearly do not belong to the main trend. Alaska and Florida have an unusual representation of the elderly in their populations.
A large gap in the distribution is typically a sign of an outlier.
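One rough way to look for such a gap is to sort the values and find the widest space between neighbors, as in the Python sketch below. The percentages are invented for illustration (they are not the state data from the slide), `largest_gap` is a hypothetical helper, and a wide gap is only a prompt to examine the cases beyond it, not proof of an outlier.

```python
def largest_gap(values):
    """Return (gap, lower, upper) for the widest gap between sorted neighbors.

    Requires at least two values.
    """
    ordered = sorted(values)
    return max((b - a, a, b) for a, b in zip(ordered, ordered[1:]))

# Hypothetical percentages (invented for illustration), one value per case.
percent = [11.8, 12.1, 12.4, 12.6, 12.9, 13.0, 13.3, 13.5, 13.9, 18.5]

gap, lower, upper = largest_gap(percent)
print(f"Widest gap: {gap:.1f} (between {lower} and {upper})")
```

Here the widest gap separates one case from the rest, so that case is the one to investigate and try to explain.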
IMPORTANT NOTE: Your data are the way they are. Do not try to force them into a particular shape.
Example: US Female Population 1997
It is a common misconception that if you have a large enough data set, the data will eventually turn out nice and symmetrical.
Example: Dry Days per Month 1995
Histogram of dry days in 1995
Example: Customer Service Center Call Lengths
Why were there so many calls lasting 10 seconds or less?
The inappropriate actions by customer service reps were hidden in this histogram, where the software chose the classes (bin intervals).
Example: Constructing a Histogram
Class Exercise: GDP by Country
2005 Gross Domestic Product (GDP) Growth Rates for 30 Industrialized Countries
Country Growth Rate %
Turkey 7.4
Czech Republic 6.1
Slovakia 6.1
Hungary 4.1
South Korea 4.0
Luxembourg 4.0
Greece 3.7
Poland 3.4
Spain 3.4
Denmark 3.2
United States 3.2
Mexico 3.0
Canada 2.9
Finland 2.9
Sweden 2.7
Japan 2.6
Australia 2.5
New Zealand 2.3
Norway 2.3
Austria 2.0
Switzerland 1.9
United Kingdom 1.9
Belgium 1.5
Netherlands 1.5
France 1.2
Germany 0.9
Portugal 0.4
Italy 0.0
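As one possible starting point for the exercise, the Python sketch below tallies the growth rates listed above into classes one percentage point wide and prints a text histogram. The class width is an assumption chosen for illustration, not the exercise's answer, and `class_counts` is a hypothetical helper. Only the 28 rows that appear in this transcript are included.

```python
import math

WIDTH = 1.0  # Assumed class width in percentage points; try other choices too.

# Growth rates (%) as listed in the table above (28 rows appear in this transcript).
growth = [7.4, 6.1, 6.1, 4.1, 4.0, 4.0, 3.7, 3.4, 3.4, 3.2, 3.2, 3.0, 2.9, 2.9,
          2.7, 2.6, 2.5, 2.3, 2.3, 2.0, 1.9, 1.9, 1.5, 1.5, 1.2, 0.9, 0.4, 0.0]

def class_counts(values, width):
    """Count non-negative values in classes [k*width, (k+1)*width), k = 0, 1, ..."""
    n_classes = math.floor(max(values) / width) + 1
    counts = [0] * n_classes
    for v in values:
        counts[math.floor(v / width)] += 1
    return counts

counts = class_counts(growth, WIDTH)
# One row per class: the interval, then one '*' per country in that class.
for k, count in enumerate(counts):
    print(f"{k * WIDTH:g} to <{(k + 1) * WIDTH:g}  {'*' * count}")
```

With this width, one class is empty and the highest rates sit apart from the rest, which is the kind of feature to look for when describing shape and possible outliers.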
Example: T-bill interest rates
What is this type of plot called?
What is a Time Series?
Time series -- observations collected over time
Time plot -- plot of the data over time
Identifying Trends in the Data
Trend -- gradual increases or decreases over time
[Figure: Annual Sales, XYZ Company (in millions), by year, 1990 to 2004]
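The slides identify a trend by eye from the time plot. One common way to put a number on it, offered here as an assumption rather than the course's method, is the least-squares slope of the values against time. The sales figures below are invented and are not read from the chart; `trend_slope` is a hypothetical helper.

```python
def trend_slope(values):
    """Least-squares slope of values against their position 0, 1, 2, ...

    Requires at least two values.
    """
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    denominator = sum((t - mean_t) ** 2 for t in range(n))
    return numerator / denominator

# Hypothetical annual sales in millions (invented; not read from the slide's chart).
sales = [12, 15, 14, 18, 21, 20, 25, 27, 26, 31]

slope = trend_slope(sales)
print(f"Average change per year: {slope:+.2f} million")
```

A positive slope corresponds to a gradual increase over time and a negative slope to a gradual decrease. The plot should still be inspected, since a single slope cannot show seasonality or cycles.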
Other Common Components Of Time Series
[Figures: Seasonality (plotted by quarter) and Cycles (plotted by year, '80 to '99)]
Line Graphs: Time Plots
Retail Price of Fresh Oranges over Time
This time plot shows a regular pattern of yearly variations. These are seasonal variations in fresh orange pricing most likely due to similar seasonal variations in the production of fresh oranges. There is also an overall upward trend in pricing over time. It could simply be reflecting inflation trends or a more fundamental change in this industry.
Time is on the horizontal (x) axis. The variable of interest, here “retail price of fresh oranges,” goes on the vertical (y) axis.