STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

  • date post

    22-Dec-2015

Page 1: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

STATISTICS FOR MANAGERS

LECTURE 3:

LOOKING AT DATA AND MAKING INFERENCES

Page 2: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1. LOOKING AT DATA

• Central part of statistics: describing/summarizing data
• Take into account that data come in different types
  • Sales
  • Security rating
  • Sector

Page 3: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.1 TYPES OF DATA

• Qualitative/categorical
  • Attribute (nominal) data
  • Ranked (ordinal) data
• Quantitative/numerical
• Different types of data require different treatment
• One can use:
  • Graphical summaries
  • Numerical summaries

Page 4: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.2 QUALITATIVE DATA

• Graphical summaries
  • Pie chart
  • Bar chart
  • Ordered bar chart
• Numerical summaries
  • Frequency tables
  • Percentage tables
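As a minimal sketch of the two numerical summaries for qualitative data, the Python below builds a frequency table and a percentage table. The sector labels and the function names `frequency_table` and `percentage_table` are illustrative assumptions, not part of the lecture.

```python
from collections import Counter

# Hypothetical sector labels for ten companies (illustrative data only).
sectors = ["Tech", "Energy", "Tech", "Finance", "Tech",
           "Finance", "Energy", "Tech", "Finance", "Tech"]

def frequency_table(values):
    """Count how often each category occurs, most frequent first."""
    return dict(Counter(values).most_common())

def percentage_table(values):
    """Convert the counts into percentages of the total."""
    total = len(values)
    return {cat: 100 * count / total
            for cat, count in frequency_table(values).items()}

freq = frequency_table(sectors)
pct = percentage_table(sectors)
for category in freq:
    print(f"{category:<8} {freq[category]:>3} {pct[category]:>6.1f}%")
```

Sorting the categories by count gives the same ordering an ordered bar chart would show.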

Page 5: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3 QUANTITATIVE DATA

• Graphical summaries
  • Run chart. Example: stock prices
  • Histogram. Example: tick data
  • Box plot
• Numerical summaries
  • Arithmetic mean
  • Median
  • Standard deviation
  • Quartiles

Page 6: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3.1 RUN CHART

• For data collected over time (time series)
• X-axis: date or number of data point
• Y-axis: numerical value of data point
• Things to look for
  • Trends
  • Seasonality
  • Cycles
  • Outliers

Page 7: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3.1 RUN CHART (cont.)

[Figure 2. Ratio of survey income to NAS consumption per capita. Provincial averages over time. X-axis: year, 1988 to 2001; y-axis: ratio, 1 to 1.2; series: weighted by population, unweighted.]

Page 8: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3.1 RUN CHART (cont.)

[Figure 6: Day 27/02/97, transactions-clock time relationship. Axes: transactions and seconds after 9 a.m.]

Page 9: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3.2 HISTOGRAM

• Determine the range of data
• Decompose into bins of equal width
• Count how many data points fall within each bin
• Construct a bar chart based on these counts
• Only problem: have to choose the width of the bin
• Allows one to judge
  • Center/location
  • Spread/variation
  • Symmetry
  • Outliers
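The histogram steps (range, equal-width bins, counts, bars) can be sketched in plain Python as follows. The tick data and the helper name `histogram_counts` are assumptions made for illustration, and the text bars stand in for a real chart.

```python
import math

def histogram_counts(data, bin_width):
    """Count data points per equal-width bin, starting at the minimum."""
    low = min(data)
    n_bins = max(1, math.ceil((max(data) - low) / bin_width))
    counts = [0] * n_bins
    for x in data:
        # The maximum goes into the last bin instead of opening a new one.
        index = min(int((x - low) / bin_width), n_bins - 1)
        counts[index] += 1
    return low, counts

# Hypothetical price changes in ticks (illustrative data only).
ticks = [-3, -2, -2, -1, -1, -1, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 4]
width = 1
low, counts = histogram_counts(ticks, width)
for i, count in enumerate(counts):
    left = low + i * width
    # Each bin is [left, left + width); the last bin also includes its right edge.
    print(f"{left:>3} to {left + width:>3}: {'#' * count}")
```

Rerunning with a different `width` shows how the choice of bin width changes the picture of center, spread and symmetry.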

Page 10: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3.2 HISTOGRAM (cont.)

[Figure 12: Day 22/02/97, frequency of price changes (ticks). X-axis: price change in ticks, -4 to 4; y-axis: frequency, 0 to 3500.]

Page 11: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3.3 BOX PLOT

• Packs a lot of information into a single plot
  • A box that extends from Q1 to Q3
  • A line inside the box indicates the median
  • Whiskers extend to the bottom and top of the data (outliers excluded)
  • Outliers are denoted by asterisks
• Can compare data sets by lining up their box plots.

Page 12: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3.3 BOX PLOT (cont.)

[Figure 3. Box plot of the ratio over time. X-axis: year, 1988 to 2001; y-axis: ratio, 0.8 to 1.8.]

Page 13: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3.4 LOCATION

• Mean: sum up all the data and divide by the number of points
• Median
  • Sort all the data from smallest to largest
  • Take the middle one (for an odd number of data points)
  • Take the average of the middle two (for an even number of data points)
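The two definitions translate almost line by line into code. This is a minimal sketch with made-up numbers; in practice a library routine such as Python's `statistics.mean` and `statistics.median` would do the same job.

```python
def mean(data):
    """Sum up all the data and divide by the number of points."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the sorted data (average of the middle two if even)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [7, 1, 5, 3, 9]    # sorted: 1 3 5 7 9, middle value 5
even = [7, 1, 5, 3]      # sorted: 1 3 5 7, middle two are 3 and 5
print(mean(odd), median(odd))
print(mean(even), median(even))
```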

Page 14: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3.4 LOCATION (cont.)

• Mean versus median
  • The median is more robust than the mean. This means that it is less affected by extreme observations
  • As a function of symmetry of the data
    • Skewed to the left: mean < median
    • Symmetric: mean approximately equal to median
    • Skewed to the right: mean > median
  • For skewed data the median is a more typical observation
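A small illustration of robustness, using hypothetical income figures: adding a single extreme observation moves the mean a lot and the median only a little.

```python
from statistics import mean, median

# Hypothetical monthly incomes (illustrative data only).
incomes = [2000, 2200, 2500, 2700, 3000]
with_extreme = incomes + [50000]  # add one extreme observation

print("without extreme:", mean(incomes), median(incomes))
print("with extreme:   ", mean(with_extreme), median(with_extreme))
```

Here the mean jumps from 2480 to 10400, while the median only moves from 2500 to 2600. The enlarged data set is skewed to the right, so the mean ends up above the median.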

Page 15: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3.5 SPREAD

• Standard deviation
  • Measures a typical deviation from the mean
  • Do not bother to do it yourself. Let EXCEL or any other program do it for you.
• Inter-quartile range
  • Q1 is the median of the bottom half of the data
  • Q3 is the median of the top half of the data
  • IQR = Q3 - Q1
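In the spirit of letting a program do it, the sketch below gets the standard deviation from Python's `statistics.stdev` and computes Q1 and Q3 as medians of the two halves. The data and the helper name `quartiles` are assumptions; so is the choice to leave the middle value out of both halves when the number of points is odd, since conventions differ between programs.

```python
from statistics import median, stdev

def quartiles(data):
    """Q1 and Q3 as medians of the bottom and top halves of the sorted data.

    Assumption: for an odd number of points the middle value is left out
    of both halves; other conventions give slightly different quartiles.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    bottom = ordered[:half]
    top = ordered[-half:]
    return median(bottom), median(top)

data = [2, 4, 4, 5, 7, 8, 9, 11, 12]   # illustrative data only
q1, q3 = quartiles(data)
print("standard deviation:", round(stdev(data), 2))
print("Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
```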

Page 16: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

1.3.6 OUTLIER DETECTION

• Graphically
  • Use a histogram
  • Look for points away from the rest
• Numerically
  • Points more than 3 standard deviations away from the mean
  • Points more than 1.5*IQR below Q1 or above Q3.
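Both numerical rules can be sketched in a few lines. The data and the function names are illustrative assumptions, and the quartiles use the median-of-halves convention with the middle value left out for odd sample sizes.

```python
from statistics import mean, median, stdev

def outliers_by_sd(data, k=3):
    """Points more than k standard deviations away from the mean."""
    m, s = mean(data), stdev(data)
    return [x for x in data if abs(x - m) > k * s]

def outliers_by_iqr(data, k=1.5):
    """Points more than k*IQR below Q1 or above Q3."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1, q3 = median(ordered[:half]), median(ordered[-half:])
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]

data = [10, 11, 12, 12, 13, 13, 14, 15, 40]   # illustrative data only
print("IQR rule: ", outliers_by_iqr(data))
print("3-SD rule:", outliers_by_sd(data))
```

In this small sample the IQR rule flags 40 but the 3-standard-deviation rule flags nothing, because the extreme value itself inflates the standard deviation. The two rules need not agree, especially when there are few data points.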

Page 17: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

2. SAMPLING

• All statistical information is based on data
• The process of collecting data is called sampling
• It is important to do it right
• Not everybody seems to understand this importance

Page 18: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

2. GENERAL SITUATION

• We study a population
• Can be a population in the strict sense but it could also be an experiment
• We are interested in certain characteristics of the population (parameter)
• Want to learn as much as possible about the parameter

Page 19: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

2. EXAMPLES

• Population of Beijing
  • What is the average income?
  • What percentage speak Cantonese?
  • What percentage has Internet?
  • What is the average price per square meter? (300,000 euros buy only 174 square meters.)
  • What is the percentage of people that have a DVD?

Page 20: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

2. BASIC PROBLEM

• Most populations are very large, or even infinite
• Hence it is typically impossible to exactly determine a parameter (sometimes it is infeasible from a cost perspective)
• But it is possible to learn something about a parameter
• By collecting a sample from the population we can obtain information
• But the quality of information can only be as good as the quality of the sample

Page 21: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

2. GOOD SAMPLE, IS THIS HARD?

• The sample has to be representative of the population
• In collecting data, we must not favor (or disfavor) any particular segment of the population
• If we do we get biased samples
• Biased samples yield biased estimates.
• Example of biased samples: Internet.

Page 22: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

2. NO VOLUNTEERS PLEASE

• A sample into which people have entered at their own choice is called a voluntary response sample or self-selected sample.
• This typically happens when polls are posted on the internet, on TV, ...
• The scheme favors people with strong opinions.
• The resulting sample is rarely representative of the population
• As so often you get what you pay for! (although sometimes they pay for it!)

Page 23: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

2. HOW TO DO IT RIGHT

• Analogy
  • Have one ball per member of the population
  • Put all the balls in a big urn
  • Mix well
  • Take out n balls
• The result is called a simple random sample
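The urn analogy corresponds to drawing without replacement. A minimal sketch using Python's standard `random` module follows; the population of numbered IDs and the function name `simple_random_sample` are illustrative assumptions.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)          # the seed only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: one ID ("ball") per member.
population = list(range(1, 1001))
sample = simple_random_sample(population, n=10, seed=42)
print(sample)
```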

Page 24: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

2. DO IT RIGHT...

• There are other ways to get representative samples:
  • Stratified sampling
  • Systematic sampling
  • Cluster sampling (multistage)
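The slide only names these schemes. As a rough sketch under the usual textbook definitions, systematic sampling takes every k-th member after a random start, and stratified sampling draws a simple random sample within each group. The population, the "urban"/"rural" strata and the function names below are hypothetical.

```python
import random

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th member (k = N // n)."""
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction within each stratum."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

rng = random.Random(0)
population = list(range(100))
# Hypothetical strata (illustrative split of the population into two groups).
strata = {"urban": population[:70], "rural": population[70:]}

print(systematic_sample(population, 10, rng))
print(stratified_sample(strata, 0.1, rng))
```

With a 10% fraction the stratified sample always contains 7 urban and 3 rural members, so each group is represented in proportion to its size.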

Page 25: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

2. ... BUT AN ESTIMATE IS JUST THAT

• We can estimate a parameter from the sample (a mean or a proportion)
• ... but an estimate is not equal to the parameter!
• ... because a sample is not equal to a population
• We must be aware of sampling error
• Many people are not!
• They sell us estimates as if they were parameters. Shame on them.
• We will do it right.

Page 26: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3. BASIC ESTIMATION

• General estimation
  • We are interested in a population parameter
  • We collect a random sample
  • In a first step we estimate the parameter. This is usually straightforward.
  • In a second step, we deal with the sampling error. This requires more work but it is worthwhile.

Page 27: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.1. ESTIMATING A PROPORTION

• We are interested in a population proportion $p$
• We collect a random sample of size $n$
• We compute the sample proportion $\hat{p}$
• This is a natural estimator for $p$
• But due to the sampling error it is not equal to the true parameter $p$
• Goal: quantify the sampling uncertainty contained in the estimator of $p$
• Intuition: the larger $n$, the smaller the uncertainty.

Page 28: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.1. ESTIMATING A PROPORTION

• From probability theory we know that the central limit theorem applies, under some assumptions
• For $n$ large, with probability 95% the population proportion $p$ will be in between

  $$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

• For the interval to be trusted we require

  $$n\hat{p} \ge 10 \quad \text{and} \quad n(1-\hat{p}) \ge 10$$
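As a minimal sketch, the interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ and the trust condition ($n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$) can be coded as below. The survey numbers and the function name `proportion_ci` are illustrative assumptions.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    # Rule from the slide: need n*p_hat >= 10 and n*(1 - p_hat) >= 10.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("sample too small for the normal approximation")
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 120 of 400 respondents have Internet access.
low, high = proportion_ci(120, 400)
print(f"95% CI for p: [{low:.3f}, {high:.3f}]")
```

For these numbers $\hat{p} = 0.3$ and the interval is roughly 0.255 to 0.345.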

Page 29: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.1. ESTIMATING A PROPORTION

• A confidence interval has the following general form

  CI = estimator ± constant × std error (SE) = estimator ± margin of error (ME)

• For a proportion, $SE = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
• The SE does not depend on the confidence level but the ME does because of the constant, which is often abbreviated as $z$

Page 30: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.1. ESTIMATING A PROPORTION

• How is the ME affected by its various inputs?

  $$ME = z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

  1. As the confidence level increases the ME goes up
  2. As the estimator moves towards 0.5 the ME goes up
  3. As $n$ increases the ME goes down
• We control the confidence level and $n$, but not the estimator of $p$
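The three effects on $ME = z\sqrt{\hat{p}(1-\hat{p})/n}$ can be checked numerically by changing one input at a time. The reference values (0.3, 400, 1.96) are arbitrary choices for illustration.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """ME = z * sqrt(p_hat * (1 - p_hat) / n) for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

base = margin_of_error(0.3, 400)                   # reference case
higher_conf = margin_of_error(0.3, 400, z=2.576)   # 99% instead of 95%
nearer_half = margin_of_error(0.5, 400)            # estimator moves to 0.5
larger_n = margin_of_error(0.3, 1600)              # four times the sample size

print(f"base {base:.4f}, 99% level {higher_conf:.4f}, "
      f"p_hat=0.5 {nearer_half:.4f}, n=1600 {larger_n:.4f}")
```

Because $n$ enters under a square root, quadrupling the sample size only halves the margin of error.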

Page 31: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.1. ESTIMATING A PROPORTION

• Want a CI with a specified level and a specified ME? How large a sample size $n$ is needed?
• Use ME and solve for $n$

  $$ME = z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

• Solution:

  $$n = \hat{p}(1-\hat{p}) \left(\frac{z}{ME}\right)^2$$

• Catch 22: we have not collected the sample yet, and therefore the estimate for $p$ is not available yet. Solutions:
  1. Worst case scenario: estimator = 0.5
  2. Use a guess based on previous information
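A sketch of the sample size calculation $n = \hat{p}(1-\hat{p})(z/ME)^2$, rounded up to the next whole observation. The target margin of 0.03 and the prior guess of 0.2 are made-up inputs, and `sample_size_for_proportion` is an illustrative name.

```python
import math

def sample_size_for_proportion(me, z=1.96, p_guess=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) <= me; p_guess=0.5 is the worst case."""
    return math.ceil(p_guess * (1 - p_guess) * (z / me) ** 2)

print(sample_size_for_proportion(0.03))               # worst case, 95% level
print(sample_size_for_proportion(0.03, p_guess=0.2))  # guess from prior information
```

The worst case needs 1068 observations; with a prior guess of 0.2 the requirement drops to 683, which is why previous information is worth using when it is available.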

Page 32: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

WHAT CONFIDENCE LEVEL?

• You may want a confidence level other than 95%.
• Most common: 90%, 95% and 99%.
• The formula for the CI is the same
• You only change the constant 1.96
• A higher confidence level gives a wider interval

Conf. level    90%     95%     99%
Constant z     1.64    1.96    2.58
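The constants in the table are quantiles of the standard normal distribution. As a sketch, they can be recovered with `statistics.NormalDist` from the Python standard library (version 3.8 or later); `z_constant` is an illustrative name.

```python
from statistics import NormalDist

def z_constant(confidence_level):
    """Two-sided normal constant z for a given confidence level."""
    tail = (1 - confidence_level) / 2      # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_constant(level):.3f}")
```

To three decimals the values are 1.645, 1.960 and 2.576.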

Page 33: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.2. ESTIMATING A MEAN

• We are interested in a population mean and use as estimator the sample mean $\bar{X}$.
• CI = estimator ± constant × std error (SE) = estimator ± margin of error (ME)
• For a mean, $SE = \dfrac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation

  $$CI = \bar{X} \pm z \frac{s}{\sqrt{n}}$$

• Rule of thumb: need more than 50 obs. to trust this interval
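A minimal sketch of the interval $\bar{X} \pm z\, s/\sqrt{n}$, using the standard library for the sample mean and standard deviation. The 60 sales figures are generated by an arbitrary formula purely to have more than 50 observations; they are not real data.

```python
import math
from statistics import mean, stdev

def mean_ci(data, z=1.96):
    """Large-sample confidence interval: x_bar +/- z * s / sqrt(n)."""
    n = len(data)
    x_bar, s = mean(data), stdev(data)
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of 60 daily sales figures (illustrative data only).
sales = [100 + (i * 37) % 41 for i in range(60)]
low, high = mean_ci(sales)
print(f"95% CI for the mean: [{low:.2f}, {high:.2f}]")
```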

Page 34: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.2. ESTIMATING A MEAN

• How is the margin of error (ME) affected by its various inputs?

  $$ME = z \frac{s}{\sqrt{n}}$$

  1. As the confidence level increases, the ME goes up
  2. As $s$ increases the ME goes up
  3. As $n$ increases the ME goes down
• We control $n$ and the confidence level but not $s$.

Page 35: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.2. ESTIMATING A MEAN

• Want a CI with a specified level and a specified ME? How large a sample size $n$ is needed?
• Use ME and solve for $n$

  $$ME = z \frac{s}{\sqrt{n}}$$

• Solution:

  $$n = \left(\frac{z\, s}{ME}\right)^2$$

• Catch 22: we have not collected the sample yet, and therefore the estimate $s$ is not available yet. In this case there is no worst case scenario. Use a guess based on previous information
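The same calculation for a mean, $n = (z\,s/ME)^2$, rounded up. The guess for $s$ and the target margin are made-up inputs, and `sample_size_for_mean` is an illustrative name.

```python
import math

def sample_size_for_mean(me, s_guess, z=1.96):
    """Smallest n with z * s / sqrt(n) <= me, given a guess for s."""
    return math.ceil((z * s_guess / me) ** 2)

# Hypothetical guess: earlier data suggest s is about 15; target ME is 2.
print(sample_size_for_mean(me=2, s_guess=15))
```

With these inputs the formula gives 216.09, so 217 observations are needed. Halving the target ME would roughly quadruple that number.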

Page 36: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.3. HYPOTHESIS TESTING

• If you care about whether a parameter is equal to a certain prespecified value, there is an alternative to hypothesis testing
• Just check whether the prespecified value is contained in the confidence interval

Page 37: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.3. HYPOTHESIS TESTING

• If the prespecified value is contained in the CI
  • It is one of the (many) plausible values
  • So we can only make a weak positive statement
• If the prespecified value is not contained in the CI
  • It is not one of the plausible values
  • We can make a strong negative statement
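The check itself is a one-liner once the estimate and its standard error are known. The sketch below uses a hypothetical survey and an illustrative function name, `value_is_plausible`, with the 95% constant as default.

```python
import math

def value_is_plausible(value, estimate, se, z=1.96):
    """True if the prespecified value lies inside estimate +/- z * se."""
    return abs(value - estimate) <= z * se

# Hypothetical sample: 230 of 400 respondents answer "yes".
p_hat = 230 / 400
se = math.sqrt(p_hat * (1 - p_hat) / 400)

print(value_is_plausible(0.50, p_hat, se))  # is p = 0.50 plausible?
print(value_is_plausible(0.55, p_hat, se))  # is p = 0.55 plausible?
```

Here the 95% interval runs from about 0.527 to 0.623, so 0.50 is ruled out (a strong negative statement) while 0.55 is only one of many plausible values (a weak positive statement).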

Page 38: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.3. CI PERFECT SUBSTITUTE

• We wonder if a parameter is equal to a prespecified value?
• The technique of hypothesis testing gives a “yes-or-no” answer (at a certain level of significance)
• We can get the same answer from a confidence interval (at the corresponding level of confidence)
• ... But in addition we get the range of all plausible values! This is valuable info.
• Moral: a confidence interval tends to be safer and more informative than hypothesis testing

Page 39: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.4. CAVEAT

• Our confidence intervals are simple, yet powerful.
• But you can’t use them blindly!
• Two conditions to trust them
  • We need a large sample
  • We need a random sample
• Data that is collected over time is usually NOT a random sample: the data point of today is usually related to the data point of yesterday
• Stock returns are an exception to this rule.
• Small samples and time series are for the pros!

Page 40: STATISTICS FOR MANAGERS LECTURE 3: LOOKING AT DATA AND MAKING INFERENCES.

3.5. WHAT ABOUT OTHER PARAMETERS

• We have covered confidence intervals for
  • Population proportions
  • Population means
• Both are based on the CLT
• There are other interesting parameters
  • Population median
  • Population standard deviation
  • etc.
• Unfortunately they cannot be handled by the CLT
• CIs can be constructed but the corresponding techniques are more difficult.