1 Midterm Review. 2 Econ 240A Descriptive Statistics Probability Inference Differences between...


Page 1: 1 Midterm Review. 2 Econ 240A  Descriptive Statistics  Probability  Inference  Differences between populations  Regression.


Midterm Review


Econ 240A
  Descriptive Statistics
  Probability
  Inference
  Differences between populations
  Regression


I. Descriptive Statistics
  Telling stories with tables and graphs that are self-explanatory and aesthetically appealing
  Exploratory Data Analysis for random variables that are not normally distributed
    Stem and Leaf diagrams
    Box and Whisker plots


Stem and Leaf Diagram
  Example: Problem 2.24
  Prices, in thousands of $, of houses sold in a Los Angeles suburb in a given year


Subsample
Problem 2.24: Prices in thousands of $, houses sold in a Los Angeles suburb

Prices
289  208  255  215  270  222  206  221  210  224  209  250  222  213  220  250  209


Sorted Data
Problem 2.24: Prices in thousands of $, houses sold in a Los Angeles suburb

Prices
192  195  198  200  202  205  206  206  208  208  209  209  209  209  209  210  211


Summary Statistics
Problem 2.24: Prices in thousands of $, houses sold in a Los Angeles suburb

Prices
Mean                 237.9882
Standard Error       3.314365
Median               230
Mode                 222
Standard Deviation   30.55693
Sample Variance      933.7261
Kurtosis             1.620493
Skewness             1.164885
Range                149
Minimum              192
Maximum              341
Sum                  20229
Count                85


Stem & Leaf Display
Problem 2.24: Prices in thousands of $, houses sold in a Los Angeles suburb

Stems    Leaves
19 ->    258
20 ->    025668899999
21 ->    01233457789
22 ->    0012222223346699
23 ->    00336
24 ->    012244467788
25 ->    00002255
26 ->    0569
27 ->    00235689
28 ->    69
29
30 ->    68
31
32
33
34 ->    01
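As an illustration of how such a display is built, here is a minimal Python sketch applied to the 17-price subsample listed earlier. The function name `stem_and_leaf` and the choice of tens as stems and units as leaves are assumptions made for this example; the slides themselves appear to use a spreadsheet add-in.

```python
from collections import defaultdict

# Subsample of house prices (thousands of $) from Problem 2.24, as listed on the slides.
prices = [289, 208, 255, 215, 270, 222, 206, 221, 210, 224,
          209, 250, 222, 213, 220, 250, 209]


def stem_and_leaf(values):
    """Map each stem (value // 10) to the sorted list of its leaves (value % 10)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


display = stem_and_leaf(prices)

# Print every stem in the range, including empty ones, so gaps stay visible.
for stem in range(min(display), max(display) + 1):
    leaves = "".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} -> {leaves}")
```

Printing the empty stems keeps the shape of the distribution readable, which is the point of the display.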


Box and Whiskers Plots
  Example: Problem 4.30
  Starting salaries by degree


Subsample
Problem 4.50: Starting salaries by degree

BA      BSc     BBA     Other
26819   28930   38968   34550
25797   36602   35187   30245
29115   35098   29452   31520
32877   36793   30943   26680
30015   36171   31610   29047
25090   28396   39738   35037
23163   26204   37444   26550
28225   37280   38403   35704
25103   37660   36459   32262
29742   24539   37963   34206
24587   27222   34138   26917
20780   39536   42062   26723
30353   32653   32700   36297


            BA         BSc        BBA      Other
Smallest    18719      23451      23401    21994
Q1          25730      29927      31316    28253.5
Median      27765      33396.5    34284    29950.5
Q3          29835.5    36745.25   39551    32905.25
Largest     37025      40105      47639    38812
IQR         4105.5     6818.25    8235     4651.75

Outliers: BA: 37025, 36345, 18719. None are listed for BSc, BBA, or Other.

(Figure: four box plots, one per degree, each on an axis from 0 to 50000.)
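The slides do not state the rule used to flag outliers. A common convention, assumed here, is to flag points more than 1.5 IQRs below Q1 or above Q3. The sketch below applies that convention to the quartiles reported above; the function name `tukey_fences` is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles (assumed rule)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extreme values as reported on the slides.
summary = {
    "BA":    {"q1": 25730.0, "q3": 29835.5,  "extremes": [18719, 36345, 37025]},
    "BSc":   {"q1": 29927.0, "q3": 36745.25, "extremes": [23451, 40105]},
    "BBA":   {"q1": 31316.0, "q3": 39551.0,  "extremes": [23401, 47639]},
    "Other": {"q1": 28253.5, "q3": 32905.25, "extremes": [21994, 38812]},
}

flagged = {}
for degree, stats in summary.items():
    lower, upper = tukey_fences(stats["q1"], stats["q3"])
    # Keep only the extreme values that fall outside the fences.
    flagged[degree] = [v for v in stats["extremes"] if v < lower or v > upper]
    print(degree, (lower, upper), flagged[degree])
```

Under this assumed rule, only the three BA values fall outside the fences, which agrees with the outlier lists on the slides.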


II. Probability
Concepts
  Elementary outcomes
  Bernoulli trials
  Random experiments
  Events


Probability (Cont.)
Rules or axioms:
  Addition rule
    $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  Conditional probability
    $P(A \mid B) = P(A \cap B) / P(B)$
  Independence


Probability (Cont.)
  Conditional probability
    $P(A \mid B) = P(A \cap B) / P(B)$
  Independence
    $P(A) \cdot P(B) = P(A \cap B)$
    So $P(A \mid B) = P(A)$
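A small worked example may help fix these rules. The sketch below uses one roll of a fair six-sided die, with events chosen for illustration (they are not from the slides): A is "even" and B is "at most 4". Exact fractions avoid rounding.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
outcomes = set(range(1, 7))
A = {2, 4, 6}       # event: the roll is even
B = {1, 2, 3, 4}    # event: the roll is at most 4


def prob(event):
    """Probability of an event when all elementary outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))


# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(A given B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Independence: P(A) * P(B) equals P(A and B)
independent = prob(A) * prob(B) == prob(A & B)

print(p_union, p_a_given_b, independent)
```

Here A and B happen to be independent, so conditioning on B leaves the probability of A unchanged at 1/2.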


Probability (Cont.)
Discrete Binomial Distribution
  $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$
  n repeated independent Bernoulli trials
  k successes and n-k failures
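The formula can be evaluated directly. The following sketch computes the binomial probabilities for n = 50 and p = 0.5, the values used in the battleground-state example in these slides; the function name `binomial_pmf` is an assumption for this illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 50, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The most likely number of successes and its probability.
mode = max(range(n + 1), key=lambda k: pmf[k])
print(mode, round(pmf[mode], 4))
```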


Binomial Random Number Generator
  Take 50 states
  Suppose each state was a battleground state, with probability 0.5 of winning that state
  What would the distribution of states look like?
  How few could you win? How many could you win?
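One way to explore these questions is to simulate the experiment. The sketch below is an assumption-laden stand-in for the spreadsheet random number generator the slides refer to: it draws 1000 elections of 50 independent states each. The seed, the number of draws, and the name `states_won` are all choices made for this example.

```python
import random
from collections import Counter


def states_won(rng, n_states=50, p_win=0.5):
    """Count wins in n_states independent Bernoulli trials with success probability p_win."""
    return sum(rng.random() < p_win for _ in range(n_states))


rng = random.Random(240)  # fixed seed (arbitrary) so the run is reproducible
draws = [states_won(rng) for _ in range(1000)]

counts = Counter(draws)
print("fewest:", min(draws), "most:", max(draws))
for k in sorted(counts):
    print(k, "#" * (counts[k] // 5))  # crude text histogram
```

Results cluster around 25 states, and very low or very high counts are rare even though 0 and 50 are both possible in principle.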


Subsample
24  24  28  25  18  29  25  24  24  23  25  24  29  32  28  30  23  27  21


Histogram of States Won (figure: frequency by bin)


Discrete Probability Density, p=0.5 (figure: probability against states won)


Discrete Cumulative Distribution, p=0.5 (figure: cumulative probability against states won)


Discrete Cumulative Distribution (figure: cumulative probability against states won, for p=0.5 and p=0.48)


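Probability (Cont.)
Continuous normal distribution as an approximation to the binomial
  $n p > 5$, $n(1-p) > 5$
  $f(z) = (1/\sqrt{2\pi}) \exp[-\tfrac{1}{2} z^2]$
  $z = (x - \mu)/\sigma$
  $f(x) = (1/\sigma)(1/\sqrt{2\pi}) \exp[-\tfrac{1}{2} \{(x - \mu)/\sigma\}^2]$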
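To see how close the approximation is, the sketch below compares the exact binomial probability of winning at most 20 of 50 states (p = 0.5) with the normal approximation using mean np and variance np(1-p). The cutoff of 20 and the continuity correction of 0.5 are assumptions added for this example, not taken from the slides.

```python
from math import comb, erf, sqrt


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a binomial(n, p) variable."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal(mu, sigma) variable, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


n, p = 50, 0.5
assert n * p > 5 and n * (1 - p) > 5  # rule of thumb from the slide

mu = n * p
sigma = sqrt(n * p * (1 - p))

exact = binomial_cdf(20, n, p)
approx = normal_cdf(20.5, mu, sigma)  # 20.5: continuity correction (assumed)
print(round(exact, 4), round(approx, 4))
```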


III. Inference
  Rates and Proportions
  Population Means and Sample Means
  Population Variances and Sample Variances
  Decision Theory


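Decision Theory
  In inference, i.e. hypothesis testing and confidence interval estimation, we can make mistakes because we are making guesses about unknown parameters.
  The objective is to minimize the expected cost of making errors:
  $E(C) = \alpha\, C(I) + \beta\, C(II)$
  where $\alpha$ and $\beta$ are the probabilities of Type I and Type II errors and $C(I)$, $C(II)$ are their costs.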


Sample Proportions from Polls
  $\hat{p} = k/n$
  where n is the sample size and k is the number of successes
  $k \sim B(np,\; np(1-p))$


Sample Proportions
  $E(\hat{p}) = E(k/n) = (1/n) E(k) = (1/n)\, n p = p$
  $VAR(\hat{p}) = Var(k/n) = (1/n)^2 Var(k) = (1/n)^2\, n p (1-p) = p(1-p)/n$
  $\hat{p} \sim N(p,\; (1/n)\, p(1-p))$

So the estimated p-hat is approximately normal for large sample sizes


Sample Proportions
Where the sample size is large


Problem 9.38
A commercial for a household appliances manufacturer claims that less than 5% of all of its products require a service call in the first year. A consumer protection association wants to check the claim by surveying 400 households that recently purchased one of the company's appliances.


Problem 9.38 (Cont.)
  What is the probability that more than 10% require a service call in the first year?
  What would you say about the commercial's honesty if in a random sample of 400 households, 10% report at least one service call?


Problem 9.38 Answer
  Null Hypothesis: H0: p = 0.05
  Alternative Hypothesis: HA: p > 0.05
  Statistic:
  $z = (\hat{p} - E(\hat{p}))/\sigma_{\hat{p}} = (\hat{p} - p)/\sqrt{(1/n)\, p (1-p)}$
  $z = (0.10 - 0.05)/\sqrt{(1/400)(0.05)(0.95)}$
  $z = 4.59$
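The arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function names are assumptions for this example, and the standard error uses the null value p = 0.05, as in the numbers above.

```python
from math import erfc, sqrt


def proportion_z(p_hat, p0, n):
    """z statistic for a sample proportion, with the standard error evaluated under H0."""
    standard_error = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


def upper_tail(z):
    """P(Z > z) for a standard normal variate."""
    return 0.5 * erfc(z / sqrt(2))


z = proportion_z(p_hat=0.10, p0=0.05, n=400)
p_value = upper_tail(z)
print(round(z, 2), p_value)
```

A z this far above the 5% critical value of 1.645 makes a true rate of 5% very hard to reconcile with a sample rate of 10%.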


Figure: Continuous Density of the Standardized Normal Variate, Z. Marked on the plot: Z critical = 1.645 (5% in the upper tail) and the computed Z = 4.59.


Sample means and population means where the population variance is known


Problem 9.26, Sample Means
The dean of a business school claims that the average MBA graduate is offered a starting salary of $55,000. The standard deviation of the offers is $4600. What is the probability that in a sample of 38 MBA graduates, the mean starting salary is less than $53,000?


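Problem 9.26 (Cont.)
  Null Hypothesis: H0: $\mu = 55{,}000$
  Alternative Hypothesis: HA: $\mu < 55{,}000$
  Statistic:
  $z = (\bar{x} - E(\bar{x}))/\sigma_{\bar{x}} = (\bar{x} - \mu)/(\sigma/\sqrt{n})$
  $z = (53000 - 55000)/(4600/\sqrt{38})$
  $z = -2000/746.3 = -2.68$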
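As a check on the calculation, the sketch below recomputes z and the lower-tail probability for Problem 9.26. The function names are assumptions for this example.

```python
from math import erfc, sqrt


def mean_z(x_bar, mu, sigma, n):
    """z statistic for a sample mean when the population standard deviation is known."""
    return (x_bar - mu) / (sigma / sqrt(n))


def lower_tail(z):
    """P(Z < z) for a standard normal variate."""
    return 0.5 * erfc(-z / sqrt(2))


z = mean_z(x_bar=53000, mu=55000, sigma=4600, n=38)
prob = lower_tail(z)
print(round(z, 2), round(prob, 4))
```

The probability is well under 1%, so a sample mean below $53,000 would be unusual if the dean's claim were true.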


Figure: Continuous Density of the Standardized Normal Variate, Z, with Zcrit(1%) = -2.33 marked.


Sample means and population means when the population variance is unknown


Problem 12.33
A federal agency responsible for enforcing laws governing weights and measures routinely inspects packages to determine whether the weight of the contents is at least as great as that advertised on the package. A random sample of 18 containers whose packaging states that the contents weigh 8 ounces was drawn.


Problem 12.33 (Cont.)
  Can we conclude that on average the containers are mislabeled? Use
  $t = (\bar{x} - E(\bar{x}))/s_{\bar{x}} = (\bar{x} - \mu)/(s/\sqrt{n})$


Figure: Density Function for Student's t-distribution, 17 Degrees of Freedom, with the 5% critical value of t marked.


Problems 12.33 (Cont.)

7.8 7.97 7.92

7.91 7.95 7.87

7.93 7.79 7.92

7.99 8.06 7.98

7.94 7.82 8.05

7.75 7.89 7.91


Mean 7.913888889

Standard Error 0.019969567

Median 7.92

Mode 7.91

Standard Deviation 0.084723695

Sample Variance 0.007178105

Kurtosis -0.24366084

Skewness -0.22739254

Range 0.31

Minimum 7.75

Maximum 8.06

Sum 142.45

Count 18


Problem 12.33 (Cont.)
  Can we conclude that on average the containers are mislabeled? Use
  $t = (\bar{x} - E(\bar{x}))/s_{\bar{x}} = (\bar{x} - \mu)/(s/\sqrt{n})$
  $t = (7.914 - 8)/(0.0847/\sqrt{18})$
  $t = -0.086/0.020 = -4.3$
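The statistic can be recomputed from the 18 weights listed in the slides. In this sketch, the one-sided 5% critical value for 17 degrees of freedom (about -1.740) is taken from standard t tables rather than from the slides, and the variable names are assumptions for this example.

```python
from math import sqrt
from statistics import mean, stdev

# The 18 container weights (ounces) listed for Problem 12.33.
weights = [7.80, 7.97, 7.92, 7.91, 7.95, 7.87,
           7.93, 7.79, 7.92, 7.99, 8.06, 7.98,
           7.94, 7.82, 8.05, 7.75, 7.89, 7.91]

mu_0 = 8.0                 # advertised weight under the null hypothesis
n = len(weights)
x_bar = mean(weights)
s = stdev(weights)         # sample standard deviation (n - 1 in the denominator)

t_stat = (x_bar - mu_0) / (s / sqrt(n))

T_CRIT_5PCT_17DF = -1.740  # lower-tail critical value from standard t tables
reject = t_stat < T_CRIT_5PCT_17DF
print(round(x_bar, 4), round(s, 4), round(t_stat, 2), reject)
```

The sample mean and standard deviation agree with the summary statistics in the slides, and the statistic falls far into the lower tail.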


Confidence Intervals for Variances


Problems 12.33 & 12.55
A federal agency responsible for enforcing laws governing weights and measures routinely inspects packages to determine whether the weight of the contents is at least as great as that advertised on the package. A random sample of 18 containers whose packaging states that the contents weigh 8 ounces was drawn.


Problems 12.33 & 12.55 (Cont.)

Estimate with 95% confidence the variance in contents' weight.

The $\chi^2$ variable with n-1 degrees of freedom is $(n-1)s^2/\sigma^2$


Figure: Chi Square Density for 17 Degrees of Freedom


Problems 12.33 & 12.55 (Cont.)
  $7.564 < (n-1)s^2/\sigma^2 < 30.191$
  $7.564 < 17 \times 0.00718/\sigma^2 < 30.191$
  $(1/7.564) \times 17 \times 0.00718 > \sigma^2 > (1/30.191) \times 17 \times 0.00718$
  $0.0161 > \sigma^2 > 0.0040$
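The interval can be reproduced in a few lines. This sketch takes the two chi-square critical values for 17 degrees of freedom (7.564 and 30.191) directly from the slide rather than computing them, and uses the sample variance from the summary statistics; the variable names are assumptions.

```python
n = 18
sample_variance = 0.007178105   # from the summary statistics for Problem 12.33

# Chi-square critical values for 17 degrees of freedom, as given on the slide.
CHI2_LOWER = 7.564
CHI2_UPPER = 30.191

df = n - 1

# Inverting chi2_lower < df * s^2 / sigma^2 < chi2_upper flips the inequalities.
variance_low = df * sample_variance / CHI2_UPPER
variance_high = df * sample_variance / CHI2_LOWER

print(round(variance_low, 4), round(variance_high, 4))
```

Note that the larger critical value gives the lower bound on the variance, because the unknown variance sits in the denominator.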


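IV. Differences in Populations
  Null Hypothesis: H0: $\mu_1 = \mu_2$, or $\mu_1 - \mu_2 = 0$
  Alternative Hypothesis: HA: $\mu_1 - \mu_2 \neq 0$

  $t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)] / \sigma_{\bar{x}_1 - \bar{x}_2}$
  $Var[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]^2$
  $Var[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \mu_1) - (\bar{x}_2 - \mu_2)]^2$
  $Var[\bar{x}_1 - \bar{x}_2] = E[(\bar{x}_1 - \mu_1)^2 + (\bar{x}_2 - \mu_2)^2 - 2(\bar{x}_1 - \mu_1)(\bar{x}_2 - \mu_2)]$
  $Var[\bar{x}_1 - \bar{x}_2] = Var[\bar{x}_1] + Var[\bar{x}_2] - 2\,Cov[\bar{x}_1, \bar{x}_2]$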


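IV. Differences in Populations

  $t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)] / \sigma_{\bar{x}_1 - \bar{x}_2}$
  $Var[\bar{x}_1 - \bar{x}_2] = Var[\bar{x}_1] + Var[\bar{x}_2] - 2\,Cov[\bar{x}_1, \bar{x}_2]$
  If the two samples are independent, the covariance term is zero, so
  $t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)] / \sqrt{Var[\bar{x}_1] + Var[\bar{x}_2]}$
  $t = [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)] / \sqrt{(\sigma_1^2/n_1) + (\sigma_2^2/n_2)}$

Reference Ch. 9 & Ch. 13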
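As an illustration only, the sketch below applies the final formula to the BA and BSc starting-salary subsamples from the box plot example earlier in these slides. Two assumptions are made: the samples are treated as independent, and the unknown population variances are replaced by sample variances. The slides do not carry out this particular comparison.

```python
from math import sqrt
from statistics import mean, variance

# Starting-salary subsamples (BA and BSc columns) from the box plot example.
ba = [26819, 25797, 29115, 32877, 30015, 25090, 23163,
      28225, 25103, 29742, 24587, 20780, 30353]
bsc = [28930, 36602, 35098, 36793, 36171, 28396, 26204,
       37280, 37660, 24539, 27222, 39536, 32653]


def two_sample_t(x1, x2, hypothesized_diff=0.0):
    """t statistic for a difference in means, assuming independent samples (Cov = 0)."""
    # Sample variances stand in for the unknown population variances (assumption).
    standard_error = sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x1) - mean(x2) - hypothesized_diff) / standard_error


diff = mean(ba) - mean(bsc)
t_stat = two_sample_t(ba, bsc)
print(round(diff, 2), round(t_stat, 2))
```

The sign of the statistic follows the sign of the difference in sample means, so a negative value here means the BA subsample average is below the BSc one.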


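V. Regression
  Model: $y_i = a + b x_i + e_i$
  Fitted: $\hat{y}_i = \hat{a} + \hat{b} x_i$
  Estimate of b: $\hat{b} = \sum_{i=1}^{n} [y_i - \bar{y}][x_i - \bar{x}] / \sum_{i=1}^{n} [x_i - \bar{x}]^2$
  Estimate of a: $\hat{a} = \bar{y} - \hat{b} \bar{x}$
  Estimated error: $\hat{e}_i = (y_i - \hat{y}_i)$
  Sum of Squared Residuals: $\sum_{i=1}^{n} \hat{e}_i^2$
  ANOVA: Total Sum of Squares (TSS) $= \sum_{i=1}^{n} [y_i - \bar{y}]^2$
  TSS = Explained Sum (ESS) + Unexplained Sum (USS)
  $TSS = \hat{b}^2 \sum_{i=1}^{n} [x_i - \bar{x}]^2 + \sum_{i=1}^{n} \hat{e}_i^2$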


Lab Five
Figure: Fortune 500, 1999: Assets vs. Revenue, in Logs (log assets against log revenue). Labeled firms: General Motors, Exxon Mobil, Wal-Mart, Kroger, Ingram Micro, Costco Wholesale, McKesson HBOC, General Electric, Citigroup, Bank of America, Fannie Mae, Chase Manhattan, Morgan Stanley, Merrill Lynch, Prudential, Bank One, American International, TIAA-CREF, State Farm, Allstate.


The Financials

rank  firm                           industry                          revenue M$   profits M$   assets M$
5     General Electric               Diversified Financials            111630       10717        405200
7     Citigroup                      Diversified Financials            82005        9867         716900
11    Bank of America Corp.          Commercial Banks                  51392        7882         632574
26    Fannie Mae                     Diversified Financials            36968.6      3911.9       575167.4
31    Chase Manhattan Corp.          Commercial Banks                  33710        5446         406105
48    Prudential Ins. Co. of America Insurance: Life, Health (stock)   26618        813          285094
50    Bank One Corp.                 Commercial Banks                  25986        3479         269425
30    Morgan Stanley Dean Witter     Securities                        33928        4791         366967
29    Merrill Lynch                  Securities                        34879        2618         328071
19    TIAA-CREF                      Insurance: Life, Health (mutual)  39410.2      1024.07      289247.99
17    American International Group   Insurance: P&C (stock)            40656.08     5055.44      268238


Excel Chart
Figure: The Financials: Eleven Firms. Scatter of ln Assets M$ against ln Revenue M$ with fitted line y = 0.4335x + 8.2535, R2 = 0.3039.


Excel Regression

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.5512779
R Square             0.3039073
Adjusted R Square    0.2265636
Standard Error       0.3117374
Observations         11

ANOVA
             df   SS            MS         F          Significance F
Regression   1    0.381851405   0.381851   3.929312   0.078773838
Residual     9    0.874622016   0.09718
Total        10   1.256473421

               Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      8.2534951      2.33138973       3.540161   0.006313   2.979521108    13.52747    2.979521      13.52747
X Variable 1   0.4335105      0.218696259      1.982249   0.078774   -0.061215204   0.928236    -0.06122      0.928236
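The regression can be reproduced by hand from the formulas in Section V. The sketch below is a plain-Python illustration, not the Excel or Eviews procedure itself: it takes the revenue and assets figures for the eleven financial firms as transcribed from the table in these slides, regresses ln assets on ln revenue, and computes the ANOVA decomposition. Variable names are assumptions for this example.

```python
from math import log

# Revenue and assets (M$) for the eleven financial firms, in table order.
revenue = [111630, 82005, 51392, 36968.6, 33710, 26618,
           25986, 33928, 34879, 39410.2, 40656.08]
assets = [405200, 716900, 632574, 575167.4, 406105, 285094,
          269425, 366967, 328071, 289247.99, 268238]

x = [log(r) for r in revenue]   # ln revenue
y = [log(a) for a in assets]    # ln assets
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates: b = Sxy / Sxx, a = y_bar - b * x_bar
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b_hat = sxy / sxx
a_hat = y_bar - b_hat * x_bar

fitted = [a_hat + b_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# ANOVA decomposition: TSS = ESS + USS
tss = sum((yi - y_bar) ** 2 for yi in y)
ess = b_hat ** 2 * sxx
uss = sum(e ** 2 for e in residuals)
r_squared = ess / tss

print(round(b_hat, 4), round(a_hat, 4), round(r_squared, 4))
```

If the table is transcribed correctly, the slope, intercept, and R Square should come out close to the Excel output, and ESS and USS should match the Regression and Residual sums of squares in the ANOVA table.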


Eviews Chart
Figure: Eleven Financial Firms. Scatter of LNASSETS against LNSALES.


Eviews Regression


Eviews: Actual, Fitted & Residual (figure: residual, actual, and fitted series for observations 1 to 11)