Section 5.4 The Geometric and Poisson Probability Distributions.
Lecture Twelve
Outline: Failure Time Analysis, Linear Probability Model, Poisson Distribution
Date posted: 19-Dec-2015
Failure Time Analysis
• Example: Duration of Expansions
• Issue: does the probability of an expansion ending depend on how long it has lasted?
• Exponential distribution: assumes the answer is no, since the hazard rate is constant
• Weibull distribution allows a test to be performed
Part II: Failure Time Analysis
• Exponential
– survival function
– hazard rate
• Weibull
• Exploratory Data Analysis, Lab Seven
Trough        Peak            Duration (months)
Oct. 1945     Nov. 1948       37
Oct. 1949     July 1953       45
May 1954      August 1957     39
April 1958    April 1960      24
Feb. 1961     Dec. 1969       106
Nov. 1970     Nov. 1973       36
March 1975    January 1980    58
July 1980     July 1981       12
Nov. 1982     July 1990       92
March 1991    March 2001      120
Duration  # Ending  # At Risk  F(t)  Survivor
0         0         10         0     1
12        1         10         0.1   0.9
24        1         9          0.2   0.8
36        1         8          0.3   0.7
37        1         7          0.4   0.6
39        1         6          0.5   0.5
45        1         5          0.6   0.4
58        1         4          0.7   0.3
92        1         3          0.8   0.2
106       1         2          0.9   0.1
120       1         1          1     0
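The F(t) and Survivor columns can be rebuilt from the ten expansion durations alone. The following is a minimal sketch, assuming no expansion is censored (every one has ended); the function name `empirical_survivor` is made up for illustration.

```python
# Expansion durations in months, taken from the trough/peak table.
durations = [37, 45, 39, 24, 106, 36, 58, 12, 92, 120]

def empirical_survivor(times):
    """Return rows (t, n_ending, n_at_risk, F, S) for uncensored durations."""
    n = len(times)
    rows = [(0, 0, n, 0.0, 1.0)]      # starting row: nothing has ended yet
    ended = 0
    for t in sorted(set(times)):
        at_risk = n - ended           # expansions still running just before t
        ending = times.count(t)       # expansions that end at t
        ended += ending
        F = ended / n                 # estimated cumulative distribution
        rows.append((t, ending, at_risk, F, 1.0 - F))
    return rows

survivor_table = empirical_survivor(durations)
for t, ending, at_risk, F, S in survivor_table:
    print(f"{t:>4} {ending:>2} {at_risk:>3} {F:.1f} {S:.1f}")
```

Each printed row corresponds to one row of the table: the survivor value is simply one minus the fraction of expansions that have ended by that duration.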
Figure 2: Estimated Survivor Function for Post-War Expansions
(plot of Survivor Function against Duration in Months)
Figure 3: Exponential Trendline Fitted to Estimated Survivor Function
(plot of Survivor Function against Duration in Months; fitted trendline $y = 1.1972\,e^{-0.0217x}$, $R^2 = 0.9533$; model $S(t) = e^{-\lambda t}$)
Figure 4: Constrained Exponential Trendline Fitted to Estimated Survivor Function
(plot of Survivor Function against Duration in Months; fitted trendline $y = e^{-0.019x}$, $R^2 = 0.9313$)
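The constrained trendline in Figure 4 forces S(0) = 1, so only the rate has to be estimated. A rough sketch of one way to get it follows, assuming the fit was done by least squares of ln S on duration with no intercept (an assumption about the spreadsheet's method, not something the slides state). The function name is hypothetical.

```python
import math

# (duration in months, estimated survivor value) from the survivor table.
# The last point (120, S = 0) is dropped because ln(0) is undefined.
points = [(0, 1.0), (12, 0.9), (24, 0.8), (36, 0.7), (37, 0.6),
          (39, 0.5), (45, 0.4), (58, 0.3), (92, 0.2), (106, 0.1)]

def fit_rate_through_origin(pts):
    """Slope of ln S on t with no intercept, sign-flipped to give lambda."""
    sxy = sum(t * math.log(s) for t, s in pts)
    sxx = sum(t * t for t, _ in pts)
    return -sxy / sxx

lam = fit_rate_through_origin(points)
print(f"lambda = {lam:.4f} per month, implied mean = {1 / lam:.1f} months")
```

Under these assumptions the estimate comes out close to the 0.019 shown on the figure.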
Exponential Distribution
• Hazard rate: ratio of density function to the survivor function:
• h(t) = f(t)/S(t)
• measure of probability of failure at time t given that you have survived that long
• for the exponential it is a constant:
• $h(t) = \lambda \exp(-\lambda t) / \exp(-\lambda t) = \lambda$
Duration  # Ending  # At Risk  Inter. Haz.
0         0         10         0
12        1         10         0.1000
24        1         9          0.1111
36        1         8          0.1250
37        1         7          0.1429
39        1         6          0.1667
45        1         5          0.2000
58        1         4          0.2500
92        1         3          0.3333
106       1         2          0.5000
120       1         1          1.0000
Interval hazard rate = # ending / # at risk
Duration  # Ending  # At Risk  Inter. Haz.  Cum. Hazard
0         0         10         0            0
12        1         10         0.1000       0.1000
24        1         9          0.1111       0.2111
36        1         8          0.1250       0.3361
37        1         7          0.1429       0.4790
39        1         6          0.1667       0.6456
45        1         5          0.2000       0.8456
58        1         4          0.2500       1.0956
92        1         3          0.3333       1.4290
106       1         2          0.5000       1.929
120       1         1          1.0000       2.929
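The interval and cumulative hazard columns follow mechanically from the rule "# ending / # at risk". A small sketch, assuming one expansion ends at each listed duration (true for this data set); `hazard_table` is an illustrative name.

```python
# Sorted expansion durations in months; no ties, no censoring.
durations = [12, 24, 36, 37, 39, 45, 58, 92, 106, 120]

def hazard_table(times):
    """Return rows (t, n_at_risk, interval_hazard, cumulative_hazard)."""
    n = len(times)
    rows = []
    cum = 0.0
    for i, t in enumerate(times):
        at_risk = n - i          # expansions still running just before t
        h = 1 / at_risk          # one expansion ends at each duration
        cum += h                 # running sum of interval hazards
        rows.append((t, at_risk, h, cum))
    return rows

hazard_rows = hazard_table(durations)
for t, at_risk, h, cum in hazard_rows:
    print(f"{t:>4} {at_risk:>3} {h:.4f} {cum:.4f}")
```

The cumulative hazard is just the running total of the interval hazards, which is why it ends at 1/10 + 1/9 + ... + 1/1.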
Cumulative Hazard Function: Postwar Expansions
(plot of Cumulative Hazard against Duration in Months; fitted line y = 0.0223x - 0.2422, $R^2 = 0.9288$)
Cumulative Hazard Function, Postwar Expansions
(plot of Cumulative Hazard against Duration in Months; fitted line through the origin y = 0.0192x, $R^2 = 0.9015$)
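Weibull Distribution
• $F(t) = 1 - \exp[-(t/\alpha)^{\beta}]$
• $S(t) = \exp[-(t/\alpha)^{\beta}]$
• $\ln S(t) = -(t/\alpha)^{\beta}$
• $h(t) = f(t)/S(t)$
• $f(t) = dF(t)/dt = (\beta/\alpha)(t/\alpha)^{\beta-1} \exp[-(t/\alpha)^{\beta}]$
• $h(t) = (\beta/\alpha)(t/\alpha)^{\beta-1}$
• if $\beta = 1$, h(t) = constant
• if $\beta > 1$, h(t) is an increasing function
• if $\beta < 1$, h(t) is a decreasing function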
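The three cases for the Weibull hazard can be checked numerically. This sketch assumes the usual parameterisation with a scale parameter (called `alpha` here) and a shape parameter (`beta`). The value alpha = 60 months is purely hypothetical and chosen only for illustration.

```python
import math

def weibull_survivor(t, alpha, beta):
    """S(t) = exp[-(t/alpha)^beta]."""
    return math.exp(-((t / alpha) ** beta))

def weibull_hazard(t, alpha, beta):
    """h(t) = f(t)/S(t) = (beta/alpha) * (t/alpha)^(beta - 1)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1)

ALPHA = 60.0  # hypothetical scale, in months
for beta in (0.5, 1.0, 1.5):
    hazards = [weibull_hazard(t, ALPHA, beta) for t in (12, 60, 120)]
    print(beta, [round(h, 4) for h in hazards])
```

With beta = 1 the hazard is the same at every duration (the exponential case); with beta above 1 it rises with duration and with beta below 1 it falls.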
Cumulative Hazard Function, Post-War Expansions
(plot of Cumulative Hazard against Duration in Months, both axes on log scales)
Log-Log Plot of Cumulative Hazard Function Vs. Duration, Post-War Expansions
(plot of LNCUMHAZ against LNDUR)
Dependent Variable: LNCUMHAZ
Method: Least Squares
Sample: 2 11
Included observations: 10

Variable  Coefficient  Std. Error  t-Statistic  Prob.
LNDUR     1.436662     0.103558    13.87303     0.0000
C         -5.920740    0.403303    -14.68061    0.0000

R-squared           0.960092    Mean dependent var     -0.409591
Adjusted R-squared  0.955103    S.D. dependent var     1.038386
S.E. of regression  0.220022    Akaike info criterion  -0.013326
Sum squared resid   0.387276    Schwarz criterion      0.047191
Log likelihood      2.066628    F-statistic            192.4609
Durbin-Watson stat  1.210695    Prob(F-statistic)      0.000001
Is Beta More Than One?
• H0: beta = 1
• HA: beta > 1, and hazard rate is increasing with time, i.e. expansions are more likely to end the longer they last
• t = (1.437 - 1)/0.104 = 4.20
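The test on beta rests on the log-log regression of cumulative hazard on duration. The sketch below recomputes that regression from the ten expansion durations using plain least-squares formulas. It assumes the cumulative hazards are the running sums 1/10 + 1/9 + ..., so the results may differ from the reported output in the third decimal place. The helper `ols` is illustrative, not part of any package.

```python
import math

durations = [12, 24, 36, 37, 39, 45, 58, 92, 106, 120]  # months

# Cumulative hazard: running sum of 1 / (number at risk).
cum, cum_hazard = 0.0, []
for i in range(len(durations)):
    cum += 1 / (len(durations) - i)
    cum_hazard.append(cum)

x = [math.log(t) for t in durations]    # LNDUR
y = [math.log(h) for h in cum_hazard]   # LNCUMHAZ

def ols(x, y):
    """Simple regression of y on x: (slope, intercept, std. error of slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, se_slope

beta_hat, const, se_beta = ols(x, y)
t_stat = (beta_hat - 1) / se_beta       # test of H0: beta = 1
print(f"beta = {beta_hat:.3f}, se = {se_beta:.3f}, t = {t_stat:.2f}")
```

The t statistic is measured against beta = 1 rather than against zero, because beta = 1 is the value at which the Weibull collapses to the constant-hazard exponential.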
Conclude
• Economic expansions are at increasing risk the longer they last
• the business cycle is not dead
• so much for the new economics
• maybe Karl Marx was right, capitalism is an inherently unstable system, subject to cycles
Lab Seven
Source: Wayne Nelson, Applied Life Data Analysis (1982), John Wiley
Diesel Generators, hours to fan failure (+ indicates running time, i.e. still running when last observed)

Hours   # Ending  # At Risk  Interval  Interval Hazard Rate  Cumulative Hazard Rate
450
460+
1150
1150
1560+
1600
1660+
1850+
1850+
1850+
1850+
1850+
2030+
2030+
2030+
2070
2070
2080
2200+
Hours   # Ending  # At Risk  Interval  Interval Hazard Rate  Cumulative Hazard Rate
450     1         70         450
460+              68
1150    2         68         700
1150
1560+             65
1600    1         65         450
1660+             63
1850+             62
1850+             61
1850+             60
1850+             59
1850+             58
2030+             57
2030+             56
2030+             55
2070    2         55         470
2070
2080    1         53         10
Hours   # Ending  # At Risk  Interval  Interval Hazard Rate  Cumulative Hazard Rate
450     1         70         450       0.0143                0.0143
460+              68
1150    2         68         700       0.0294                0.0437
1150
1560+             65
1600    1         65         450       0.0154                0.0591
1660+             63
1850+             62
1850+             61
1850+             60
1850+             59
1850+             58
2030+             57
2030+             56
2030+             55
2070    2         55         470       0.0364                0.0955
2070
2080    1         53         10        0.0189                0.1143
2200+             51
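With censored running times the only change is in the bookkeeping: a "+" observation leaves the risk set without adding to the hazard. A minimal sketch for the rows shown above, assuming 70 fans at risk at the start (the first # At Risk entry). The record layout and function name are made up for illustration.

```python
# (hours, failed) pairs; failed=False marks a censored "+" running time.
fans = (
    [(450, True), (460, False), (1150, True), (1150, True),
     (1560, False), (1600, True), (1660, False)]
    + [(1850, False)] * 5
    + [(2030, False)] * 3
    + [(2070, True), (2070, True), (2080, True), (2200, False)]
)

N_FANS = 70  # number at risk at time zero, from the table

def cumulative_hazard(records, n_start):
    """Rows (hours, n_failed, n_at_risk, interval_hazard, cum_hazard) per failure time."""
    at_risk = n_start
    cum = 0.0
    rows = []
    for t in sorted({hours for hours, _ in records}):
        failures = sum(1 for hours, failed in records if hours == t and failed)
        censored = sum(1 for hours, failed in records if hours == t and not failed)
        if failures:
            h = failures / at_risk      # hazard uses the risk set just before t
            cum += h
            rows.append((t, failures, at_risk, h, cum))
        at_risk -= failures + censored  # both failures and censored units leave
    return rows

fan_rows = cumulative_hazard(fans, N_FANS)
for t, failures, at_risk, h, cum in fan_rows:
    print(f"{t:>5} {failures} {at_risk} {h:.4f} {cum:.4f}")
```

Only the five failure times produce hazard entries; the censored rows matter solely through the at-risk counts.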
Cumulative Hazard Rate for Fan Failure
(plot of Cumulative Hazard against Duration in Hours; fitted line y = 4E-05x + 0.0089, $R^2 = 0.9816$)
Regress Cumulative Hazard on Duration

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.990735441
R Square           0.981556714
Adjusted R Square  0.979251304
Standard Error     0.014359467
Observations       10

ANOVA
            df  SS           MS        F         Significance F
Regression  1   0.087789735  0.08779   425.7622  3.19E-08
Residual    8   0.001649554  0.000206
Total       9   0.089439289

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept     0.008850453   0.007759834     1.140547  0.287051  -0.00904   0.026745   -0.00904     0.026745
X Variable 1  3.89324E-05   1.88681E-06     20.63401  3.19E-08  3.46E-05   4.33E-05   3.46E-05     4.33E-05

Lambda = $3.89 \times 10^{-5}$, mean = 25,707 hrs
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.98807309
R Square           0.97628844
Adjusted R Square  0.97332449
Standard Error     0.15117465
Observations       10

ANOVA
            df  SS           MS        F         Significance F
Regression  1   7.527762455  7.527762  329.3881  8.73E-08
Residual    8   0.182830209  0.022854
Total       9   7.710592664

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept     -10.5688829   0.45937382      -23.0072  1.35E-08  -11.6282   -9.50956   -11.6282     -9.50956
X Variable 1  1.06005237    0.058408137     18.14905  8.73E-08  0.925363   1.194742   0.925363     1.194742

$H_0: \beta = 1$, $H_A: \beta > 1$, $t = (1.06 - 1)/0.058 = 1.034$
Accept that the hazard rate is constant
Lab Six
• Lottery
• what effect do the zeros have on the regression of % income spent on the lottery versus explanatory variables such as household income?
Data

LOTTERY   AGE       CHILDREN  EDUCATION  INCOME
5.000000  50.00000  2.000000  15.00000   41.00000
7.000000  26.00000  0.000000  10.00000   22.00000
0.000000  40.00000  3.000000  13.00000   24.00000
10.00000  46.00000  2.000000  9.000000   20.00000
5.000000  40.00000  3.000000  14.00000   32.00000
5.000000  39.00000  2.000000  15.00000   42.00000
3.000000  36.00000  3.000000  8.000000   18.00000
0.000000  44.00000  1.000000  16.00000   47.00000
0.000000  47.00000  4.000000  20.00000   85.00000
6.000000  52.00000  1.000000  10.00000   23.00000
0.000000  51.00000  2.000000  18.00000   61.00000
0.000000  41.00000  2.000000  17.00000   70.00000
12.00000  42.00000  2.000000  9.000000   22.00000
7.000000  53.00000  1.000000  12.00000   27.00000
11.00000  72.00000  1.000000  9.000000   25.00000
Percent of Income Spent on the Lottery
(plot of Percent on Lottery against Household Income; series: lottery, tobit fitted, Linear (lottery); annotation: OLS slope biased toward zero)
Effect of the Zeros on Regression
• The zeros bias the OLS regression slope towards zero
• can’t throw away the zeros; this is the mistake the NASA engineers made with Challenger: they threw away the launches where no O-rings had failed.
• Start with a simpler model: does the household play the lottery or not?
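The first bullet, that the zeros pull the OLS slope toward zero, can be illustrated with a small simulation. Everything in this sketch is synthetic: the latent line 12 - 0.23 * income, the noise scale 3.6, the income range and the sample size are illustrative assumptions, not the lab data.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic households: latent spending can go negative, observed cannot.
income = [rng.uniform(10, 90) for _ in range(5000)]
latent = [12.0 - 0.23 * inc + rng.gauss(0, 3.6) for inc in income]
observed = [max(0.0, v) for v in latent]   # left-censored at zero

slope_latent = ols_slope(income, latent)
slope_observed = ols_slope(income, observed)
print(f"slope on latent values:   {slope_latent:.3f}")
print(f"slope on censored values: {slope_observed:.3f}")
```

The slope fitted to the censored values is flatter than the slope of the underlying line, because the pile-up of zeros at high incomes stops the observed series from falling any further.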
Tobit: EViews

Dependent Variable: LOTTERY
Method: ML - Censored Normal (TOBIT)
Sample: 1 100
Included observations: 100
Left censoring (value) at zero
Convergence achieved after 5 iterations
Covariance matrix computed using second derivatives

            Coefficient  Std. Error  z-Statistic  Prob.
C           12.21673     1.080618    11.30531     0.0000
INCOME      -0.229460    0.033817    -6.785390    0.0000

Error Distribution
SCALE:C(3)  3.635246     0.307204    11.83332     0.0000

R-squared            0.365088    Mean dependent var     5.390000
Adjusted R-squared   0.351997    S.D. dependent var     3.786993
S.E. of regression   3.048478    Akaike info criterion  4.621338
Sum squared resid    901.4421    Schwarz criterion      4.699493
Log likelihood       -228.0669   Hannan-Quinn criter.   4.652969
Avg. log likelihood  -2.280669
Left censored obs    23          Right censored obs     0
Uncensored obs       77          Total obs              100
OLS Excel

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.5890982
R Square           0.3470367
Adjusted R Square  0.3403738
Standard Error     3.0756959
Observations       100

ANOVA
            df  SS           MS        F          Significance F
Regression  1   492.7192807  492.7193  52.085012  1.13961E-10
Residual    98  927.0707193  9.459905
Total       99  1419.79

              Coefficients  Standard Error  t Stat    P-value     Lower 95%   Upper 95%  Lower 95.0%  Upper 95.0%
Intercept     10.076761     0.718559336     14.02356  3.6661E-25  8.65080379  11.50272   8.650804     11.50272
X Variable 1  -0.142368     0.0197268       -7.21699  1.1396E-10  -0.1815154  -0.10322   -0.18152     -0.10322
Bernoulli Variable: Bern
• Bern = 0*(lottery=0) + 1*(lottery>0)
• Linear Probability Model: dummy dependent variable
• Bern(i) = c + a*income + b*age +d*children + f*education + e(i)
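Building the Bernoulli variable is a one-line recode of the lottery column. The sketch below uses only the 15 rows listed in the data table, not the full sample of 100, and fits a one-regressor version of the linear probability model on income alone, so its numbers are illustrative and will not match the reported regressions.

```python
# (lottery %, age, children, education, income) for the 15 listed rows.
rows = [
    (5, 50, 2, 15, 41), (7, 26, 0, 10, 22), (0, 40, 3, 13, 24),
    (10, 46, 2, 9, 20), (5, 40, 3, 14, 32), (5, 39, 2, 15, 42),
    (3, 36, 3, 8, 18), (0, 44, 1, 16, 47), (0, 47, 4, 20, 85),
    (6, 52, 1, 10, 23), (0, 51, 2, 18, 61), (0, 41, 2, 17, 70),
    (12, 42, 2, 9, 22), (7, 53, 1, 12, 27), (11, 72, 1, 9, 25),
]

# Bern = 0 when lottery = 0, and 1 when lottery > 0.
bern = [1 if r[0] > 0 else 0 for r in rows]
income = [r[4] for r in rows]

def mean(values):
    return sum(values) / len(values)

# Average income for players and non-players in these 15 rows.
income_players = mean([inc for inc, b in zip(income, bern) if b == 1])
income_nonplayers = mean([inc for inc, b in zip(income, bern) if b == 0])

# Linear probability model with income only: Bern = c + a * income + e.
mx, my = mean(income), mean(bern)
a = sum((x - mx) * (y - my) for x, y in zip(income, bern)) / sum((x - mx) ** 2 for x in income)
c = my - a * mx

print(f"players: {sum(bern)} of {len(bern)}")
print(f"mean income, players {income_players:.1f}, non-players {income_nonplayers:.1f}")
print(f"LPM on 15 rows: Bern = {c:.3f} + ({a:.4f}) * income")
```

Because Bern takes only the values 0 and 1, the fitted value is read as a probability of playing rather than as a share of income.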
Averages for Players and Non-Players

Bern  age    children  education  income
0     40.49  1.78      15.57      47.565
1     44.19  1.78      11.94      28.545
Linear Probability Model: Play Lottery, Yes or No
(plot of BERN against INCOME)
Dependent Variable: BERN
Method: Least Squares
Sample: 1 100
Included observations: 100

Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         1.226152     0.085229    14.38653     0.0000
INCOME    -0.013856    0.002340    -5.921989    0.0000

R-squared           0.263545    Mean dependent var     0.770000
Adjusted R-squared  0.256030    S.D. dependent var     0.422953
S.E. of regression  0.364812    Akaike info criterion  0.840929
Sum squared resid   13.04261    Schwarz criterion      0.893032
Log likelihood      -40.04644   F-statistic            35.06996
Durbin-Watson stat  2.043370    Prob(F-statistic)      0.000000
Linear Probability Model
• Note: in the linear probability model, income determines the probability of playing the lottery, not the % of income spent on the lottery, so the interpretation of the slope is different.
Linear Probability Model of Playing the Lottery Vs. Income
(plot of BERNHAT against INCOME)
Dependent Variable: BERN
Method: ML - Binary Probit
Sample: 1 100
Included observations: 100
Convergence achieved after 4 iterations
Covariance matrix computed using second derivatives

Variable  Coefficient  Std. Error  z-Statistic  Prob.
C         2.373242     0.409671    5.793039     0.0000
INCOME    -0.046799    0.011017    -4.248030    0.0000

Mean dependent var     0.770000    S.D. dependent var     0.422953
S.E. of regression     0.362314    Akaike info criterion  0.876836
Sum squared resid      12.86458    Schwarz criterion      0.928939
Log likelihood         -41.84178   Hannan-Quinn criter.   0.897923
Restr. log likelihood  -53.92763   Avg. log likelihood    -0.418418
LR statistic (1 df)    24.17171    McFadden R-squared     0.224112
Probability(LR stat)   8.81E-07
Obs with Dep=0         23          Total obs              100
Obs with Dep=1         77
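The two sets of estimates can be turned into predicted probabilities side by side. This sketch plugs in the reported coefficients from the least-squares (linear probability) output and the probit output. The income values in the loop are arbitrary illustration points in the same units as the INCOME variable, and the function names are made up.

```python
import math

LPM_C, LPM_INCOME = 1.226152, -0.013856         # least-squares output for BERN
PROBIT_C, PROBIT_INCOME = 2.373242, -0.046799   # binary probit output for BERN

def lpm_prob(income):
    """Fitted value from the linear probability model (not bounded)."""
    return LPM_C + LPM_INCOME * income

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_prob(income):
    """Probit probability: normal CDF of the linear index."""
    return normal_cdf(PROBIT_C + PROBIT_INCOME * income)

for inc in (10, 20, 50, 85, 100):
    print(f"income {inc:>3}: LPM {lpm_prob(inc):6.3f}  probit {probit_prob(inc):.3f}")
```

At the extremes the linear model's fitted values leave the 0 to 1 range, while the probit values stay inside it. That is the practical reason for moving from the linear probability model to the probit.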
Estimated Probit Probability Model of Playing the Lottery
(plot of PROBITHAT against INCOME)
Part IV. Poisson Approximation to Binomial
• Conditions: p near 0, (1 - p) near 1, n of about 50 or more
• $f(x) = \{\exp[-\mu]\,\mu^{x}\}/x!$, where $\mu$ is the mean
• Assumptions:
– the numbers of events occurring in non-overlapping intervals are independent
– the probability of a single event occurring in a small interval is approximately proportional to the interval
– the probability of more than one event in an interval is negligible
Example
• Ten percent of tools produced in a manufacturing process are defective. What is the probability of finding exactly two defectives in a random sample of 10?
• Binomial: $p(k=2) = [10!/(8!\,2!)]\,(0.1)^{2}(0.9)^{8} = 0.194$
• Poisson, where $\mu$, the mean of the Poisson, equals $n \cdot p = 10 \times 0.1 = 1$: $p(k=2) = \{\exp[-1]\,1^{2}\}/2! = 0.184$
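The two probabilities in the example can be checked directly. A short sketch using only the standard library; the function names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(K = k) for a binomial with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """P(K = k) for a Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

n, p, k = 10, 0.1, 2
exact = binomial_pmf(k, n, p)
approx = poisson_pmf(k, n * p)   # Poisson mean set to n * p = 1
print(f"binomial: {exact:.3f}, Poisson approximation: {approx:.3f}")
```

The gap between the two numbers reflects the small sample: with n = 10 the approximation is rough, and it tightens as n grows with n * p held fixed.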