Unit 5c: Adding Predictors to the Discrete Time Hazard Model

© Andrew Ho, Harvard Graduate School of Education Unit 5c– Slide 1

http://xkcd.com/893/

• Reviewing Life Table Analysis, Hazard Functions, Survival Functions
• Building the Discrete Time Hazard Model
• Comparing Nested Models

Course Roadmap: Unit 5c

Multiple Regression Analysis (MRA): $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

Do your residuals meet the required assumptions?
• Test for residual normality.
• Use influence statistics to detect atypical datapoints.

If your residuals are not independent, replace OLS by GLS regression analysis.
• Use individual growth modeling.
• Specify a multi-level model.

If time is a predictor, you need discrete-time survival analysis… (Today’s Topic Area)

If your outcome is categorical, you need to use…
• Binomial logistic regression analysis (dichotomous outcome).
• Multinomial logistic regression analysis (polytomous outcome).

If you have more predictors than you can deal with,
• Create taxonomies of fitted models and compare them.
• Form composites of the indicators of any common construct.
• Conduct a Principal Components Analysis.
• Use Cluster Analysis.
• Use Factor Analysis: EFA or CFA?

If your outcome vs. predictor relationship is non-linear,
• Use non-linear regression analysis.
• Transform the outcome or predictor.

New Data Example

New data example: data described in FIRSTSEX_info.html.

Dataset: FIRSTSEX.txt

Overview: Person-level dataset that records the high-school grade (7th – 12th) in which at-risk adolescent boys reported experiencing heterosexual sex for the first time, with data on:
1. Whether the boy had suffered a parental transition during early childhood (e.g., a parental divorce and/or a parental death prior to 7th grade).
2. The parents’ level of antisocial behavior during the boy’s early childhood.

Source: Capaldi, D. M., Crosby, L., & Stoolmiller, M. (1996). Predicting the timing of first sexual intercourse for at-risk adolescent males. Child Development, 67, 344-359.

Sample size: 822 person-period records.

More Info: Singer & Willett, 2003, Chapter 11.

Research Questions:
1. Whether, and if so in which grade, at-risk adolescent boys report first experiencing heterosexual sex?
2. How the risk of reported first heterosexual sex depends on the boy’s experiences with parental death and divorce during early childhood?

In DTSA terms, first sex is death, and survival is virginity.

Getting the Data in Shape

       ID   GRADE   CENSOR    PT         PAS   EVENT
  1.    1       9        0    No    1.978867   Intercourse
  2.    2      12        1   Yes   -.5454916   No Intercourse
  3.    3      12        1    No    -1.40498   No Intercourse
  4.    5      12        0   Yes    .9741806   Intercourse
  5.    6      11        0    No   -.6356313   Intercourse
  6.    7       9        0   Yes   -.2428857   Intercourse
  7.    9      12        1    No   -.8685001   No Intercourse
  8.   10      11        0    No    .4535947   Intercourse
  9.   11      12        1   Yes    .8017636   No Intercourse
 10.   12      11        0   Yes    -.745946   Intercourse
 11.   13      12        1    No   -.9002126   No Intercourse
 12.   14      11        0   Yes    .6127937   Intercourse
 13.   15       9        0   Yes   -1.421212   Intercourse
 14.   16      12        0   Yes    1.201357   Intercourse
 15.   17      10        0   Yes    1.723024   Intercourse
 16.   18      12        0   Yes   -.8355232   Intercourse
 17.   19       7        0    No    -.738722   Intercourse
 18.   20      12        1    No   -.1892383   No Intercourse
 19.   21      10        0   Yes    .0398107   Intercourse
 20.   22      12        0   Yes   -.6496482   Intercourse

Adolescent Boy 1 first reported sexual intercourse in Grade 9, had no early parenting transitions and a high level of parent antisocial behavior during the boy’s childhood.

Adolescent Boy 2 has a censored record (never had intercourse through Grade 12), had early parenting transitions, and an average level of parent antisocial behavior.
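The person-level records above have to be expanded into person-period records before a discrete time hazard model can be fit. The following is a minimal sketch of that expansion, not the course's own code: the function name `to_person_period`, the 0/1 coding of EVENT and PT, and the assumption that every boy enters the risk set in Grade 7 are illustrative choices.

```python
def to_person_period(person, first_grade=7):
    """Expand one person-level record into one row per grade at risk.

    Assumes (hypothetically) that EVENT is coded 1 = intercourse, 0 = none,
    and that every boy is first at risk in `first_grade`.
    """
    rows = []
    for grade in range(first_grade, person["GRADE"] + 1):
        # The event occurs only in the last observed grade of an uncensored record.
        event = int(grade == person["GRADE"] and person["CENSOR"] == 0)
        rows.append({"ID": person["ID"], "GRADE": grade,
                     "PT": person["PT"], "EVENT": event})
    return rows


# Boys 1 and 2 from the listing (PT coded 1 = Yes, 0 = No).
boy1 = {"ID": 1, "GRADE": 9, "CENSOR": 0, "PT": 0}
boy2 = {"ID": 2, "GRADE": 12, "CENSOR": 1, "PT": 1}

rows1 = to_person_period(boy1)
rows2 = to_person_period(boy2)

print([(r["GRADE"], r["EVENT"]) for r in rows1])
print([(r["GRADE"], r["EVENT"]) for r in rows2])
```

Boy 1 contributes three person-period rows (Grades 7 to 9) with the event in the last row; Boy 2 contributes six rows with no event, which is how censoring enters the person-period dataset.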

A Histogram of GRADE

[Figure: histogram of GRADE. x-axis: Boy's Grade (7 to 12); y-axis: Number reporting first sex (0 to 60).]

This tells us how many adolescent boys report first sex in any given grade. In red, we have the number of boys whose records are censored, who never report having sex through Grade 12.

Why doesn’t this give us a good sense of how Hazard Probabilities differ over time?

The Hazard Function

. ltable GRADE EVENT, hazard noadjust

              Beg.     Cum.     Std.              Std.
 Interval    Total   Failure   Error   Hazard    Error    [95% Conf. Int.]
   7    8      180    0.0833   0.0206  0.0833   0.0215    0.0466   0.1305
   8    9      165    0.1222   0.0244  0.0424   0.0160    0.0171   0.0791
   9   10      158    0.2556   0.0325  0.1519   0.0310    0.0973   0.2184
  10   11      134    0.4167   0.0367  0.2164   0.0402    0.1449   0.3020
  11   12      105    0.5556   0.0370  0.2381   0.0476    0.1541   0.3401
  12   13       80    0.7000   0.0342  0.3250   0.0637    0.2123   0.4613

[Figure: sample hazard function. x-axis: Boy's Grade (6 to 12); y-axis: Hazard Probability (0 to .3).]

The sample probability of reporting the loss of your virginity in Grade 12 (conditional on never doing so before entering the risk set) is 32.5%.
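As a worked check on the hazard column, the sketch below divides the number of events in each grade by the size of the risk set entering that grade. The risk-set sizes are the Beg. Total column of the life table; the event counts are the Deaths column of the survival life table for the same data. The variable names are illustrative.

```python
# Risk set entering each grade (Beg. Total) and events in that grade (Deaths).
risk_set = {7: 180, 8: 165, 9: 158, 10: 134, 11: 105, 12: 80}
events = {7: 15, 8: 7, 9: 24, 10: 29, 11: 25, 12: 26}

# Sample hazard: proportion of those still at risk who experience the event.
hazard = {grade: events[grade] / risk_set[grade] for grade in risk_set}

for grade in sorted(hazard):
    print(grade, round(hazard[grade], 4))
```

Grades 11 and 12 have nearly the same number of events (25 and 26), yet the Grade 12 hazard is noticeably higher because its risk set is smaller. Raw counts, as in a histogram, hide that shrinking denominator.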

The Survival Function

. ltable GRADE EVENT, survival noadjust

              Beg.                                  Std.
 Interval    Total   Deaths   Lost   Survival    Error    [95% Conf. Int.]
   7    8      180       15      0     0.9167   0.0206    0.8656   0.9489
   8    9      165        7      0     0.8778   0.0244    0.8203   0.9178
   9   10      158       24      0     0.7444   0.0325    0.6741   0.8019
  10   11      134       29      0     0.5833   0.0367    0.5078   0.6514
  11   12      105       25      0     0.4444   0.0370    0.3709   0.5153
  12   13       80       26     54     0.3000   0.0342    0.2348   0.3678

[Figure: sample survival function. x-axis: Boy's Grade (6 to 12); y-axis: Survival Probability (.2 to 1).]

Remember that $\hat{S}(t_1) = 1 - \hat{h}(t_1)$. That is, by Time 1, only $1 - \hat{h}(t_1)$ survive (remain virgins).

And for $j > 1$, $\hat{S}(t_j) = \hat{S}(t_{j-1})\,[1 - \hat{h}(t_j)]$. That is, by any subsequent time $t_j$, only $1 - \hat{h}(t_j)$ of the surviving percentage from the previous time point, $\hat{S}(t_{j-1})$, survive (remain virgins).

The sample probability of adolescent boys maintaining reported virginity past Grade 12 is 30%.

How to find the median survival time…
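A minimal sketch of both steps follows: building the survival function as a running product of one minus the hazard, then locating the median survival time. The linear interpolation between the two grades that bracket a survival probability of .5 is an assumption here (it is the convention described in Singer & Willett, 2003); the slide itself does not spell out the method, and the function name is illustrative.

```python
# Sample hazards from the life table (events / risk set in each grade).
hazard = {7: 15 / 180, 8: 7 / 165, 9: 24 / 158,
          10: 29 / 134, 11: 25 / 105, 12: 26 / 80}

# S(t_j) = S(t_{j-1}) * (1 - h(t_j)), starting from S = 1 before Grade 7.
survival = {}
s = 1.0
for grade in sorted(hazard):
    s *= 1 - hazard[grade]
    survival[grade] = s


def median_lifetime(survival):
    """Interpolate linearly between the grades that bracket S = .5.

    Returns None if the survival function never crosses .5 in the data.
    """
    grades = sorted(survival)
    for m, nxt in zip(grades, grades[1:]):
        if survival[m] >= 0.5 > survival[nxt]:
            frac = (survival[m] - 0.5) / (survival[m] - survival[nxt])
            return m + frac * (nxt - m)
    return None


print({g: round(v, 4) for g, v in survival.items()})
print(median_lifetime(survival))
```

Under these assumptions the survival values match the Survival column of the life table, and the interpolated median falls between Grades 10 and 11, where the survival function drops from .5833 to .4444.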

Conditional Hazard Probabilities and Functions

What are the hazard probabilities conditional on whether the boy had an early parental transition?

. ltable GRADE EVENT, hazard noadjust by(PT)

              Beg.     Cum.     Std.              Std.
 Interval    Total   Failure   Error   Hazard    Error    [95% Conf. Int.]
PT 0
   7    8       72    0.0278   0.0194  0.0278   0.0196    0.0034   0.0774
   8    9       70    0.0556   0.0270  0.0286   0.0202    0.0035   0.0796
   9   10       68    0.1667   0.0439  0.1176   0.0416    0.0508   0.2121
  10   11       60    0.2778   0.0528  0.1333   0.0471    0.0576   0.2404
  11   12       52    0.4167   0.0581  0.1923   0.0608    0.0922   0.3286
  12   13       42    0.5278   0.0588  0.1905   0.0673    0.0822   0.3434
PT 1
   7    8      108    0.1204   0.0313  0.1204   0.0334    0.0641   0.1941
   8    9       95    0.1667   0.0359  0.0526   0.0235    0.0171   0.1078
   9   10       90    0.3148   0.0447  0.1778   0.0444    0.1016   0.2749
  10   11       74    0.5093   0.0481  0.2838   0.0619    0.1757   0.4174
  11   12       53    0.6481   0.0460  0.2830   0.0731    0.1584   0.4432
  12   13       38    0.8148   0.0374  0.4737   0.1116    0.2807   0.7163

[Figure: conditional hazard functions by PT. x-axis: Boy's Grade (6 to 12); y-axis: Conditional Hazard Probability (0 to .5); legend: No Parental Transition, Parental Transition.]

The sample probability of reporting the loss of your virginity in Grade 12 (conditional on being a reported virgin entering the risk set) is 47% if you had an early parental transition and 19% if you did not.

Conditional Survival Probabilities and Functions

What are the survival probabilities conditional on whether the boy had an early parental transition?

. ltable GRADE EVENT, survival noadjust by(PT)

              Beg.                                  Std.
 Interval    Total   Deaths   Lost   Survival    Error    [95% Conf. Int.]
No
   7    8       72        2      0     0.9722   0.0194    0.8935   0.9930
   8    9       70        2      0     0.9444   0.0270    0.8587   0.9788
   9   10       68        8      0     0.8333   0.0439    0.7252   0.9017
  10   11       60        8      0     0.7222   0.0528    0.6033   0.8110
  11   12       52       10      0     0.5833   0.0581    0.4610   0.6871
  12   13       42        8     34     0.4722   0.0588    0.3538   0.5817
Yes
   7    8      108       13      0     0.8796   0.0313    0.8017   0.9283
   8    9       95        5      0     0.8333   0.0359    0.7486   0.8915
   9   10       90       16      0     0.6852   0.0447    0.5885   0.7637
  10   11       74       21      0     0.4907   0.0481    0.3936   0.5807
  11   12       53       15      0     0.3519   0.0460    0.2633   0.4415
  12   13       38       18     20     0.1852   0.0374    0.1186   0.2635

[Figure: conditional survival functions by PT. x-axis: Boy's Grade (6 to 12); y-axis: Conditional Survival Probability (0 to 1); legend: No Parental Transition, Parental Transition.]

The sample probability of maintaining one’s reported virginity past Grade 12 is 19% if you had an early parental transition and 47% if you did not.

How to find the median survival time conditional on PT!

Conditional Hazard Logits as the Target of Modeling

[Figure, left panel: conditional hazard probabilities by PT. x-axis: Boy's Grade (6 to 12); y-axis: Conditional Hazard Probability (0 to .5).]

[Figure, right panel: conditional hazard logits by PT. x-axis: Boy's Grade (6 to 12); y-axis: Conditional Hazard Logit (-4 to 0). Legend in both panels: No Parental Transition, Parental Transition.]

Dichotomous outcomes are cumbersome to model directly.

We model probabilities after transforming them to logits: $\text{logit}(h) = \ln\left(\frac{h}{1-h}\right)$.

A change of scale. Stretches out extreme probabilities. Compresses central probabilities.
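To make the change of scale concrete, here is a small sketch of the logit transformation and its inverse. The helper names `logit` and `inv_logit` are illustrative; the example probabilities are arbitrary apart from .325, the Grade 12 sample hazard.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Map a logit back to a probability."""
    return 1 / (1 + math.exp(-x))


# The same .10 step in probability is a small step in logits near .5
# and a large step in logits near 1.
central_step = logit(0.60) - logit(0.50)
extreme_step = logit(0.99) - logit(0.89)

print(round(central_step, 3), round(extreme_step, 3))
print(round(inv_logit(logit(0.325)), 3))
```

The transformation is reversible, so fitted logits can always be converted back to fitted hazard probabilities for plotting and interpretation.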

From Person-Level to Person-Period Hazard Probabilities

Remember how to generate hazard probabilities in a person-period dataset, by tabulate or by egen.

. tabulate EVENT GRADE, column

Did boy have
heterosexual
sex for the                           Boy's Grade
first time?           7        8        9       10       11       12    Total
No Intercourse      165      158      134      105       80       54      696
                  91.67    95.76    84.81    78.36    76.19    67.50    84.67
Intercourse          15        7       24       29       25       26      126
                   8.33     4.24    15.19    21.64    23.81    32.50    15.33
Total               180      165      158      134      105       80      822
                 100.00   100.00   100.00   100.00   100.00   100.00   100.00

From before… the Hazard column of ". ltable GRADE EVENT, hazard noadjust" gave 0.0833, 0.0424, 0.1519, 0.2164, 0.2381, 0.3250 for Grades 7 through 12, matching the Intercourse column percentages here.

[Figure: sample hazard function. x-axis: Boy's Grade (6 to 12); y-axis: Hazard Probability (0 to .3).]

Of course, we don’t fit probabilities directly, we fit logits, $\ln\left(\frac{h}{1-h}\right)$: -2.40, -3.12, -1.72, -1.29, -1.16, -.73.

Fitting the Discrete Time Hazard Model

. logit EVENT G7-G12, noconstant nolog

Logistic regression                     Number of obs =     822
                                        Wald chi2(6)  =  277.14
Log likelihood = -325.97769             Prob > chi2   =  0.0000

 EVENT        Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    G7    -2.397895    .2696799   -8.89   0.000   -2.926458   -1.869332
    G8    -3.116685    .3862464   -8.07   0.000   -3.873714   -2.359656
    G9    -1.719786    .2216514   -7.76   0.000   -2.154215   -1.285357
   G10    -1.286665    .2097774   -6.13   0.000   -1.697821   -.8755083
   G11    -1.163151    .2291288   -5.08   0.000   -1.612235   -.7140666
   G12    -.7308875     .238705   -3.06   0.002   -1.198741   -.2630344

[Figure: sample and fitted hazard probabilities. x-axis: Boy's Grade (6 to 12; printed on the slide as "Number of Years in Teaching"); y-axis: Hazard Probability (0 to .3); legend: Sample Hazard Probability, Fitted Hazard Probability.]

Of course, we don’t fit probabilities directly, we fit logits, $\ln\left(\frac{h}{1-h}\right)$: -2.40, -3.12, -1.72, -1.29, -1.16, -.73.

These z-tests test the null hypothesis that the logit is 0 (the probability is 50%) in the population.

The fitted logits/probabilities reproduce the sample logits/probabilities exactly and incorporate them into a statistical framework.
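The claim that the fitted values reproduce the sample values can be checked by hand. The sketch below back-transforms the six fitted coefficients from the logit output and compares them with the sample hazards (events divided by risk set). The dictionary names are illustrative, and the comparison tolerance reflects the seven digits printed in the output.

```python
import math

# Fitted logits: coefficients on G7-G12 from the no-constant model.
fitted_logit = {7: -2.397895, 8: -3.116685, 9: -1.719786,
                10: -1.286665, 11: -1.163151, 12: -0.7308875}

# Sample hazards: events / risk set in each grade.
sample_hazard = {7: 15 / 180, 8: 7 / 165, 9: 24 / 158,
                 10: 29 / 134, 11: 25 / 105, 12: 26 / 80}

# Inverse logit turns each fitted logit into a fitted hazard probability.
fitted_hazard = {g: 1 / (1 + math.exp(-b)) for g, b in fitted_logit.items()}

for g in sorted(fitted_hazard):
    print(g, round(fitted_hazard[g], 4), round(sample_hazard[g], 4))
```

With one dummy per grade and no other predictors, the model has as many parameters as there are grade-specific hazards, which is why the fit is exact.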

Now, Conditional Hazard Probabilities in a Person-Period Dataset

. bysort PT: tab EVENT GRADE, column

Key: frequency, column percentage

-> PT = No

Did boy have
heterosexual
sex for the                           Boy's Grade
first time?           7        8        9       10       11       12    Total
No Intercourse       70       68       60       52       42       34      326
                  97.22    97.14    88.24    86.67    80.77    80.95    89.56
Intercourse           2        2        8        8       10        8       38
                   2.78     2.86    11.76    13.33    19.23    19.05    10.44
Total                72       70       68       60       52       42      364
                 100.00   100.00   100.00   100.00   100.00   100.00   100.00

-> PT = Yes

Did boy have
heterosexual
sex for the                           Boy's Grade
first time?           7        8        9       10       11       12    Total
No Intercourse       95       90       74       53       38       20      370
                  87.96    94.74    82.22    71.62    71.70    52.63    80.79
Intercourse          13        5       16       21       15       18       88
                  12.04     5.26    17.78    28.38    28.30    47.37    19.21
Total               108       95       90       74       53       38      458
                 100.00   100.00   100.00   100.00   100.00   100.00   100.00

By adding the categorical predictor variable, PT, to the “by” statement, we create conditional hazard probabilities in a person-period dataset. (It’s straightforward for person-level data with the “ltable, by” approach.)

Remember that we don’t model probabilities directly. Instead, we model their logits.

Fitting the Discrete Time Hazard Model with a Predictor

. logit EVENT G7-G12 PT, nolog noconstant

Logistic regression                     Number of obs =     822
                                        Wald chi2(7)  =  271.09
Log likelihood = -317.33089             Prob > chi2   =  0.0000

 EVENT        Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    G7    -2.994327    .3175088   -9.43   0.000   -3.616632   -2.372021
    G8    -3.700124    .4205628   -8.80   0.000   -4.524412   -2.875836
    G9    -2.281124    .2723919   -8.37   0.000   -2.815002   -1.747245
   G10    -1.822599    .2584614   -7.05   0.000   -2.329174   -1.316024
   G11    -1.654227    .2691058   -6.15   0.000   -2.181665    -1.12679
   G12    -1.179057    .2715801   -4.34   0.000   -1.711344   -.6467698
    PT     .8736184    .2174076    4.02   0.000    .4475075    1.299729

[Figure, hazard logits panel. x-axis: Boy's Grade (6 to 12; printed on the slide as "Number of Years in Teaching"); y-axis: Hazard Logits (-4 to 0).]

[Figure, hazard probability panel. x-axis: Boy's Grade (6 to 12; printed on the slide as "Number of Years in Teaching"); y-axis: Hazard Probability (0 to .5).]

The dotted lines show the fitted hazard probabilities, assuming a constant (in the logits) estimated main effect of .874 logits.

The fitted odds ratio of $e^{.874} \approx 2.40$ is interpretable as the fitted ratio of the odds of “parental transition” boys reporting first sex over the odds of “non-parental transition” boys reporting first sex, at any given grade.

As always, the interpretation of the main effect is easier to see on the logit scale than it is on the probability scale.

A difference of .874 logits throughout.

An odds ratio of about 2.40 throughout.
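The constant-effect assumption can be illustrated with the fitted coefficients from the output above. This is a sketch under that model only: the dictionary and variable names are illustrative, and PT is treated as a 0/1 indicator.

```python
import math

# Fitted coefficients from the model with grade dummies and PT.
grade_logit = {7: -2.994327, 8: -3.700124, 9: -2.281124,
               10: -1.822599, 11: -1.654227, 12: -1.179057}
b_pt = 0.8736184


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def odds(p):
    return p / (1 - p)


odds_ratio = math.exp(b_pt)
print("odds ratio:", round(odds_ratio, 3))

for g in sorted(grade_logit):
    h_no = inv_logit(grade_logit[g])           # PT = 0
    h_yes = inv_logit(grade_logit[g] + b_pt)   # PT = 1
    # The gap in probabilities varies by grade; the ratio of odds does not.
    print(g, round(h_no, 3), round(h_yes, 3), round(odds(h_yes) / odds(h_no), 3))
```

The probability gap between the two groups widens in later grades even though the logit gap, and therefore the odds ratio, stays fixed. That is why the main effect is easier to read on the logit scale.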

Building the Discrete Time Hazard Model with esttab

For each model, we store the estimates using eststo, save the deviance (-2LL), save the predicted probabilities, and label the predicted probability variable.

Instead of adding dummy variables for each grade, we fit a model that is linear in the logits.

And then add the question predictor…

We can see if the above model is more parsimonious than a model that uses all the grade dummies without sacrificing prediction.

Building the Discrete Time Hazard Model with esttab

Fitting discrete time hazard models for the probability of reporting first sex (n=822)

EVENT       Model 1     Model 2     Model 3     Model 4     Model 5
GRADE                   0.399***    0.430***
                        (0.0624)    (0.0641)
PT                                  0.875***                0.874***
                                    (0.217)                 (0.217)
G8                                              -0.719      -0.706
                                                (0.471)     (0.473)
G9                                              0.678       0.713*
                                                (0.349)     (0.352)
G10                                             1.111**     1.172***
                                                (0.342)     (0.345)
G11                                             1.235***    1.340***
                                                (0.354)     (0.359)
G12                                             1.667***    1.815***
                                                (0.360)     (0.367)
_cons       -1.709***   -5.471***   -6.305***   -2.398***   -2.994***
            (0.0968)    (0.618)     (0.670)     (0.270)     (0.318)
chi2        -6.82e-13   43.65       61.10       52.28       69.57
df_m        0           1           2           5           6
neg2ll      704.2       660.6       643.1       652.0       634.7
r2_p        -8.88e-16   0.0620      0.0868      0.0742      0.0988
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Two adolescent boys that differ by one grade level have an estimated difference in log-odds of reporting first sex of .4 logits.

The fitted difference in log-odds between boys with and without parental transitions is .87, if GRADE could be held constant.
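As a worked illustration of the "linear in the logits" specification, the sketch below uses the Model 2 coefficients (constant and GRADE) to compute fitted hazards by grade and the implied odds ratio per grade. The function and constant names are illustrative.

```python
import math

# Model 2 coefficients: logit(hazard) = B_CONS + B_GRADE * GRADE.
B_CONS = -5.471358
B_GRADE = 0.3989694


def fitted_hazard(grade):
    """Fitted hazard probability at a given grade under Model 2."""
    logit = B_CONS + B_GRADE * grade
    return 1 / (1 + math.exp(-logit))


# A one-grade difference adds B_GRADE logits, i.e. multiplies the odds by exp(B_GRADE).
odds_ratio_per_grade = math.exp(B_GRADE)
print("odds ratio per grade:", round(odds_ratio_per_grade, 2))

for grade in range(7, 13):
    print(grade, round(fitted_hazard(grade), 4))
```

The fitted logits rise by the same amount each grade, while the fitted probabilities rise by increasing amounts over this range, which is the usual consequence of a straight line on the logit scale.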

The Likelihood Ratio Chi-Square for DTSA

From the taxonomy of fitted models (Models 1 to 5): neg2ll = 704.2, 660.6, 643.1, 652.0, 634.7; df_m = 0, 1, 2, 5, 6.

Difference in deviance (-2LL) between Models 2 and 4 is 660.6-652.0=8.6. Difference in the degrees of freedom for the statistic is 5-1=4.

. display invchi2tail(4,.05)
9.49

Because 8.6 is less than the critical value of 9.49, we retain the null hypothesis of no difference between the fit of the models in the population, thus we opt to keep the more parsimonious model.
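The same comparison can be expressed as a p-value. The sketch below uses only the deviances and model degrees of freedom from the taxonomy table. It relies on the closed-form upper-tail probability of a chi-square distribution with 4 degrees of freedom, $e^{-x/2}(1 + x/2)$, so it applies only to 4-df contrasts; the function name is illustrative.

```python
import math


def chi2_upper_tail_df4(x):
    """P(X > x) for a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


# Deviance (-2LL) and model df from the taxonomy table.
neg2ll = {2: 660.6, 3: 643.1, 4: 652.0, 5: 634.7}
df_m = {2: 1, 3: 2, 4: 5, 5: 6}

# Model 2 (linear in GRADE) is nested in Model 4 (grade dummies).
delta_2v4 = neg2ll[2] - neg2ll[4]
p_2v4 = chi2_upper_tail_df4(delta_2v4)

# Model 3 (GRADE + PT) is nested in Model 5 (grade dummies + PT).
delta_3v5 = neg2ll[3] - neg2ll[5]
p_3v5 = chi2_upper_tail_df4(delta_3v5)

print(round(delta_2v4, 1), df_m[4] - df_m[2], round(p_2v4, 3))
print(round(delta_3v5, 1), df_m[5] - df_m[3], round(p_3v5, 3))
```

Both p-values exceed .05, which agrees with comparing each deviance difference to the critical value of 9.49.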

Another way to do this, which we can apply to the contrast between Models 3 and 5 (Do dummies give better fit in the population, after accounting for PT?), is directly in Stata:


Model 1: Constant Only

. logit EVENT, nolog

Logistic regression                     Number of obs =     822
                                        LR chi2(0)    =    0.00
                                        Prob > chi2   =       .
Log likelihood = -352.11572             Pseudo R2     =  0.0000

 EVENT        Coef.   Std. Err.        z   P>|z|    [95% Conf. Interval]
 _cons    -1.709068    .0968158   -17.65   0.000   -1.898823   -1.519312

[Figure, left panel: fitted hazard logits. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Logits (-3 to -.5).]

[Figure, right panel: fitted hazard probabilities. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Probabilities (0 to .3).]

Model 2: Linear in Logits on GRADE

. logit EVENT GRADE, nolog

Logistic regression                     Number of obs =     822
                                        LR chi2(1)    =   43.65
                                        Prob > chi2   =  0.0000
Log likelihood = -330.29299             Pseudo R2     =  0.0620

 EVENT        Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
 GRADE     .3989694    .0623668    6.40   0.000    .2767328    .5212061
 _cons    -5.471358    .6175907   -8.86   0.000   -6.681813   -4.260902

[Figure, left panel: fitted hazard logits. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Logits (-3 to -.5).]

[Figure, right panel: fitted hazard probabilities. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Probabilities (0 to .4).]

Model 3: Linear in Logits on GRADE and PT

. logit EVENT GRADE PT, nolog

Logistic regression                     Number of obs =     822
                                        LR chi2(2)    =   61.10
                                        Prob > chi2   =  0.0000
Log likelihood = -321.56389             Pseudo R2     =  0.0868

 EVENT        Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
 GRADE     .4300272    .0640717    6.71   0.000     .304449    .5556053
    PT     .8754379    .2169584    4.04   0.000    .4502072    1.300669
 _cons    -6.305339    .6703426   -9.41   0.000   -7.619186   -4.991492

[Figure, left panel: fitted hazard logits by PT. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Logits (-4 to 0).]

[Figure, right panel: fitted hazard probabilities by PT. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Probabilities (0 to .5). Lines in both panels: Parental Transition, No Parental Transition.]

Model 4: Dummies for GRADE

. logit EVENT G8-G12, nolog

Logistic regression                     Number of obs =     822
                                        LR chi2(5)    =   52.28
                                        Prob > chi2   =  0.0000
Log likelihood = -325.97769             Pseudo R2     =  0.0742

 EVENT        Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    G8    -.7187884    .4710768   -1.53   0.127   -1.642082    .2045052
    G9     .6781093    .3490797    1.94   0.052   -.0060743    1.362293
   G10     1.111231    .3416633    3.25   0.001    .4415829    1.780879
   G11     1.234744    .3538747    3.49   0.000    .5411629    1.928326
   G12     1.667008     .360149    4.63   0.000    .9611286    2.372887
 _cons    -2.397895    .2696799   -8.89   0.000   -2.926458   -1.869332

[Figure, left panel: fitted hazard logits. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Logits (-3 to -.5).]

[Figure, right panel: fitted hazard probabilities. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Probabilities (0 to .3).]

Model 5: Dummies for GRADE, Controlling for PT

. logit EVENT G8-G12 PT, nolog

Logistic regression                     Number of obs =     822
                                        LR chi2(6)    =   69.57
                                        Prob > chi2   =  0.0000
Log likelihood = -317.33089             Pseudo R2     =  0.0988

 EVENT        Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    G8    -.7057938    .4729088   -1.49   0.136   -1.632678    .2210905
    G9     .7132031    .3518598    2.03   0.043    .0235706    1.402836
   G10     1.171728     .345233    3.39   0.001    .4950838    1.848372
   G11     1.340099    .3587933    3.74   0.000    .6368774    2.043321
   G12      1.81527    .3674037    4.94   0.000    1.095172    2.535368
    PT     .8736184    .2174075    4.02   0.000    .4475075    1.299729
 _cons    -2.994327    .3175088   -9.43   0.000   -3.616632   -2.372021

[Figure, left panel: fitted hazard logits by PT. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Logits (-4 to 0).]

[Figure, right panel: fitted hazard probabilities by PT. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Probabilities (0 to .5). Lines in both panels: Parental Transition, No Parental Transition.]

Revisiting the Likelihood Ratio Chi-Square, Graphically

From the taxonomy of fitted models (Models 1 to 5): neg2ll = 704.2, 660.6, 643.1, 652.0, 634.7; df_m = 0, 1, 2, 5, 6.

Difference in deviance (-2LL) is 660.6-652.0=8.6. Difference in the degrees of freedom for the statistic is 4.

. display invchi2tail(4,.05)
9.49

We retain the null hypothesis of no difference between the fit of the models in the population, thus we opt to keep the more parsimonious model.

Another way to do this, which we can apply to the contrast between Models 3 and 5 (Do dummies give better fit in the population, after accounting for PT?), is directly in Stata:

[Figures: two panels of fitted hazard logits. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Logits (-3 to -.5).]

Fitted Survival Functions for Model 3

[Figure: fitted and sample survival functions by PT. x-axis: Grade (6 to 12); y-axis: Fitted/Sample Survival Probability (0 to 1).]

[Figure: fitted hazard probabilities by PT. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Probabilities (0 to .5).]

[Figure: fitted hazard logits by PT. x-axis: Grade (6 to 12); y-axis: Fitted Hazard Logits (-4 to 0). Lines in all panels: Parental Transition, No Parental Transition.]

. list, clean

       GRADE    PT    HAZBYPT   PREDVA~3   SURVIVEP   SURVIVE3
  1.       7    No   .0277778    .035738   .9722222    .964262
  2.       8    No   .0285714   .0539049   .9444444   .9122835
  3.       9    No   .1176471   .0805354   .8333333   .8388124
  4.      10    No   .1333333   .1186719   .7222222    .739269
  5.      11    No   .1923077   .1714991   .5833333   .6124851
  6.      12    No   .1904762   .2414012   .4722222   .4646305
  7.       7   Yes   .1203704   .0816819   .8796296   .9183181
  8.       8   Yes   .0526316   .1202903   .8333333   .8078533
  9.       9   Yes   .1777778   .1736959   .6851852   .6675325
 10.      10   Yes   .2837838   .2442295   .4907407   .5045014
 11.      11   Yes   .2830189   .3319004   .3518519   .3370572
 12.      12   Yes   .4736842   .4330114   .1851852   .1911076

Here HAZBYPT and SURVIVEP are the sample hazard and survival probabilities by PT, and PREDVA~3 and SURVIVE3 are the corresponding fitted values from Model 3.

At each discrete time point, there is a sample hazard probability and a fitted/estimated hazard probability. Each implies its own survival probability. These are simply those curves. The dotted lines are the model-implied or fitted survival functions.
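The step from fitted hazards to fitted survival can be sketched as follows. The code starts from the Model 3 coefficients (constant, GRADE, PT), computes the fitted hazard in each grade, and accumulates the product of one minus the hazard. The function names and the 0/1 coding of PT are illustrative; the comparison values in the final comments are the listed SURVIVE3 entries for Grade 12.

```python
import math

# Model 3 coefficients: logit(hazard) = B_CONS + B_GRADE * GRADE + B_PT * PT.
B_CONS, B_GRADE, B_PT = -6.305339, 0.4300272, 0.8754379


def fitted_hazard(grade, pt):
    logit = B_CONS + B_GRADE * grade + B_PT * pt
    return 1 / (1 + math.exp(-logit))


def fitted_survival(pt, grades=range(7, 13)):
    """Model-implied survival: running product of (1 - fitted hazard)."""
    survival = {}
    s = 1.0
    for grade in grades:
        s *= 1 - fitted_hazard(grade, pt)
        survival[grade] = s
    return survival


surv_no = fitted_survival(pt=0)
surv_yes = fitted_survival(pt=1)

for grade in range(7, 13):
    print(grade, round(surv_no[grade], 4), round(surv_yes[grade], 4))

# Listed values for Grade 12: SURVIVE3 = .4646305 (No) and .1911076 (Yes).
```

The same accumulation applied to the sample hazards gives the sample survival curves, so the fitted and sample curves differ only through the hazards that feed them.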