
Nonparametric maximum likelihood estimation (MLE) for bivariate censored data

Marloes H. Maathuis

Advisors: Piet Groeneboom and Jon A. Wellner

Motivation

Estimate the distribution function of the incubation period of HIV/AIDS:
– Nonparametrically
– Based on censored data:
  • Time of HIV infection is interval censored
  • Time of onset of AIDS is interval censored or right censored

Approach

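• Use the MLE to estimate the bivariate distribution of (X, Y), where X is the time of HIV infection and Y the time of onset of AIDS
• Integrate over diagonal strips to obtain $P(Y - X \le z)$

[Figure: diagonal strip determined by z, with X (HIV) on the horizontal axis and Y (AIDS) on the vertical axis]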
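The second bullet can be illustrated with a minimal sketch. It assumes that the estimated distribution has already been reduced to a finite list of point masses (x, y, mass). That is an extra assumption: the MLE only determines how much mass goes to certain rectangular regions (the maximal intersections discussed later), not where inside them the mass lies. Under this assumption, $P(Y - X \le z)$ is the total mass on or below the line $y = x + z$. The function `incubation_cdf` and the toy atoms are hypothetical.

```python
from typing import List, Tuple

# (x, y, mass): a point in the (X, Y) plane carrying probability mass.
# Placing the estimated mass at points is an assumption made only for this sketch.
Atom = Tuple[float, float, float]


def incubation_cdf(atoms: List[Atom], z: float) -> float:
    """P(Y - X <= z): total mass on or below the line y = x + z."""
    return sum(mass for x, y, mass in atoms if y - x <= z)


# Hypothetical toy atoms (calendar years), not estimates from real data.
atoms = [(1980.0, 1988.0, 0.5), (1983.0, 1992.0, 0.3), (1986.0, 1996.0, 0.2)]
for z in (5.0, 8.0, 9.0, 10.0):
    print(z, incubation_cdf(atoms, z))
```

Evaluating the function on a grid of z values traces out a step-function estimate of the incubation-time distribution, given the assumed placement of the mass.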

Main focus of the project

• MLE for bivariate censored data:
  – Computational aspects
  – (In)consistency and methods to repair the inconsistency


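[Figure: an observation rectangle formed by the interval of HIV infection on the X (HIV) axis (calendar years 1980 to 1986) and the interval of onset of AIDS on the Y (AIDS) axis (calendar years 1980 to 1996)]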


Observation rectangle $R_i$

[Figure: observation rectangle $R_i$, with X (HIV) on the horizontal axis and Y (AIDS) on the vertical axis]

$\max_F \sum_{i=1}^{n} \log P_F\big((X_i, Y_i) \in R_i\big)$
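The objective $\sum_i \log P_F((X_i, Y_i) \in R_i)$ can be evaluated directly for any candidate F that puts mass on finitely many points. The sketch below assumes closed rectangles given as (x_lo, x_hi, y_lo, y_hi) and point masses given as (x, y, mass). The helpers `contains` and `log_likelihood` are hypothetical names. The example shows why mass belongs in overlaps: moving all the mass into the region covered by both rectangles raises the log-likelihood from $2 \log(1/2)$ to 0.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # closed rectangle (x_lo, x_hi, y_lo, y_hi)
Atom = Tuple[float, float, float]         # point mass (x, y, mass)


def contains(r: Rect, x: float, y: float) -> bool:
    x_lo, x_hi, y_lo, y_hi = r
    return x_lo <= x <= x_hi and y_lo <= y <= y_hi


def log_likelihood(rects: List[Rect], atoms: List[Atom]) -> float:
    """sum_i log P_F((X_i, Y_i) in R_i) for a discrete F given by its atoms."""
    total = 0.0
    for r in rects:
        p = sum(m for x, y, m in atoms if contains(r, x, y))
        if p <= 0.0:
            return -math.inf  # F gives zero probability to an observed rectangle
        total += math.log(p)
    return total


rects = [(0.0, 2.0, 0.0, 2.0), (1.0, 3.0, 1.0, 3.0)]
spread = [(0.5, 0.5, 0.5), (2.5, 2.5, 0.5)]  # each atom lies in only one rectangle
overlap = [(1.5, 1.5, 1.0)]                  # all mass in the overlap of both rectangles
print(log_likelihood(rects, spread), log_likelihood(rects, overlap))
```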

Maximal intersections

[Figure: observation rectangles $R_i$ with X (HIV) on the horizontal axis and Y (AIDS) on the vertical axis, together with their maximal intersections]


[Figure: four maximal intersections of the observation rectangles, with masses α1, α2 (first row) and α3, α4 (second row)]

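Assigning masses $\alpha_1, \ldots, \alpha_4$ to the maximal intersections turns the problem into

$\max_{\alpha \in \mathbb{R}^4} \; \log(\alpha_1) + \log(\alpha_1 + \alpha_3) + \log(\alpha_1 + \alpha_2) + \log(\alpha_2 + \alpha_4) + \log(\alpha_3 + \alpha_4)$

subject to $\alpha_i \ge 0$, $i = 1, \ldots, 4$, and $\sum_{i=1}^{4} \alpha_i = 1$.

Solution: $\alpha_1 = 3/5$, $\alpha_2 = 0$, $\alpha_3 = 0$, $\alpha_4 = 2/5$.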
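The stated solution can be checked by hand rather than with a solver. The objective is concave and the constraints are linear. A feasible α is therefore optimal exactly when each partial derivative equals a common multiplier λ wherever $\alpha_j > 0$, and is at most λ wherever $\alpha_j = 0$. Here λ must equal n = 5, the number of log terms, because $\sum_j \alpha_j \, \partial f / \partial \alpha_j = \sum_i 1 = n$ for any log-likelihood of this form. The sketch below encodes the objective $\log(\alpha_1) + \log(\alpha_1 + \alpha_3) + \log(\alpha_1 + \alpha_2) + \log(\alpha_2 + \alpha_4) + \log(\alpha_3 + \alpha_4)$ as a 0/1 matrix and checks these conditions with exact fractions. `CLIQUE`, `gradient` and `is_optimal` are hypothetical names.

```python
from fractions import Fraction

# 0/1 encoding of the objective: row i marks which alphas appear inside the i-th log.
CLIQUE = [
    [1, 0, 0, 0],  # log(a1)
    [1, 0, 1, 0],  # log(a1 + a3)
    [1, 1, 0, 0],  # log(a1 + a2)
    [0, 1, 0, 1],  # log(a2 + a4)
    [0, 0, 1, 1],  # log(a3 + a4)
]


def gradient(alpha):
    """Partial derivatives of sum_i log(sum_j CLIQUE[i][j] * alpha[j])."""
    masses = [sum(c * a for c, a in zip(row, alpha)) for row in CLIQUE]
    return [sum(Fraction(row[j]) / m for row, m in zip(CLIQUE, masses))
            for j in range(len(alpha))]


def is_optimal(alpha):
    """KKT check for a concave objective over the probability simplex."""
    n = len(CLIQUE)
    if any(a < 0 for a in alpha) or sum(alpha) != 1:
        return False  # not a probability vector
    if any(sum(c * a for c, a in zip(row, alpha)) == 0 for row in CLIQUE):
        return False  # some rectangle gets no mass: log-likelihood is -infinity
    return all(g == n if a > 0 else g <= n
               for a, g in zip(alpha, gradient(alpha)))


solution = [Fraction(3, 5), Fraction(0), Fraction(0), Fraction(2, 5)]
uniform = [Fraction(1, 4)] * 4
print(is_optimal(solution), is_optimal(uniform))
```

At α = (3/5, 0, 0, 2/5) the partial derivatives are 5, 25/6, 25/6 and 5, so the conditions hold. The uniform α fails because its first partial derivative is 8.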


The $\alpha_i$'s are not always uniquely determined: mixture non-uniqueness.
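A small hypothetical configuration shows how this can happen. Take four maximal intersections laid out as a 2 × 2 grid, with α1, α2 in one row and α3, α4 in the other. Let the four observation rectangles be the two rows and the two columns. Each pair of row totals sums to 1, so log(a) + log(1 − a) is maximized at a = 1/2, and the same holds for the column totals. Every α = (t, 1/2 − t, 1/2 − t, t) with 0 ≤ t ≤ 1/2 achieves this and is therefore an MLE. The names below are illustrative only.

```python
import math

# Hypothetical 2 x 2 layout of maximal intersections: alpha_1 alpha_2 in one row,
# alpha_3 alpha_4 in the other. The observation rectangles are the two rows
# and the two columns of this grid.
CLIQUE = [
    [1, 1, 0, 0],  # first row:     alpha_1 + alpha_2
    [0, 0, 1, 1],  # second row:    alpha_3 + alpha_4
    [1, 0, 1, 0],  # first column:  alpha_1 + alpha_3
    [0, 1, 0, 1],  # second column: alpha_2 + alpha_4
]


def log_lik(alpha):
    return sum(math.log(sum(c * a for c, a in zip(row, alpha))) for row in CLIQUE)


# Every alpha = (t, 1/2 - t, 1/2 - t, t), 0 <= t <= 1/2, gives each rectangle mass 1/2.
for t in (0.0, 0.125, 0.25, 0.5):
    alpha = (t, 0.5 - t, 0.5 - t, t)
    print(alpha, log_lik(alpha))
```

All printed log-likelihoods equal 4 log(1/2), so the data cannot distinguish between these α's.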


Computation of the MLE

• Reduction step: determine the maximal intersections
• Optimization step: determine the amounts of mass assigned to the maximal intersections
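One standard choice for the optimization step is the classical self-consistency (EM) iteration, although it is not necessarily the method used in this project. Let $A_{ij} = 1$ if maximal intersection j lies in $R_i$. Each step multiplies $\alpha_j$ by the average over observations of $A_{ij} / P_\alpha(R_i)$. The sketch assumes that the reduction step has already produced this 0/1 matrix and that every rectangle contains at least one maximal intersection. The name `em_masses` and the fixed iteration count are hypothetical choices.

```python
from typing import List, Sequence


def em_masses(clique: Sequence[Sequence[int]], iters: int = 5000) -> List[float]:
    """Self-consistency (EM) iterations for the masses of the maximal intersections.

    clique[i][j] = 1 if maximal intersection j lies in observation rectangle i.
    Each step: alpha_j <- alpha_j * (1/n) * sum_i clique[i][j] / P_alpha(R_i).
    """
    n, m = len(clique), len(clique[0])
    alpha = [1.0 / m] * m  # start in the interior of the simplex
    for _ in range(iters):
        masses = [sum(c * a for c, a in zip(row, alpha)) for row in clique]
        alpha = [
            alpha[j] * sum(row[j] / p for row, p in zip(clique, masses)) / n
            for j in range(m)
        ]
    return alpha


# The four-alpha example: rows correspond to log(a1), log(a1 + a3),
# log(a1 + a2), log(a2 + a4) and log(a3 + a4).
example = [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
print([round(a, 6) for a in em_masses(example)])
```

Each update keeps the masses on the simplex, because $\sum_j \alpha_j \sum_i A_{ij} / P_\alpha(R_i) = n$. On the four-α example the iterates approach (3/5, 0, 0, 2/5). Near the solution, the masses that should vanish shrink only by a factor of about 5/6 per step, which illustrates why plain EM can be slow.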


Existing reduction algorithms

• Betensky and Finkelstein (1999, Stat. in Medicine)
• Gentleman and Vandal (2001, JCGS)
• Song (2001, Ph.D. thesis)
• Bogaerts and Lesaffre (2003, Tech. report)

The first three algorithms are very slow; the last has complexity $O(n^3)$.
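A deliberately naive reduction shows why this step deserves a dedicated algorithm. It is not one of the algorithms listed above, and it assumes closed rectangles (x_lo, x_hi, y_lo, y_hi). Every maximal intersection is itself a rectangle. Its lower-left corner has a left endpoint as its x-coordinate and a lower endpoint as its y-coordinate. It is therefore enough to compute the set of rectangles covering each such grid point and keep the sets that are maximal under inclusion. Computing the covering sets alone costs $O(n^2)$ points times $O(n)$ work per point. The name `maximal_intersections` is hypothetical.

```python
from typing import FrozenSet, List, Tuple

Rect = Tuple[float, float, float, float]  # closed: [x_lo, x_hi] x [y_lo, y_hi]


def maximal_intersections(rects: List[Rect]) -> List[Tuple[FrozenSet[int], Rect]]:
    """Brute-force reduction step: (indices of covering rectangles, intersection region)."""
    xs = sorted({r[0] for r in rects})  # candidate x: left endpoints
    ys = sorted({r[2] for r in rects})  # candidate y: lower endpoints
    cliques = set()
    for x in xs:
        for y in ys:
            cover = frozenset(
                i for i, (x0, x1, y0, y1) in enumerate(rects)
                if x0 <= x <= x1 and y0 <= y <= y1
            )
            if cover:
                cliques.add(cover)
    # Keep only covering sets that are not strictly contained in another one.
    maximal = [c for c in cliques if not any(c < d for d in cliques)]
    result = []
    for c in sorted(maximal, key=sorted):
        members = [rects[i] for i in c]
        region = (
            max(r[0] for r in members), min(r[1] for r in members),
            max(r[2] for r in members), min(r[3] for r in members),
        )
        result.append((c, region))
    return result


print(maximal_intersections([(0, 2, 0, 2), (1, 3, 1, 3), (4, 5, 4, 5)]))
```

In this usage example, the sketch finds the overlap [1, 2] × [1, 2] of the first two rectangles and the isolated third rectangle.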

New algorithms

• Tree algorithm
• Height map algorithm:
  – based on the idea of a height map of the observation rectangles
  – very simple
  – very fast: $O(n^2)$

Height map algorithm: $O(n^2)$

[Figure: height map of the observation rectangles, drawn as grids of integer heights]
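The height map idea can be sketched as follows, reading the "height" of a point as the number of observation rectangles that contain it. That reading is our interpretation, and the sketch assumes closed rectangles. The grid of distinct endpoint coordinates has at most 2n × 2n points. On that grid, a two-dimensional difference array followed by prefix sums gives all heights in $O(n^2)$ time. The sketch only builds the height map and does not reproduce how the algorithm then extracts the maximal intersections. `height_map` is a hypothetical name.

```python
from bisect import bisect_left
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # closed: [x_lo, x_hi] x [y_lo, y_hi]


def height_map(rects: List[Rect]) -> Tuple[List[float], List[float], List[List[int]]]:
    """Number of rectangles covering each point of the grid of endpoint coordinates."""
    xs = sorted({v for r in rects for v in (r[0], r[1])})
    ys = sorted({v for r in rects for v in (r[2], r[3])})
    nx, ny = len(xs), len(ys)
    diff = [[0] * (ny + 1) for _ in range(nx + 1)]
    for x0, x1, y0, y1 in rects:
        a, b = bisect_left(xs, x0), bisect_left(xs, x1)
        c, d = bisect_left(ys, y0), bisect_left(ys, y1)
        # Inclusion-exclusion: after prefix sums, the index block [a, b] x [c, d] gets +1.
        diff[a][c] += 1
        diff[b + 1][c] -= 1
        diff[a][d + 1] -= 1
        diff[b + 1][d + 1] += 1
    height = [[0] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            height[i][j] = (
                diff[i][j]
                + (height[i - 1][j] if i > 0 else 0)
                + (height[i][j - 1] if j > 0 else 0)
                - (height[i - 1][j - 1] if i > 0 and j > 0 else 0)
            )
    return xs, ys, height


xs, ys, h = height_map([(0, 2, 0, 2), (1, 3, 1, 3)])
for j in reversed(range(len(ys))):  # print with y increasing upwards
    print(ys[j], [h[i][j] for i in range(len(xs))])
```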

Main focus of the project

• MLE for bivariate censored data:
  – Computational aspects
  – (In)consistency and methods to repair the inconsistency

Time of HIV infection is interval censored case 2

[Figure: time axes for HIV and AIDS, with inspection times u1 < u2]


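Time of onset of AIDS is right censored

[Figure: time axes for HIV (inspection times u1, u2) and AIDS, with observed time t = min(c, y), where y is the time of onset of AIDS and c the censoring time]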


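The two censoring schemes combine into one observation region per person. This minimal sketch assumes the usual definitions. Under interval censoring case 2 we only learn whether the infection time x satisfies x ≤ u1, u1 < x ≤ u2, or x > u2. Under right censoring we observe t = min(c, y) and whether y ≤ c. When the onset of AIDS is observed exactly, the region is a line segment in the (X, Y) plane rather than a rectangle, so it has zero area. `Region`, `hiv_interval` and `observation_region` are hypothetical names with a hypothetical return convention.

```python
import math
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    kind: str     # "segment" if the AIDS onset time is observed exactly, else "rectangle"
    x_lo: float   # HIV infection time lies in (x_lo, x_hi]
    x_hi: float
    y_lo: float   # segment: y == y_lo == y_hi; rectangle: y > y_lo (y_hi is infinity)
    y_hi: float


def hiv_interval(x: float, u1: float, u2: float) -> Tuple[float, float]:
    """Interval censoring case 2: only the position of x relative to u1 < u2 is observed."""
    if x <= u1:
        return -math.inf, u1
    if x <= u2:
        return u1, u2
    return u2, math.inf


def observation_region(x: float, y: float, u1: float, u2: float, c: float) -> Region:
    """Map latent (x, y), inspection times u1 < u2 and censoring time c to the observed region."""
    x_lo, x_hi = hiv_interval(x, u1, u2)
    t = min(c, y)
    if y <= c:
        # Onset of AIDS observed exactly at t: a line segment (x_lo, x_hi] x {t}.
        return Region("segment", x_lo, x_hi, t, t)
    # Right censored: we only know that y > c = t.
    return Region("rectangle", x_lo, x_hi, t, math.inf)


print(observation_region(1983.5, 1990.0, 1983.0, 1986.0, 1996.0))
print(observation_region(1985.0, 1999.0, 1983.0, 1986.0, 1996.0))
```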

Inconsistency of the naive MLE


Methods to repair inconsistency

• Transform the lines into strips

• MLE on a sieve of piecewise constant densities

• Kullback-Leibler approach
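The first bullet can be read in several ways. One reading, assumed here purely for illustration, coarsens each exactly observed onset time t to the cell (kh, (k + 1)h] of a fixed grid with width h that contains it. The segment (x_lo, x_hi] × {t} then becomes a strip of positive area. The grid width h and the function `line_to_strip` are hypothetical.

```python
import math
from typing import Tuple


def line_to_strip(x_lo: float, x_hi: float, t: float, h: float) -> Tuple[float, float, float, float]:
    """Widen the segment (x_lo, x_hi] x {t} to the strip (x_lo, x_hi] x (k*h, (k+1)*h] containing t.

    h > 0 is a hypothetical grid width chosen by the analyst.
    """
    if h <= 0:
        raise ValueError("grid width h must be positive")
    k = math.ceil(t / h) - 1  # smallest k with t <= (k + 1) * h, hence k * h < t
    return (x_lo, x_hi, k * h, (k + 1) * h)


print(line_to_strip(1983.0, 1986.0, 1990.0, 1.0))
print(line_to_strip(1983.0, 1986.0, 1990.5, 1.0))
```

Because the grid is shared by all observations, nearby onset times fall into the same cell, and their strips can then overlap in regions of positive area.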

X = time of HIV infection, Y = time of onset of AIDS, Z = Y − X = incubation period

• $P(Z \le z)$ cannot be estimated consistently.
• An example of a parameter we can estimate consistently is $P(Z \le z \mid x_1 < X \le x_2)$.

Conclusions (1)

• Our algorithms for the parameter reduction step are significantly faster than other existing algorithms.

• We proved that in general the naive MLE is an inconsistent estimator for our AIDS model.

Conclusions (2)

• We explored several methods to repair the inconsistency of the naive MLE.

• $P(Z \le z)$ cannot be estimated consistently without additional assumptions. An alternative parameter that we can estimate consistently is $P(Z \le z \mid x_1 < X \le x_2)$.

Acknowledgements

• Piet Groeneboom

• Jon Wellner