Nonparametric maximum likelihood estimation (MLE) for bivariate censored data
Marloes H. Maathuis
advisors:
Piet Groeneboom and Jon A. Wellner
Motivation
Estimate the distribution function of the incubation period of HIV/AIDS:
– Nonparametrically
– Based on censored data:
  • Time of HIV infection is interval censored
  • Time of onset of AIDS is interval censored or right censored
Approach
• Use MLE to estimate the bivariate distribution
• Integrate over diagonal strips: $P(Y - X \le z)$
[Figure: diagonal strip determined by z in the plane with axes X (HIV) and Y (AIDS)]
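The strip integration can be illustrated with a small sketch. It assumes the estimate only specifies how much mass each rectangle carries, not where inside the rectangle the mass sits, so it returns a lower and an upper bound for $P(Y - X \le z)$ rather than a single value. The function name and the rectangle format `(x1, x2, y1, y2)` are hypothetical choices made for this illustration.

```python
def strip_probability_bounds(rects, masses, z):
    """Bounds on P(Y - X <= z) when probability mass sits on rectangles.

    rects  : list of (x1, x2, y1, y2) with x1 <= x2 and y1 <= y2
    masses : probability mass assigned to each rectangle (should sum to 1)
    """
    lower = 0.0
    upper = 0.0
    for (x1, x2, y1, y2), p in zip(rects, masses):
        # On the rectangle, y - x ranges over [y1 - x2, y2 - x1].
        if y2 - x1 <= z:
            lower += p  # the whole rectangle lies inside {y - x <= z}
        if y1 - x2 <= z:
            upper += p  # at least part of the rectangle lies inside
    return lower, upper


# Two rectangles (illustrative numbers only).
rects = [(0, 1, 2, 3), (0, 1, 5, 6)]
masses = [0.6, 0.4]
bounds = strip_probability_bounds(rects, masses, z=2)
```

When the two bounds coincide, the strip probability does not depend on how the mass is spread within the rectangles.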
Main focus of the project
• MLE for bivariate censored data:
– Computational aspects
– (In)consistency and methods to repair the inconsistency
[Figure: X (HIV) axis with ticks 1980, 1983, 1986, labeled "Interval of HIV infection"; Y (AIDS) axis with ticks 1980, 1992, 1996, labeled "Interval of onset of AIDS"]
Observation rectangle $R_i$
[Figure: observation rectangle $R_i$ in the plane with axes X (HIV) and Y (AIDS)]
$\max_{F \in \mathcal{F}} \sum_{i=1}^{n} \log P_F\big((X_i, Y_i) \in R_i\big)$
Maximal intersections
[Figure: observation rectangles $R_i$ and their maximal intersections in the plane with axes X (HIV) and Y (AIDS); the maximal intersections are labeled $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$]
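$\max_{\alpha \in \mathbb{R}^4} \; \log(\alpha_1) + \log(\alpha_1 + \alpha_3) + \log(\alpha_1 + \alpha_2) + \log(\alpha_2 + \alpha_4) + \log(\alpha_3 + \alpha_4)$
s.t. $\alpha_i \ge 0$, $i = 1, \dots, 4$, and $\sum_{i=1}^{4} \alpha_i = 1$
Solution: $\alpha_1 = 3/5$, $\alpha_2 = 0$, $\alpha_3 = 0$, $\alpha_4 = 2/5$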
The $\alpha_i$'s are not always uniquely determined: mixture non-uniqueness
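Mixture non-uniqueness can be seen in a small constructed example. The configuration below is hypothetical and not taken from the slides: two horizontal and two vertical observation strips whose four maximal intersections form a 2 x 2 grid. The sketch checks that different mass vectors give every observation rectangle the same probability, and therefore the same log-likelihood.

```python
import math

# Hypothetical clique matrix: A[i][j] = 1 if maximal intersection j lies in R_i.
# Intersections are ordered top-left, top-right, bottom-left, bottom-right.
A = [
    [1, 1, 0, 0],  # R1: top strip
    [0, 0, 1, 1],  # R2: bottom strip
    [1, 0, 1, 0],  # R3: left strip
    [0, 1, 0, 1],  # R4: right strip
]


def log_likelihood(A, alpha):
    """Sum over observations of log(total mass inside the rectangle)."""
    return sum(math.log(sum(a * p for a, p in zip(row, alpha))) for row in A)


alpha_a = [0.5, 0.0, 0.0, 0.5]      # mass on one diagonal
alpha_b = [0.0, 0.5, 0.5, 0.0]      # mass on the other diagonal
alpha_c = [0.25, 0.25, 0.25, 0.25]  # mass spread evenly

ll_a = log_likelihood(A, alpha_a)
ll_b = log_likelihood(A, alpha_b)
ll_c = log_likelihood(A, alpha_c)
```

In this example each rectangle receives probability 1/2 under all three vectors. That is the best possible here, because the first two rectangles together hold all the mass and so do the last two.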
Computation of the MLE
• Reduction step:
determine the maximal intersections
• Optimization step:
determine the amounts of mass assigned to the maximal intersections
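As a minimal sketch of the optimization step, the masses can be computed with an EM-type fixed-point iteration. EM is one standard choice for this kind of problem and is an assumption here; the slides do not say which optimizer was used. The clique matrix encodes the five-term objective of the four-mass example, log(α1) + log(α1+α3) + log(α1+α2) + log(α2+α4) + log(α3+α4), and the function name is hypothetical.

```python
def em_masses(A, n_iter=500):
    """EM-type iteration for the masses on the maximal intersections.

    A[i][j] = 1 if maximal intersection j lies in observation rectangle i.
    Starts from equal masses and applies a multiplicative update that
    keeps the masses nonnegative and summing to one.
    """
    n, m = len(A), len(A[0])
    alpha = [1.0 / m] * m
    for _ in range(n_iter):
        # Probability that the current masses give to each rectangle.
        fitted = [sum(a * p for a, p in zip(row, alpha)) for row in A]
        alpha = [
            alpha[j] * sum(A[i][j] / fitted[i] for i in range(n)) / n
            for j in range(m)
        ]
    return alpha


# Clique matrix for the four-mass example (one row per log term).
A = [
    [1, 0, 0, 0],  # log(a1)
    [1, 0, 1, 0],  # log(a1 + a3)
    [1, 1, 0, 0],  # log(a1 + a2)
    [0, 1, 0, 1],  # log(a2 + a4)
    [0, 0, 1, 1],  # log(a3 + a4)
]
alpha_hat = em_masses(A)
```

For this matrix the iteration approaches (3/5, 0, 0, 2/5), the solution stated for the example. The masses that go to zero decay only geometrically, which is why a few hundred iterations are used.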
Existing reduction algorithms
• Betensky and Finkelstein (1999, Stat. in Medicine)
• Gentleman and Vandal (2001, JCGS)
• Song (2001, Ph.D. thesis)
• Bogaerts and Lesaffre (2003, Tech. report)
The first three algorithms are very slow; the last algorithm is of complexity $O(n^3)$.
New algorithms
• Tree algorithm
• Height map algorithm:
– based on the idea of a height map of the observation rectangles
– very simple
– very fast: $O(n^2)$
Height map algorithm: $O(n^2)$
[Figure: height map of the observation rectangles, shown as a grid of integer heights ranging from 0 to 3]
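The height map idea can be sketched as follows. This is an illustration, not necessarily the algorithm presented in the talk. It assumes closed rectangles whose x-endpoints are all distinct and whose y-endpoints are all distinct. It counts how many rectangles cover each grid cell and reports the plateaus that have no higher neighbouring cell as maximal intersections. The flood fill and the function name are choices made for this sketch.

```python
def maximal_intersections(rects):
    """Maximal intersections of closed rectangles (x1, x2, y1, y2).

    Sketch assumption: all x-endpoints are distinct and all y-endpoints
    are distinct.
    """
    xs = sorted(v for r in rects for v in r[:2])
    ys = sorted(v for r in rects for v in r[2:])
    x_index = {v: k for k, v in enumerate(xs)}
    y_index = {v: k for k, v in enumerate(ys)}
    m = len(xs) - 1  # cells per axis, between consecutive endpoints

    # Height map via a 2D difference array; cell (a, b) is
    # [xs[a], xs[a+1]] x [ys[b], ys[b+1]].
    diff = [[0] * (m + 1) for _ in range(m + 1)]
    for x1, x2, y1, y2 in rects:
        a1, a2, b1, b2 = x_index[x1], x_index[x2], y_index[y1], y_index[y2]
        diff[a1][b1] += 1
        diff[a2][b1] -= 1
        diff[a1][b2] -= 1
        diff[a2][b2] += 1
    height = [[0] * m for _ in range(m)]
    for a in range(m):
        for b in range(m):
            height[a][b] = (diff[a][b]
                            + (height[a - 1][b] if a else 0)
                            + (height[a][b - 1] if b else 0)
                            - (height[a - 1][b - 1] if a and b else 0))

    # Flood-fill plateaus of equal height; keep those with no higher neighbour.
    seen = [[False] * m for _ in range(m)]
    result = []
    for a0 in range(m):
        for b0 in range(m):
            if seen[a0][b0] or height[a0][b0] == 0:
                continue
            h = height[a0][b0]
            seen[a0][b0] = True
            stack, cells, is_max = [(a0, b0)], [], True
            while stack:
                a, b = stack.pop()
                cells.append((a, b))
                for p, q in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= p < m and 0 <= q < m:
                        if height[p][q] > h:
                            is_max = False
                        elif height[p][q] == h and not seen[p][q]:
                            seen[p][q] = True
                            stack.append((p, q))
            if is_max:
                a_lo, a_hi = min(c[0] for c in cells), max(c[0] for c in cells)
                b_lo, b_hi = min(c[1] for c in cells), max(c[1] for c in cells)
                result.append((xs[a_lo], xs[a_hi + 1], ys[b_lo], ys[b_hi + 1]))
    return result


# Three rectangles; the first two overlap and so do the last two.
found = maximal_intersections([(0, 3, 0, 3), (1, 5, 1, 5), (4, 7, 4, 7)])
```

The grid has (2n - 1)^2 cells for n rectangles, and each cell is visited a bounded number of times, so the work after sorting the endpoints is quadratic in n. Tied endpoints would need a tie-breaking convention before this sketch applies.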
Main focus of the project
• MLE of bivariate censored data:
– Computational aspects
– (In)consistency and methods to repair the inconsistency
Time of HIV infection is interval censored (case 2)
[Figure: plane with axes HIV and AIDS; u1 and u2 marked on the HIV axis]
Time of onset of AIDS is right censored
[Figure: plane with axes HIV and AIDS; u1 and u2 marked on the HIV axis, t = min(c, y) marked on the AIDS axis]
Inconsistency of the naive MLE
Methods to repair inconsistency
• Transform the lines into strips
• MLE on a sieve of piecewise constant densities
• Kullback-Leibler approach
X = time of HIV infection
Y = time of onset of AIDS
Z = Y - X = incubation period
• $P(Z \le z)$ cannot be estimated consistently
• An example of a parameter we can estimate consistently is: $P(Z \le z \mid x_1 < X \le x_2)$
Conclusions (1)
• Our algorithms for the parameter reduction step are significantly faster than other existing algorithms.
• We proved that in general the naive MLE is an inconsistent estimator for our AIDS model.
Conclusions (2)
• We explored several methods to repair the inconsistency of the naive MLE.
• $P(Z \le z)$ cannot be estimated consistently without additional assumptions. An alternative parameter that we can estimate consistently is: $P(Z \le z \mid x_1 < X \le x_2)$.
Acknowledgements
• Piet Groeneboom
• Jon Wellner