Posted on 16-Dec-2015
Nonparametric maximum likelihood estimation (MLE)
for bivariate censored data
Marloes H. Maathuis
advisors:
Piet Groeneboom and Jon A. Wellner
Motivation
Estimate the distribution function of the incubation period of HIV/AIDS:
– Nonparametrically
– Based on censored data:
  • Time of HIV infection is interval censored
  • Time of onset of AIDS is interval censored or right censored
Approach
• Use MLE to estimate the bivariate distribution
• Integrate over diagonal strips: P(Y − X ≤ z)
[Figure: axes X (HIV) and Y (AIDS), with the diagonal line Y − X = z]
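The MLE only determines how much mass sits on each region, not where it sits inside that region, so integrating over a diagonal strip can in general only be bounded. The sketch below assumes we know the mass on each closed rectangle (the function name and the rectangle encoding are hypothetical): mass on a rectangle lying entirely in {y − x ≤ z} counts toward both bounds, and mass on a rectangle that merely meets that region counts toward the upper bound only.

```python
def diagonal_strip_bounds(rectangles, masses, z):
    """Bounds on P(Y - X <= z) when only the mass of each rectangle is known.

    rectangles: closed rectangles (x_lo, x_hi, y_lo, y_hi); masses: their
    probabilities. Inside a rectangle, Y - X ranges over
    [y_lo - x_hi, y_hi - x_lo].
    """
    lower = upper = 0.0
    for (x_lo, x_hi, y_lo, y_hi), m in zip(rectangles, masses):
        if y_hi - x_lo <= z:   # rectangle lies entirely in {y - x <= z}
            lower += m
        if y_lo - x_hi <= z:   # rectangle meets {y - x <= z}
            upper += m
    return lower, upper


# Hypothetical toy example: mass 0.6 on [0,1]x[0,1], mass 0.4 on [0,1]x[2,3].
print(diagonal_strip_bounds([(0, 1, 0, 1), (0, 1, 2, 3)], [0.6, 0.4], 1))
# (0.6, 1.0)
```

When the two bounds differ, the value of P(Y − X ≤ z) depends on how the mass is placed inside the rectangles.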
Main focus of the project
• MLE for bivariate censored data:
– Computational aspects
– (In)consistency and methods to repair the inconsistency
[Figure: an observation in the (X (HIV), Y (AIDS)) plane, with axis ticks 1980, 1983, 1986 on X and 1980, 1992, 1996 on Y; the interval of HIV infection and the interval of onset of AIDS together form the observation rectangle Ri]
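[Figure: maximal intersections α1, α2, α3, α4 of the observation rectangles Ri in the (X (HIV), Y (AIDS)) plane]

$$\max_F \sum_{i=1}^{n} \log P_F\big((X_i, Y_i) \in R_i\big)$$

In the example this becomes

$$\max_{\alpha_1,\dots,\alpha_4}\; \log(\alpha_1) + \log(\alpha_1+\alpha_3) + \log(\alpha_1+\alpha_2) + \log(\alpha_2+\alpha_4) + \log(\alpha_3+\alpha_4)$$

s.t. $\alpha_i \ge 0$, $i = 1,\dots,4$, and $\sum_{i=1}^{4} \alpha_i = 1$.

Solution: α1 = 3/5, α2 = 0, α3 = 0, α4 = 2/5.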
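A quick check (our own, not on the slide) that α1 = 3/5, α2 = α3 = 0, α4 = 2/5 solves the example problem with objective ℓ(α) = log(α1) + log(α1+α3) + log(α1+α2) + log(α2+α4) + log(α3+α4). With α2 = α3 = 0 the objective reduces to 3 log α1 + 2 log α4 with α1 + α4 = 1, which is maximized at α1 = 3/5, α4 = 2/5. Because ℓ is concave, this point is the global maximum if the KKT conditions hold with a common multiplier λ: ∂ℓ/∂α_j = λ where α_j > 0 and ∂ℓ/∂α_j ≤ λ where α_j = 0.

```latex
\begin{aligned}
\frac{\partial \ell}{\partial \alpha_1} &= \frac{1}{\alpha_1} + \frac{1}{\alpha_1+\alpha_3} + \frac{1}{\alpha_1+\alpha_2} = 3 \cdot \tfrac{5}{3} = 5, \\
\frac{\partial \ell}{\partial \alpha_4} &= \frac{1}{\alpha_2+\alpha_4} + \frac{1}{\alpha_3+\alpha_4} = 2 \cdot \tfrac{5}{2} = 5, \\
\frac{\partial \ell}{\partial \alpha_2} &= \frac{1}{\alpha_1+\alpha_2} + \frac{1}{\alpha_2+\alpha_4} = \tfrac{5}{3} + \tfrac{5}{2} = \tfrac{25}{6} < 5, \\
\frac{\partial \ell}{\partial \alpha_3} &= \frac{1}{\alpha_1+\alpha_3} + \frac{1}{\alpha_3+\alpha_4} = \tfrac{5}{3} + \tfrac{5}{2} = \tfrac{25}{6} < 5.
\end{aligned}
```

So λ = 5 = n. In this particular example the map α ↦ (α1, α1+α3, α1+α2, α2+α4) is injective, so ℓ is strictly concave and the solution is unique.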
The αi’s are not always uniquely determined: mixture non-uniqueness.
Computation of the MLE
• Reduction step:
determine the maximal intersections
• Optimization step:
determine the amounts of mass assigned to the maximal intersections
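One standard way to carry out the optimization step is the self-consistency (EM) iteration for mixing proportions; this is an assumption, since the slides do not name the optimizer used. The function name `self_consistency` and the encoding `cliques` (for each observation rectangle R_i, the indices of the maximal intersections it contains) are hypothetical. Here it is applied to the four-intersection example with objective log(α1) + log(α1+α3) + log(α1+α2) + log(α2+α4) + log(α3+α4).

```python
def self_consistency(cliques, m, n_iter=2000):
    """EM / self-consistency iteration for the reduced MLE problem (sketch).

    cliques[i] lists the indices j of the maximal intersections contained in
    observation rectangle R_i; m is the number of maximal intersections.
    Each step never decreases sum_i log(sum_{j in cliques[i]} alpha_j)
    over the probability simplex.
    """
    n = len(cliques)
    alpha = [1.0 / m] * m                      # start from uniform mass
    for _ in range(n_iter):
        new = [0.0] * m
        for c in cliques:
            total = sum(alpha[j] for j in c)   # current P((X_i, Y_i) in R_i)
            for j in c:
                new[j] += alpha[j] / total     # expected share of obs. i on A_j
        alpha = [a / n for a in new]
    return alpha


# Five observation rectangles over four maximal intersections (0-based
# indices for alpha_1..alpha_4), read off the example objective.
cliques = [[0], [0, 2], [0, 1], [1, 3], [2, 3]]
alpha = self_consistency(cliques, 4)
print([round(a, 4) for a in alpha])  # [0.6, 0.0, 0.0, 0.4]
```

Convergence is only linear here: near the solution, α2 and α3 shrink by a factor of about 5/6 per iteration, which is why a fixed, generous iteration count is used in this sketch.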
Existing reduction algorithms
• Betensky and Finkelstein (1999, Stat. in Medicine)
• Gentleman and Vandal (2001, JCGS)
• Song (2001, Ph.D. thesis)
• Bogaerts and Lesaffre (2003, Tech. report)
The first three algorithms are very slow; the last algorithm has complexity O(n³).
New algorithms
• Tree algorithm
• Height map algorithm:
– based on the idea of a height map of the observation rectangles
– very simple
– very fast: O(n²)
Height map algorithm: O(n²)
[Figure: successive height maps of the observation rectangles, shown as grids of integer heights]
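To make the height-map idea concrete, here is a brute-force sketch under our own simplifications (closed rectangles; function names hypothetical). We take the height of a point to be the number of observation rectangles containing it, evaluate it on a grid of endpoints and midpoints, and read off the maximal intersections as the covering sets not strictly contained in any other covering set. Testing every grid point against every rectangle costs O(n³), so this is not the O(n²) height map algorithm; it only illustrates what the reduction step computes.

```python
from itertools import product


def sample_coords(intervals):
    """All endpoints plus one midpoint between each pair of consecutive endpoints."""
    pts = sorted({v for interval in intervals for v in interval})
    mids = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return sorted(pts + mids)


def height_map_and_maximal_intersections(rects):
    """Brute-force sketch; rects are closed rectangles (x1, x2, y1, y2).

    Returns the height (number of covering rectangles) at each sample point,
    and the maximal intersections as (rectangle indices, region) pairs.
    """
    xs = sample_coords([(r[0], r[1]) for r in rects])
    ys = sample_coords([(r[2], r[3]) for r in rects])
    height = {}
    covers = set()
    for x, y in product(xs, ys):
        s = frozenset(i for i, (x1, x2, y1, y2) in enumerate(rects)
                      if x1 <= x <= x2 and y1 <= y <= y2)
        height[(x, y)] = len(s)
        if s:
            covers.add(s)
    # A maximal intersection corresponds to a covering set that is not
    # strictly contained in any other covering set.
    maximal = [s for s in covers if not any(s < t for t in covers)]
    regions = []
    for s in maximal:
        region = (max(rects[i][0] for i in s), min(rects[i][1] for i in s),
                  max(rects[i][2] for i in s), min(rects[i][3] for i in s))
        regions.append((sorted(s), region))
    return height, sorted(regions)


# Hypothetical toy input: two overlapping rectangles and one isolated rectangle.
rects = [(0, 2, 0, 2), (1, 3, 1, 3), (5, 6, 5, 6)]
height, regions = height_map_and_maximal_intersections(rects)
print(max(height.values()))  # 2
print(regions)  # [([0, 1], (1, 2, 1, 2)), ([2], (5, 6, 5, 6))]
```

The grid of endpoints and midpoints suffices because, along each axis, whether a point lies in a closed rectangle depends only on which endpoint or open gap between endpoints it falls in.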
Methods to repair inconsistency
• Transform the lines into strips
• MLE on a sieve of piecewise constant densities
• Kullback-Leibler approach
• P(Z ≤ z) cannot be estimated consistently, where
X = time of HIV infection
Y = time of onset of AIDS
Z = Y − X = incubation period
• An example of a parameter we can estimate consistently is: P(Z ≤ z | x₁ ≤ X ≤ x₂)
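As a small sketch of the parameter itself (not of how it is estimated), the function below evaluates P(Z ≤ z | x₁ ≤ X ≤ x₂), with Z = Y − X, for a distribution with finitely many point masses. The function name and the toy distribution are hypothetical, and the sketch says nothing about why this parameter, unlike P(Z ≤ z), can be estimated consistently.

```python
def conditional_incubation_cdf(points, masses, z, x1, x2):
    """P(Y - X <= z | x1 <= X <= x2) for a discrete distribution.

    points: support points (x, y); masses: their probabilities.
    """
    window = [((x, y), m) for (x, y), m in zip(points, masses) if x1 <= x <= x2]
    denom = sum(m for _, m in window)
    if denom == 0:
        raise ValueError("P(x1 <= X <= x2) = 0: conditional probability undefined")
    numer = sum(m for (x, y), m in window if y - x <= z)
    return numer / denom


# Hypothetical toy distribution (years of HIV infection and AIDS onset).
points = [(1983, 1990), (1984, 1995), (1990, 1992)]
masses = [0.25, 0.25, 0.5]
print(conditional_incubation_cdf(points, masses, 8, 1980, 1986))  # 0.5
```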
Conclusions (1)
• Our algorithms for the parameter reduction step are significantly faster than other existing algorithms (the height map algorithm is O(n²), whereas the fastest of the existing algorithms is O(n³)).
• We proved that in general the naive MLE is an inconsistent estimator for our AIDS model.
Conclusions (2)
• We explored several methods to repair the inconsistency of the naive MLE.
• P(Z ≤ z) cannot be estimated consistently without additional assumptions. An alternative parameter that we can estimate consistently is P(Z ≤ z | x₁ ≤ X ≤ x₂).