Estimating the distribution of the incubation period of HIV/AIDS
Marloes H. Maathuis
Joint work with:
Piet Groeneboom and Jon A. Wellner
Incubation period
Time between HIV infection and onset of AIDS
Example: HIV infection in 1985, onset of AIDS in 1996
Incubation period: 11 years
Censored data
Interval of HIV infection: 1983 to 1986
Interval of onset of AIDS: 1992 to 1996
Lower bound of incubation period: 6 years
Upper bound of incubation period: 13 years
[Figure: the observation drawn as a rectangle in the plane, with X (HIV) from 1983 to 1986 on the horizontal axis and Y (AIDS) from 1992 to 1996 on the vertical axis]
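The bounds on this slide follow from simple interval arithmetic. A minimal sketch, assuming closed intervals given in calendar years; the function name `incubation_bounds` is hypothetical and is not part of the talk:

```python
def incubation_bounds(hiv_interval, aids_interval):
    """Bounds on Z = Y - X when X and Y are each known only up to an interval."""
    x_lo, x_hi = hiv_interval
    y_lo, y_hi = aids_interval
    # Shortest possible incubation: latest infection, earliest onset.
    # Assumption: an incubation period cannot be negative, so clip at 0.
    lower = max(y_lo - x_hi, 0)
    # Longest possible incubation: earliest infection, latest onset.
    upper = y_hi - x_lo
    return lower, upper


print(incubation_bounds((1983, 1986), (1992, 1996)))  # (6, 13)
```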
Distribution functions
• Goal: estimate the distribution function of the incubation period of HIV/AIDS
• Why? This is important for predicting the future course of the epidemic
• Strategy: first estimate the 2-dimensional distribution of (X, Y), the times of HIV infection and onset of AIDS
Main focus
• Nonparametric maximum likelihood estimator (MLE) for the 2-dimensional distribution:
– Computational aspects
– Theoretical properties (consistency)
Computation of the MLE
• Parameter reduction:
determine the inner rectangles
• Optimization:
determine the amounts of mass assigned to the inner rectangles.
$$\max_F L(F) = \max_F \prod_{i=1}^{n} P_F((X, Y) \in R_i)$$
Inner rectangles
The MLE is insensitive to the distribution of mass within the inner rectangles. This gives non-uniqueness.
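[Figure: observation rectangles in the plane with axes X (HIV) and Y (AIDS); the four inner rectangles carry masses α1, α2, α3, α4]
$$\max_F \sum_{i=1}^{n} \log P_F((X, Y) \in R_i) = \max_{\alpha \in \mathbb{R}^4} \left\{ \log(\alpha_1) + \log(\alpha_1 + \alpha_3) + \log(\alpha_1 + \alpha_2) + \log(\alpha_3 + \alpha_4) + \log(\alpha_2 + \alpha_4) \right\}$$
s.t. $\alpha_i \ge 0$, $i = 1, \dots, 4$, and $\sum_{i=1}^{4} \alpha_i = 1$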
For this example, the maximum is attained at
$\alpha_1 = 3/5$, $\alpha_2 = 0$, $\alpha_3 = 0$, $\alpha_4 = 2/5$
The αi’s are not always uniquely determined: second type of non-uniqueness
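The optimization step for this small example can be checked numerically. The sketch below uses the EM (self-consistency) iteration, a standard technique for maximizing mixture-type likelihoods; it is not claimed to be the optimization method used in the talk. The 0/1 matrix `A` is an assumption read off the five log terms of the objective (one row per term, one column per mass), and the names `log_likelihood` and `em_masses` are hypothetical.

```python
import math

# One row per log term of the objective, one column per mass alpha_1..alpha_4.
# A[i][j] = 1 means alpha_j contributes to the probability in term i.
A = [
    [1, 0, 0, 0],  # log(alpha_1)
    [1, 0, 1, 0],  # log(alpha_1 + alpha_3)
    [1, 1, 0, 0],  # log(alpha_1 + alpha_2)
    [0, 0, 1, 1],  # log(alpha_3 + alpha_4)
    [0, 1, 0, 1],  # log(alpha_2 + alpha_4)
]


def log_likelihood(A, alpha):
    """Sum of log probabilities of the observation rectangles."""
    return sum(math.log(sum(a * w for a, w in zip(row, alpha))) for row in A)


def em_masses(A, iterations=2000):
    """EM iteration for the masses, started from the uniform vector."""
    n, m = len(A), len(A[0])
    alpha = [1.0 / m] * m
    for _ in range(iterations):
        # Current probability of each observation rectangle.
        probs = [sum(a * w for a, w in zip(row, alpha)) for row in A]
        # Each mass is rescaled by its average share in the rectangles it belongs to.
        alpha = [
            alpha[j] * sum(A[i][j] / probs[i] for i in range(n)) / n
            for j in range(m)
        ]
    return alpha


alpha = em_masses(A)
print([round(a, 4) for a in alpha])  # approximately [0.6, 0.0, 0.0, 0.4]
```

The iteration keeps the masses nonnegative and summing to one, so the constraints never need to be enforced separately.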
Graph theory
[Figure: a set of rectangles R1, ..., R5 and its intersection graph]
The maximal cliques correspond to the inner rectangles
Maximal cliques: {R1,R2,R3}, {R3,R4}, {R4,R5}, {R2,R5}
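The link between rectangles, the intersection graph, and maximal cliques can be illustrated in a few lines. This is a sketch under stated assumptions: the coordinates in `rects` are hypothetical, chosen only so that the intersection graph matches the one on the slide, and the clique search is the generic Bron-Kerbosch recursion, not the reduction algorithm of the talk (it can be exponential in the worst case).

```python
from itertools import combinations

# Hypothetical rectangles (x_lo, x_hi, y_lo, y_hi); only their overlap
# pattern is meant to match the slide, not their actual coordinates.
rects = {
    "R1": (0.5, 2.0, 0.5, 2.0),
    "R2": (0.0, 1.0, 0.0, 5.0),
    "R3": (0.0, 5.0, 0.0, 1.0),
    "R4": (4.0, 5.0, 0.0, 5.0),
    "R5": (0.0, 5.0, 4.0, 5.0),
}


def intersects(a, b):
    """Closed axis-parallel rectangles overlap iff they overlap on both axes."""
    return max(a[0], b[0]) <= min(a[1], b[1]) and max(a[2], b[2]) <= min(a[3], b[3])


def intersection_graph(rects):
    """Adjacency sets: two rectangles are joined when they intersect."""
    adj = {name: set() for name in rects}
    for u, v in combinations(rects, 2):
        if intersects(rects[u], rects[v]):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration of all maximal cliques."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def inner_rectangle(clique, rects):
    """Common intersection of the rectangles in a clique."""
    members = [rects[name] for name in clique]
    return (max(r[0] for r in members), min(r[1] for r in members),
            max(r[2] for r in members), min(r[3] for r in members))


for clique in maximal_cliques(intersection_graph(rects)):
    print(sorted(clique), inner_rectangle(clique, rects))
```

For axis-parallel rectangles, pairwise intersection implies a common intersection, which is why each maximal clique yields a nonempty inner rectangle.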
Existing reduction algorithms
• Betensky and Finkelstein (1999)
• Gentleman and Vandal (2001,2002)
• Song (2001)
These algorithms are slow: complexity O(n^4) to O(n^5)
Segment tree
[Figure, repeated over several animation steps: rectangles R1, ..., R5 drawn on an integer grid (horizontal axis 0 to 16), illustrating the segment tree. Maximal cliques found in the step shown: {R5,R2}, {R3,R1,R2}]
SimpleCliqueFinder
[Figure: the same rectangles on the same grid, with integer values overlaid]
Amsterdam Cohort Study among injecting drug users
• Open cohort study
• Data available from 1985 to 1997
• 637 individuals were enrolled
• 216 individuals tested positive for HIV during the study
Model
X: time of HIV infection
Y: time of onset of AIDS
Z = Y-X: incubation period
U1, U2: observation times for X
C: censoring variable for Y
(X, Y) and (U1, U2, C) are independent
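One plausible reading of this observation scheme can be sketched as follows. The details are assumptions, not statements from the slides: X is taken to be located only relative to the two observation times (before U1, between U1 and U2, or after U2), and Y is taken to be observed exactly when it occurs by time C and right-censored otherwise. The function name `observed_set` is hypothetical.

```python
import math


def observed_set(x, y, u1, u2, c):
    """Return ((x_lo, x_hi), (y_lo, y_hi)), the region known to contain (X, Y).

    Assumed scheme: u1 < u2 are observation times for X, c censors Y.
    """
    # X is only located relative to the observation times.
    if x <= u1:
        x_int = (-math.inf, u1)
    elif x <= u2:
        x_int = (u1, u2)
    else:
        x_int = (u2, math.inf)
    # Assumed: Y is seen exactly if it occurs by time c; otherwise only Y > c is known.
    y_int = (y, y) if y <= c else (c, math.inf)
    return x_int, y_int


# AIDS onset observed exactly: the observed set is a line segment.
print(observed_set(1985, 1996, 1983, 1986, 1997))
# AIDS onset censored at 1997: the observed set is unbounded in Y.
print(observed_set(1985, 1999, 1983, 1986, 1997))
```

Under these assumptions an uncensored Y produces a line segment rather than a rectangle with positive area.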
Methods to repair inconsistency
• Transform the lines into strips
• MLE on a sieve of piecewise constant densities
• Kullback-Leibler approach
• The distribution function of the incubation period cannot be estimated consistently
• What we can estimate consistently is P(Z ≤ z, Y ≤ 1997)
Conclusions (1)
• We found the graph theoretic framework very useful
• Our algorithms for the parameter reduction step are significantly faster than other methods.
• We proved that in general the naive MLE is an inconsistent estimator for our AIDS model.
Conclusions (2)
• We explored several methods to repair the inconsistency
• The MLE can be very sensitive to small changes in the data
• There is not enough information to estimate the incubation period consistently without making additional assumptions