Estimating the distribution of the incubation period of HIV/AIDS

Marloes H. Maathuis

Joint work with:

Piet Groeneboom and Jon A. Wellner

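Incubation period

Time between HIV infection and onset of AIDS

[Figure: timeline starting in 1980 with HIV infection in 1985 and onset of AIDS in 1996, giving an incubation period of 11 years. The same case is shown as the point (1985, 1996) in a plot with axes HIV and AIDS, both starting at 1980.]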

Censored data

Interval of HIV infection: 1983 to 1986

Interval of onset of AIDS: 1992 to 1996

Lower bound of incubation period: 6 years

Upper bound of incubation period: 13 years

[Figure: the observation drawn as a rectangle in the plane with axes X (HIV) and Y (AIDS), both starting at 1980. The interval of HIV infection (1983 to 1986) lies along the X axis and the interval of onset of AIDS (1992 to 1996) along the Y axis.]
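The two bounds follow directly from the interval endpoints. A minimal sketch of the arithmetic, where the function name `incubation_bounds` is hypothetical and the lower bound is clipped at zero on the assumption that AIDS cannot precede infection:

```python
def incubation_bounds(hiv_interval, aids_interval):
    """Return (lower, upper) bounds on the incubation period Y - X.

    hiv_interval  -- (earliest, latest) possible time of HIV infection
    aids_interval -- (earliest, latest) possible time of onset of AIDS
    """
    x_lo, x_hi = hiv_interval
    y_lo, y_hi = aids_interval
    # Shortest period: latest infection to earliest onset (never negative).
    lower = max(0, y_lo - x_hi)
    # Longest period: earliest infection to latest onset.
    upper = y_hi - x_lo
    return lower, upper


# Example from the slide: infection in 1983-1986, onset in 1992-1996.
print(incubation_bounds((1983, 1986), (1992, 1996)))  # (6, 13)
```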

Distribution functions

• Goal: estimate the distribution function of the incubation period of HIV/AIDS

• Why? This is important for predicting the future course of the epidemic

• Strategy: first estimate the 2-dimensional distribution of the time of HIV infection and the time of onset of AIDS

Main focus

• Nonparametric maximum likelihood estimator (MLE) for 2-dimensional distribution:
  – Computational aspects
  – Theoretical properties (consistency)

Computation of the MLE

• Parameter reduction: determine the inner rectangles

• Optimization: determine the amounts of mass assigned to the inner rectangles

$$\max_F L(F), \qquad L(F) = \prod_{i=1}^{n} P_F\big((X, Y) \in R_i\big)$$

Inner rectangles

[Figure: the observation rectangles in the plane with axes X (HIV) and Y (AIDS), with the inner rectangles marked.]

The MLE is insensitive to the distribution of mass within the inner rectangles. This gives non-uniqueness.

[Figure: four inner rectangles with masses α1, α2, α3, α4 in the plane with axes X (HIV) and Y (AIDS).]

$$\max_F \sum_{i=1}^{n} \log P_F\big((X, Y) \in R_i\big) = \max_{\alpha \in \mathbb{R}^4} \big\{ \log(\alpha_1) + \log(\alpha_1 + \alpha_3) + \log(\alpha_1 + \alpha_2) + \log(\alpha_3 + \alpha_4) + \log(\alpha_2 + \alpha_4) \big\}$$

$$\text{s.t.} \quad \alpha_i \ge 0, \; i = 1, \dots, 4, \quad \text{and} \quad \sum_{i=1}^{4} \alpha_i = 1$$

Solution: α1 = 3/5, α2 = 0, α3 = 0, α4 = 2/5
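The four-rectangle example can be checked numerically. The sketch below uses the EM (self-consistency) iteration for mixture proportions, which is one simple option and not necessarily the optimizer used by the authors. The 0/1 matrix `A` and the function name `em_masses` are illustrative assumptions: row i marks which inner rectangles lie inside observation rectangle i, read off from the five log terms of the example.

```python
# Rows: the five observation rectangles; columns: inner rectangles 1..4.
# A[i][j] == 1 means inner rectangle j+1 contributes to log term i.
A = [
    [1, 0, 0, 0],  # log(a1)
    [1, 0, 1, 0],  # log(a1 + a3)
    [1, 1, 0, 0],  # log(a1 + a2)
    [0, 0, 1, 1],  # log(a3 + a4)
    [0, 1, 0, 1],  # log(a2 + a4)
]


def em_masses(A, iterations=500):
    """Approximate the maximizing masses by EM, starting from uniform masses."""
    n, m = len(A), len(A[0])
    alpha = [1.0 / m] * m
    for _ in range(iterations):
        # Probability currently assigned to each observation rectangle.
        p = [sum(A[i][j] * alpha[j] for j in range(m)) for i in range(n)]
        # Each mass is rescaled by its average share of the observations.
        alpha = [
            alpha[j] * sum(A[i][j] / p[i] for i in range(n)) / n
            for j in range(m)
        ]
    return alpha


alpha = em_masses(A)
print([round(a, 6) for a in alpha])
```

Under these assumptions the iteration approaches the masses 3/5, 0, 0, 2/5. Each update keeps the masses nonnegative and summing to one, so the constraints never need to be enforced separately.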

The αi’s are not always uniquely determined: second type of non-uniqueness

[Figure: example in the plane with axes X (HIV) and Y (AIDS).]

Graph theory

[Figure: a set of rectangles R1, ..., R5 and its intersection graph.]

The maximal cliques correspond to the inner rectangles

Maximal cliques: {R1,R2,R3}, {R3,R4}, {R4,R5}, {R2,R5}
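The correspondence between maximal cliques and inner rectangles can be illustrated on a small example. The sketch below builds the intersection graph and lists its maximal cliques with a generic Bron-Kerbosch enumeration; this is not MaxCliqueFinder or SimpleCliqueFinder. The coordinates are hypothetical, chosen only so that the rectangles intersect in the pattern listed on the slide, and the function names are illustrative.

```python
# Hypothetical closed rectangles (x1, x2, y1, y2); only the intersection
# pattern is meant to match the slide, not the actual coordinates.
rects = {
    "R1": (0.0, 2.5, 0.0, 2.5),
    "R2": (2.0, 8.0, 1.0, 3.0),
    "R3": (1.0, 3.0, 2.0, 8.0),
    "R4": (2.0, 8.0, 7.0, 9.0),
    "R5": (7.0, 9.0, 2.0, 8.0),
}


def intersects(a, b):
    """True if two closed rectangles share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def maximal_cliques(rects):
    """List the maximal cliques of the intersection graph (Bron-Kerbosch)."""
    names = sorted(rects)
    nbrs = {
        u: {v for v in names if v != u and intersects(rects[u], rects[v])}
        for u in names
    }
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))  # r cannot be extended: maximal
            return
        for v in sorted(p):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(names), set())
    return found


def inner_rectangle(clique, rects):
    """Intersection of all rectangles in a clique, as (x1, x2, y1, y2)."""
    xs1, xs2, ys1, ys2 = zip(*(rects[name] for name in clique))
    return (max(xs1), min(xs2), max(ys1), min(ys2))


for clique in maximal_cliques(rects):
    print(sorted(clique), inner_rectangle(clique, rects))
```

For axis-parallel rectangles, pairwise intersection implies a common intersection, which is why each maximal clique yields a nonempty inner rectangle.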

Existing reduction algorithms

• Betensky and Finkelstein (1999)

• Gentleman and Vandal (2001, 2002)

• Song (2001)

These algorithms are slow: complexity O(n^4) to O(n^5)

New algorithms

• MaxCliqueFinder: complexity at most O(n^2 log n)

• SimpleCliqueFinder: complexity O(n^2)

Segment tree

[Figure: rectangles R1, ..., R5 on a grid with horizontal ticks 0 to 16 and vertical ticks 0 to 9, shown alongside a segment tree.]

Maximal cliques: {R5,R2}, {R3,R1,R2}

SimpleCliqueFinder

[Figure: the same grid and rectangles, with an array of integer entries.]

Computation of the MLE

• Parameter reduction: determine the inner rectangles

• Optimization: determine the amounts of mass assigned to the inner rectangles

Optimization

• High-dimensional convex constrained optimization problem
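In generic notation, the optimization step can be written as below. This is a sketch in assumed notation that does not appear on the slides: m is the number of inner rectangles, α_j is the mass assigned to inner rectangle j, and A_ij equals 1 if inner rectangle j lies inside observation rectangle R_i and 0 otherwise.

```latex
\max_{\alpha \in \mathbb{R}^m} \; \sum_{i=1}^{n} \log\Big( \sum_{j=1}^{m} A_{ij}\,\alpha_j \Big)
\quad \text{s.t.} \quad \alpha_j \ge 0, \; j = 1, \dots, m,
\quad \text{and} \quad \sum_{j=1}^{m} \alpha_j = 1
```

The objective is concave in α and the constraints are linear, so the problem is convex. The dimension m equals the number of inner rectangles, which can be large.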

Amsterdam Cohort Study among injecting drug users

• Open cohort study

• Data available from 1985 to 1997

• 637 individuals were enrolled

• 216 individuals tested positive for HIV during the study

Model

X: time of HIV infection

Y: time of onset of AIDS

Z = Y − X: incubation period

U1, U2: observation times for X

C: censoring variable for Y

(X, Y) and (U1, U2, C) are independent

[Figure: the plane with axes HIV and AIDS, showing the observation times u1 and u2 on the HIV axis and t = min(c, y) on the AIDS axis.]

We observe: W = (U1, U2, T = min(C, Y), Δ)
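The observation scheme can be sketched in code. The slides do not define Δ or the interval conventions, so the following are assumptions: Δ = 1 if Y ≤ C (onset of AIDS is observed) and 0 otherwise, the time origin is 0, and X is only known to fall before u1, between u1 and u2, or after u2. The function name `observed_set` is hypothetical, and the constraint X ≤ Y is not enforced.

```python
import math


def observed_set(x, y, u1, u2, c):
    """Return the data W = (u1, u2, t, delta) and the region containing (x, y).

    x, y   -- latent times of HIV infection and onset of AIDS
    u1, u2 -- observation times for x (u1 < u2)
    c      -- censoring time for y
    """
    t = min(c, y)
    delta = 1 if y <= c else 0  # assumed meaning of the indicator
    # x is only located relative to the two observation times.
    if x <= u1:
        x_range = (0.0, u1)
    elif x <= u2:
        x_range = (u1, u2)
    else:
        x_range = (u2, math.inf)
    # y is known exactly when uncensored, otherwise only known to exceed t.
    y_range = (t, t) if delta == 1 else (t, math.inf)
    return (u1, u2, t, delta), (x_range, y_range)


# Uncensored case: the region is a line segment because y is known exactly.
print(observed_set(3.0, 10.0, 2.0, 5.0, 12.0))
# Censored case: the region is unbounded in y.
print(observed_set(6.0, 15.0, 2.0, 5.0, 12.0))
```

When Δ = 1 the region has zero height, so in this sketch the observed sets are line segments rather than rectangles.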

Inconsistency of the naive MLE


Methods to repair inconsistency

• Transform the lines into strips

• MLE on a sieve of piecewise constant densities

• Kullback-Leibler approach

[Figure: the plane with axes X (HIV) and Y (AIDS), both starting at 1980, with 1985 marked.]

How to estimate P(Y − X ≤ z)?

• The distribution function of the incubation period cannot be estimated consistently

• What we can estimate consistently is P(Z ≤ z, Y ≤ 1997)
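Given masses on inner rectangles, a quantity of the form P(Z ≤ z, Y ≤ t) is only determined up to bounds, because the MLE does not say where the mass sits within each inner rectangle. The sketch below computes such bounds. It is an illustration, not the authors' procedure; the rectangles, masses and the function name `probability_bounds` are hypothetical.

```python
def probability_bounds(inner_rects, masses, z, t_max):
    """Bounds on P(Y - X <= z, Y <= t_max) from masses on inner rectangles.

    inner_rects -- list of closed rectangles (x1, x2, y1, y2)
    masses      -- mass assigned to each rectangle, summing to one
    """
    lower = upper = 0.0
    for (x1, x2, y1, y2), mass in zip(inner_rects, masses):
        # Entire rectangle inside the region: its mass always counts.
        if y2 - x1 <= z and y2 <= t_max:
            lower += mass
        # Corner (x2, y1) inside the region: the mass may count.
        if y1 - x2 <= z and y1 <= t_max:
            upper += mass
    return lower, upper


# Two hypothetical inner rectangles with equal mass.
example_rects = [(1983, 1986, 1992, 1996), (1985, 1986, 1990, 1991)]
print(probability_bounds(example_rects, [0.5, 0.5], z=10, t_max=1997))  # (0.5, 1.0)
```

The lower bound places all mass of each rectangle at its least favourable corner, the upper bound at its most favourable corner.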

Conclusions (1)

• We found the graph-theoretic framework very useful.

• Our algorithms for the parameter reduction step are much faster than existing methods: at most O(n^2 log n) and O(n^2), compared with O(n^4) to O(n^5).

• We proved that in general the naive MLE is an inconsistent estimator for our AIDS model.

Conclusions (2)

• We explored several methods to repair the inconsistency.

• The MLE can be very sensitive to small changes in the data.

• There is not enough information to estimate the distribution of the incubation period consistently without making additional assumptions.