Lexis diagrams and analysis of register data

Practical use of Lexis diagrams in the analysis and routine reporting from population registers

Bendix Carstensen
Steno Diabetes Center & Department of Biostatistics, University of [email protected]
www.biostat.ku.dk/~bxc

Statistics for Health Registers and Linked Databases
Milton Keynes, UK, May 2009

Outline

Health registers

Lexis diagrams

Tabulation of follow-up

Models and likelihood

Diabetes incidence

Diabetes mortality

Cox modelling

Advantage of parametric hazards

Summary


Health events

- Diagnoses (hospitals, clinics)
- Procedures (treatments, measurements)
- Purchases (prescriptions)

Linkage by person ID ⇒ (partial) health history of persons.


Population registers

- All persons in the population included.
- Universal linkage of persons.
- The Nordic countries (DK, FI, IS, NO, SE):
  - Person ID for all citizens
  - Used for health care purposes
  - Used for all taxation and social records as well

Not only health histories, but also health histories linked to social, economic and educational status are possible.

- Censuses replaced by register tabulations.


Mortality rates from registers

- Entry time (e.g. date of diagnosis).
- Exit time (e.g. date of death).

Implicit here is the current state: alive with a diagnosis.


Incidence rates from registers

- Entry time (e.g. date of birth).
- Exit time (e.g. date of diagnosis).

Implicit here is the current state: alive without a diagnosis.

Usually not compiled directly, but based on:

- Cases from a register
- Follow-up derived from censuses

Note: the follow-up is derived from a register of all at risk; in this case the entire population.


General health history

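- Current state
- Entry point in state
- Exit point from state
- Next state

Multistate models:

[Diagram: three states, Well, DM and Dead. Transitions: Well → DM with rate λ; Well → Dead with rate µW; DM → Dead with rate µD.]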


Wilhelm Lexis

Wilhelm Lexis (1837–1914), German statistician and economist.

Lexis diagram

- Shows the follow-up of a person from entry to exit as a function of date and age.
- In general: follow-up shown on two timescales.

Lexis diagrams for a 1‰ random sample of the Danish National Diabetes Register.

[Figure: Lexis diagrams of the sample, Age (0–100) against Date (1994–2006), with a close-up of ages 45–65 over dates 1990–2010.]

Tabulation by age, period and cohort

Two extra complications:

1. Population risk time must be split in triangles. Can be done from population figures. Available for entire nations in the Human Mortality Database.

2. Midpoints of age-intervals are no longer average age in the classes. The same for period and cohort. Midpoints should be offset by 1/3 to give average age at follow-up.

Both complications are treated in “Age-Period-Cohort models for the Lexis diagram”, Statistics in Medicine, 2007. [1, 2, 3]
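The second complication can be checked with a small calculation. The sketch below assumes that follow-up is spread uniformly over each triangle (a simplification made here, not a claim from the talk), so the mean date and age are the centroid of the triangle, i.e. the average of its three vertices. The cell chosen and all variable names are hypothetical.

```r
# Mean (date, age) of follow-up in the two Lexis triangles of a 1x1 cell,
# assuming follow-up is spread uniformly over each triangle (an assumption).
# The centroid of a triangle is the average of its three vertices.
cell_date <- 2000   # hypothetical cell: dates 2000-2001, ages 60-61
cell_age  <- 60

# Vertices as (date, age) offsets from the lower left corner of the cell.
lower <- rbind(c(0, 0), c(1, 0), c(1, 1))  # later-born cohort in the cell
upper <- rbind(c(0, 0), c(0, 1), c(1, 1))  # earlier-born cohort in the cell

centroid <- function(tri) c(date = cell_date, age = cell_age) + colMeans(tri)

mid_lower <- centroid(lower)   # date + 2/3, age + 1/3
mid_upper <- centroid(upper)   # date + 1/3, age + 2/3

# Mean date of birth (cohort) follows as mean date minus mean age.
coh_lower <- unname(mid_lower["date"] - mid_lower["age"])
coh_upper <- unname(mid_upper["date"] - mid_upper["age"])
print(rbind(lower = mid_lower, upper = mid_upper))
```

Under this assumption the means sit at one third or two thirds of the interval rather than at its midpoint, and the two triangles differ by 2/3 of a year in mean date of birth.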


Construction of rates

- Rate = Events / Risk time = D/Y
- Mortality = red blobs / length of lines
- Incidence rates = green blobs / population

What subsets of the Lexis diagram should this be done for?

This is essentially asking:

“What timescales are we interested in?”


Data manipulations

- From: individual records (entry, exit, status)
- To: tables of (events, risk time) = (D, Y) by timescales (age, date, duration, ...)

Each individual contributes to many cells of the table, so it is not a tabulation of the individuals; it is a tabulation of the follow-up:

- Split follow-up in small pieces, each one with (d, y) recorded.
- Tabulate (d, y) by timescales (and other covariates of interest, such as sex and date of birth).
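The two steps above can be sketched in base R for a single timescale (age). This is a minimal illustration under stated assumptions: the records, the column names and the helper `split_one` are hypothetical, and in practice a dedicated tool for splitting follow-up would handle several timescales at once.

```r
# Hypothetical individual records: age at entry, age at exit, event at exit (1/0).
persons <- data.frame(
  id     = 1:3,
  entry  = c(50.5, 52.0, 51.2),
  exit   = c(53.2, 52.8, 54.0),
  status = c(1, 0, 1)
)
breaks <- 50:55   # 1-year age bands [50,51), ..., [54,55)

# One row per (person, age band) with risk time y and event indicator d.
split_one <- function(p) {
  lo <- pmax(p$entry, head(breaks, -1))   # start of follow-up within each band
  hi <- pmin(p$exit,  tail(breaks, -1))   # end of follow-up within each band
  keep <- hi > lo                         # bands the person actually visits
  data.frame(
    id  = p$id,
    age = head(breaks, -1)[keep],
    y   = (hi - lo)[keep],
    # the event belongs to the band in which follow-up ends
    d   = as.numeric(p$status == 1 & hi[keep] == p$exit)
  )
}
pieces <- do.call(rbind, lapply(seq_len(nrow(persons)),
                                function(i) split_one(persons[i, ])))

# Tabulate the follow-up (not the persons): events D and risk time Y per band.
tab <- aggregate(cbind(D = d, Y = y) ~ age, data = pieces, FUN = sum)
tab$rate <- tab$D / tab$Y
print(tab)
```

Total risk time and total events are unchanged by the splitting; only their allocation to age bands is new.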


Keeping track

Functions for splitting time, going from one record per person to one record per period of follow-up, while keeping track of timescale, risk time and events:

SAS: Macro %Lexis, available from www.biostat.ku.dk/~bxc/Lexis.

Stata: Commands stset and stsplit.

R: Functions Lexis, splitLexis and cutLexis, available in the Epi package.

Keeping track in practice

The Epi package has a Lexis machinery (designed by Martyn Plummer, IARC, Lyon).

- Keeps track of multiple states and multiple timescales
- Provides tools for summarizing and tabulating follow-up
- Lexis diagrams shown here are made by plot.Lexis
- An overview of the Lexis machinery is included in the package as a PDF document.


Tabulation

Split records (i.e. (d, y)) are then tabulated by:

1. Fixed covariates (sex, genotype, date of birth, ...)

2. Timescales (age, calendar time, duration, ...)

Rates can now be computed by any of the variables in the tabulation.

Analysis proceeds as if observations were independent Poisson observations.


Analysis of rates

Rate = Events / Risk time = D/Y

This is based on the log-likelihood for observation of D events during Y risk time with a constant rate λ:

ℓ(λ) = D log(λ) − λY

Apart from a term D log(Y), this is the log-likelihood for a Poisson observation D with mean µ = λY; log(µ) = log(λ) + log(Y).

The empirical rate is the ML-estimator in the constant rate model.
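The link between the empirical rate and the Poisson likelihood can be verified numerically. In the sketch below the totals `D` and `Y` are hypothetical; the model is an intercept-only Poisson regression with log(Y) as offset, which should reproduce D/Y.

```r
# Hypothetical totals: D events during Y person-years.
D <- 30
Y <- 1200

rate_hat <- D / Y   # empirical rate

# Same estimate from a Poisson model with log(Y) as offset.
fit <- glm(D ~ 1 + offset(log(Y)), family = poisson)
rate_glm <- unname(exp(coef(fit)))

# Wald 95% confidence interval on the log scale: se(log rate) = 1/sqrt(D),
# which follows from the second derivative of the log-likelihood in log(lambda).
ci <- rate_hat * exp(c(-1, 1) * qnorm(0.975) / sqrt(D))
print(c(rate = rate_hat, glm = rate_glm, lower = ci[1], upper = ci[2]))
```

Setting the derivative D/λ − Y of the log-likelihood to zero gives the same answer by hand, λ = D/Y.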


Likelihood for one person

The likelihood from several intervals from one individual is a product of conditional probabilities:

P{event at t4 | alive at t0} = P{event at t4 | alive at t4}
    × P{survive (t3, t4) | alive at t3}
    × P{survive (t2, t3) | alive at t2}
    × P{survive (t1, t2) | alive at t1}
    × P{survive (t0, t1) | alive at t0}

This can computationally be treated as the likelihood of 4 independent Poisson observations, (1, 0, 0, 0), with possibly different means.
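The equivalence can be checked numerically for one person. The rates and interval lengths below are hypothetical, and the intervals are listed in time order, so the event indicator reads (0, 0, 0, 1) rather than (1, 0, 0, 0).

```r
# One person followed through four intervals, event at the end of the last.
# Rates and interval lengths are hypothetical.
lambda <- c(0.010, 0.012, 0.015, 0.020)  # constant rate within each interval
y      <- c(1, 1, 1, 0.4)                # risk time in each interval
d      <- c(0, 0, 0, 1)                  # event indicator per interval

# Survival form: log(rate at the event) + log(probability of surviving until it).
loglik_surv <- log(lambda[4]) - sum(lambda * y)

# Poisson form: four Poisson terms with means lambda * y, treated as independent,
# minus the term sum(d * log(y)), which does not involve lambda.
loglik_pois <- sum(dpois(d, lambda * y, log = TRUE)) - sum(d * log(y))

print(c(survival = loglik_surv, poisson = loglik_pois))
```

The two numbers agree, which is why standard Poisson regression software can be used on split records.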


Likelihood for varying rates

- If we assume rates are constant in each interval, the log-likelihood from one individual is a sum of Poisson terms.
- Each term refers to one interval of follow-up from the same person, so the terms are not independent, but the likelihood is still a product.
- The purpose of splitting the follow-up is to allow the rates to vary within the follow-up of each person.
- Intervals should be so small that rates can be assumed constant within each. (5-year age intervals are usually not small enough.)


Analysis of split records

- Splitting the records allows rates to vary across follow-up.
- The split records are analysed as independent Poisson observations.
- Tabulation makes the analysis technically more convenient, but is formally superfluous.
- NOTE: A separate parameter for each tabulation interval is not necessary.
- Use interval midpoints as a continuous covariate, and model the effect by splines, fractional polynomials, ...
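The last two points can be illustrated with a Poisson model in which age enters as a natural spline of the interval midpoints. The table below is simulated from a made-up rate function, and the choice of 4 degrees of freedom is arbitrary; the sketch only shows the mechanics, not the analysis from the talk.

```r
library(splines)

set.seed(1)
# Hypothetical table: 1-year age classes 40-89 with risk time Y and events D
# simulated from a smooth "true" rate (made up for illustration).
tab <- data.frame(age = 40:89 + 0.5)           # interval midpoints
tab$Y <- 5000
true_rate <- exp(-9 + 0.08 * tab$age)
tab$D <- rpois(nrow(tab), true_rate * tab$Y)

# Age effect as a natural spline of the midpoints: 5 parameters, not 50.
fit <- glm(D ~ ns(age, df = 4) + offset(log(Y)), family = poisson, data = tab)

# Smoothed rates per 1000 person-years on a fine age grid
# (Y = 1000 in the new data sets the scale of the predictions).
grid <- data.frame(age = seq(40, 90, by = 0.25), Y = 1000)
grid$rate <- predict(fit, newdata = grid, type = "response")
head(grid)
```

Because the model uses the midpoints as a quantitative variable, the fitted curve can be evaluated at any age, not only at the tabulation intervals.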


Register data analysis

- The nature of the data (individual records of event dates) allows arbitrarily fine splitting of follow-up.
- The amount of data poses technical problems that can be solved by tabulation.
- Analysis should report smoothed versions of rates, possibly on multiple timescales.


Tabulation of incident DM cases

- Follow-up split by age and date (1-year classes).
- Cases (green dots) by age, date, sex and date of birth.
- Population figures, with risk time among DM patients subtracted to give risk time among the non-DM population.

[Figure: Lexis diagram, Age (45–65) against Date (1990–2010).]

Model for DM incidence rates

Model (for each sex):

λ(a, p) = f(a) × g(p), g(2004) = 1

a: current age
p: current date (period)

f(a) and g(p) are modelled by natural splines (restricted cubic splines).

Reported in detail in [4]: The Danish National Diabetes Register: Trends in incidence, prevalence and mortality. Diabetologia, 2008.
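One way to obtain f and g with the constraint g(2004) = 1 from a fitted model is to predict at the reference date and take ratios. The sketch below uses a simulated age-by-period table and arbitrary spline dimensions; the data, the variable names and the helper functions `f` and `g` are assumptions made for illustration and are not the code behind the published analysis.

```r
library(splines)

set.seed(2)
# Hypothetical age-by-period table of events D and person-years Y (simulated).
tab <- expand.grid(a = seq(30.5, 79.5, by = 1), p = seq(1995.5, 2006.5, by = 1))
tab$Y <- 2000
tab$D <- rpois(nrow(tab), tab$Y * exp(-10 + 0.07 * tab$a + 0.03 * (tab$p - 2004)))

fit <- glm(D ~ ns(a, df = 4) + ns(p, df = 3) + offset(log(Y)),
           family = poisson, data = tab)

# g(p): rate ratio relative to 2004, so that g(2004) = 1 by construction.
# Predict on the log scale at a fixed age and subtract the prediction at the
# reference date; f(a) cancels because the model is multiplicative.
g <- function(p, ref = 2004, at.age = 50) {
  nd <- function(pp) data.frame(a = at.age, p = pp, Y = 1)
  exp(predict(fit, nd(p)) - predict(fit, nd(ref)))
}

# f(a): age-specific rates per 1000 person-years at the reference date.
f <- function(a, ref = 2004) {
  predict(fit, data.frame(a = a, p = ref, Y = 1000), type = "response")
}

g(c(1996, 2000, 2004))
f(c(40, 60))
```

With this parametrization f(a) is directly interpretable as the age-specific rates in 2004 and g(p) as the rate ratio relative to 2004, which matches how the two panels of the incidence figure are labelled.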

[Figure, two panels with log-scale axes (0.1–10): incidence rate per 1000 pyrs against age (0–100); rate ratio against date of inclusion (1996–2008).]

Tabulation of DM deaths

- Follow-up split by age and date (1-year classes).
- Cases (red dots) and risk time (gray lines) by age, date, duration, sex and date of birth.

[Figure: Lexis diagram, Age (45–65) against Date (1990–2010).]

Models for DM mortality rates

Model using current age (a), date at diagnosis (p − d) and duration (d) (two timescales):

λ(a, p) = f(a) × g(p − d) × h(d), g(2004) = 1, h(0) = 1

Model using age at diagnosis (a − d), date at diagnosis (p − d) and duration (d) (one timescale):

λ(a, p) = f(a − d) × g(p − d) × h(d), g(2004) = 1, h(0) = 1

f, g and h are modelled by natural splines (restricted cubic splines).


DM mortality, two timescales

[Figure, three panels with log-scale axes: mortality rate per 1000 pyrs (2–200) against age (30–90); rate ratio (0.2–20) against inclusion date; rate ratio against time since inclusion (0–12).]

DM mortality, one timescale

[Figure, three panels with log-scale axes: mortality rate per 1000 pyrs (2–200) against age at inclusion (30–90); rate ratio (0.2–20) against inclusion date; rate ratio against time since inclusion (0–12).]

Why not a Cox model?

Need to choose an underlying timescale:

- Age (i.e. current age)
- Duration (i.e. time since diagnosis)

The other timescale is accommodated in a Cox model by splitting the follow-up on it and including it as a covariate.

So you can accommodate more than one timescale in a Cox model, but the hassle with time-splitting is the same.

The Poisson approach is easier because rates are directly estimated using smoothers.

Age at entry?

If duration is taken as timescale and age at entry (e = a − d) as covariate:

λ(a, d, x) = λ0(d) exp(α(a − d) + xβ)
           = λ0(d) exp(−αd) exp(αa + xβ)

The effect of current age is taken to be linear on the log-scale, i.e. an exponential effect of age.

In many cases this is not too far from reality, but unless you are prepared to split data you cannot check the feasibility of the model.
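Written on the log scale, the rearrangement makes the point explicit. The symbol \(\tilde\lambda_0\) is notation introduced here for the regrouped baseline; it is not used in the talk.

```latex
\begin{align*}
\log\lambda(a,d,x)
  &= \log\lambda_0(d) + \alpha(a-d) + x\beta \\
  &= \underbrace{\log\lambda_0(d) - \alpha d}_{\log\tilde\lambda_0(d)}
     + \alpha a + x\beta
\end{align*}
```

Since the baseline in d is left unspecified, the term −αd is absorbed into it, and what remains is a log-linear effect of current age a with the same coefficient α as for age at entry.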


The real advantage

[Diagram: four states, Well, DM, Dead (no DM) and Dead (DM). Transitions: Well → DM with rate λ(a); Well → Dead (no DM) with rate µW(a); DM → Dead (DM) with rate µD(a, d).]

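The relationships between the rates and the probabilities are:

\[
\begin{aligned}
\mathrm{P}\{\text{Well at } a\}
  &= \exp\left(-\int_0^a \bigl(\lambda(s) + \mu_W(s)\bigr)\,\mathrm{d}s\right) \\[4pt]
\mathrm{P}\{\text{Dead (well) at } a\}
  &= \int_0^a \mu_W(s)\,
     \exp\left(-\int_0^s \bigl(\lambda(u) + \mu_W(u)\bigr)\,\mathrm{d}u\right)\mathrm{d}s \\[4pt]
\mathrm{P}\{\text{DM at } a\}
  &= \int_0^a \mathrm{P}\{\text{DM diagnosis at } s\}
     \times \mathrm{P}\{\text{survive with DM from } s \text{ to } a\}\,\mathrm{d}s \\
  &= \int_0^a \lambda(s)\,
     \exp\left(-\int_0^s \bigl(\lambda(u) + \mu_W(u)\bigr)\,\mathrm{d}u\right)
     \times \exp\left(-\int_s^a \mu_D(u, u-s)\,\mathrm{d}u\right)\mathrm{d}s \\[4pt]
\mathrm{P}\{\text{Dead (DM) at } a\}
  &= 1 - \mathrm{P}\{\text{Well at } a\} - \mathrm{P}\{\text{Dead (well) at } a\}
       - \mathrm{P}\{\text{DM at } a\}
\end{aligned}
\]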


The real advantage

Poisson models give parametric expressions for the rates, so calculation of integrals is simple; they are just sums:

    # Inputs are not defined on the slide; from the indexing below they appear to be:
    #   inc, m.w : vectors of length A with incidence and mortality (well) per age interval
    #   m.d      : matrix of mortality among DM patients, indexed [age, duration]

    # Evaluate the cumulative rates at the *end* of the intervals
    Inc <- cumsum( inc )
    M.w <- cumsum( m.w )

    # Probability of being in the "Well" state
    P.w <- exp( -(Inc+M.w) )

    # Probability of being dead without disease, i.e. in the "Dead well" state
    P.mw <- cumsum( m.w * exp( -(Inc+M.w) ) )

    # Probability of being alive with disease
    P.wd <- x <- numeric( A )
    for( a in 1:A )
       {
       for( d in 1:a ) # here d plays the role of age at diagnosis
          x[d] <- inc[d] * exp( -(Inc[d]+M.w[d]) ) *
                  exp( -sum( m.d[cbind(d:a,1:(a-d+1))] ) )
       P.wd[a] <- sum( x[1:a] )
       }
    res <- cbind( P.w, P.wd, 1-P.w-P.mw-P.wd, P.mw )
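That the integrals really reduce to sums can be checked against numerical integration. The rate functions `lam` and `mu.w` below are hypothetical smooth curves chosen only for illustration, and the interval length `h` is arbitrary; the sketch covers only the "Well" probability.

```r
# Hypothetical smooth rates per person-year (made up for illustration).
lam  <- function(a) 0.0005 * exp(0.05 * a)   # incidence rate, Well -> DM
mu.w <- function(a) 0.0001 * exp(0.08 * a)   # mortality rate, Well -> Dead

A  <- 100    # upper age limit
h  <- 0.1    # interval length in years
am <- seq(h / 2, A - h / 2, by = h)          # interval midpoints

# Cumulative rates at the *end* of each interval: rate * interval length, summed.
Inc <- cumsum(lam(am)  * h)
M.w <- cumsum(mu.w(am) * h)
P.w <- exp(-(Inc + M.w))                     # P{Well at a} for a = h, 2h, ...

# The same probability at age 60 by numerical integration.
exact60 <- exp(-integrate(function(s) lam(s) + mu.w(s), 0, 60)$value)
sum60   <- P.w[round(60 / h)]
print(c(sum = sum60, integrate = exact60))
```

With intervals this narrow the two results agree closely, which is the practical argument for tabulating in narrow intervals and evaluating smooth parametric rates at the midpoints.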


[Figure: state probabilities (0–1) against age (0–100), with regions labelled P(Alive, well), P(Alive, DM), P(+, DM) and P(+, well).]

[Figure: P(DM before age a) in % (0–30) against age (0–100).]

Summary

- Registers provide follow-up for health events.
- For the initial event, population risk time is needed to compute rates.
- Tables of events and risk time should be with narrow time intervals.
- Effects of timescales are modelled by using the interval midpoints as quantitative variables.
- Show data in Lexis diagrams.
- Show rates as interpretable curves.
- Usually many timescales are available; informed choice is needed.

References

[1] B Carstensen. Age-Period-Cohort models for the Lexis diagram. Statistics in Medicine, 26(15):3018–3045, July 2007.

[2] J Rosenbauer and K Strassburger. Comments on: "Age-Period-Cohort models for the Lexis diagram". Statistics in Medicine, 27:1557–1561, 2007.

[3] B Carstensen. Age-Period-Cohort models for the Lexis diagram (author's reply). Statistics in Medicine, 27:1561–1564, 2007.

[4] B Carstensen, JK Kristensen, P Ottosen, and K Borch-Johnsen. The Danish National Diabetes Register: Trends in incidence, prevalence and mortality. Diabetologia, 51:2187–2196, 2008.

The presentation is available on my homepage: www.biostat.ku.dk/~bxc/