Predictive statistical modelling approach to …...The confidence intervals of 3 Nigeria subnational...

Predictive statistical modelling

approach to estimating TB prevalence Sandra Alba, Ente Rood, Masja Straetemans and Mirjam Bakker

Model inputs and outputs

Independent

variables

- Bacteriologically confirmed TB prevalence - Surveys conducted after 2007 (“Lime book” methodology) - Subnational TB prevalence estimates

Independent

variables

- TB data, programmatic factors, co-morbidities and socio-environmental predictors

- National level data: TME, WB, GHR, UNICEF, IDF - Subnational data: NTP, DHS, MICS, CBS and other

representative surveys. - Predictors only available nationally averaged out at

subnational level - Total used in univariate analyses: 37

Training set - 30 datapoints in total

Countries to predict

- 2013 estimates - 25 low and 49 middle income countries - without prevalence survey - expected prevalence <0.1% according to WHO estimates

Titel

2

Total 30 data points

13 National prevalence surveys • 2007 Philippines • 2007 Vietnam • 2008 Bangladesh • 2009 Myanmar • 2010 China • 2011 Pakistan • 2011 Cambodia • 2011 Ethiopia • 2011 Lao PDR • 2012 Gambia • 2012 Nigeria • 2012 Rwanda • 2012 Thailand

Waiting for Tanzania, Ghana, Malawi,

Sudan, Zambia and Indonesia

Titel

3

Subnational estimates from 5 countries • Vietnam (3 areas) • Myanmar (2) • China (3) • Pakistan (6) • Nigeria (6)

2 district level surveys in India • 2009 Jabalpur (Madhya Pradesh) • 2009 Bangalore Rural (Karnataka)

• 2007 Thiruvallur (Tamil Nadu)

dropped - methodology?

On the lookout for reports of surveys

conducted in Wardha, Agra (Jalma) and

Faridabad districts

Training set vs. predictions

Titel

4

Titel

5

0

500

1000

0

500

1000

0

500

1000

0

500

1000

0 1 2 3 4 5 6 0 1 2 3 4 5 6

0 1 2 3 4 5 6 0 1 2 3 4 5 6

2007: PHL 2007: VNM 2008: BGD 2009: IND

2009: MMR 2010: CHN 2011: ETH 2011: KHM

2011: LAO 2011: PAK 2012: GMB 2012: NGA

2012: RWA 2012: THA

95% CI Point estimate

Ra

te p

er

100

'000

Subnational area*

*Subnational area=0 refers to national estimate

Prevalence estimates in training set, by country

Numerators and denominators

Candidate models for this task included GLM models - numerator and denominator need to be specified explicitly. Prevalence surveys report - numerators (BC TB) - denominators (number of participants in survey) - estimated prevalence resulting from models

However 1. Ratio between these two not equate the final estimated

prevalence: - models take into account population weighing, clustering,

non-participation and missing values. 2. Subnational data: numerators and denominators sometimes not

available.

Titel

6

Adjusted numerators and denominators

Solution: adjusted number of BC and participants based on - prevalence estimates and confidence intervals - average between

- n1=(p*(1-p))/(((ul-p)/1.96)^2) - n2=(p*(1-p))/(((ll-p)/1.96)^2)

Very crude method, needs to be revised at later stage - adequately capture the asymmetrical nature of CI for a proportion - Arcsine tranformation? Note: - Adjusted numerators and denominators approximately half of

number of cases and participants in the survey - Consistent with a design effect = 2

Titel

7

Model fitting

Two types GLM considered - binomial (logistic link) - negative binomial (offset: log adjusted number of participants)

+ A random effect to account for clustering by country. Model building strategy: - Univariate models fitted against 37 predictor variables (complete data) - Fit assessed by AIC - Multivariate model: 10 cases/covariate to avoid overfitting = 3 predictors - Variables dropped by backward elimination (p<0.05) - Principal components analysis for variable reduction

Titel

8

Best fitting final model: • Binomial model (logistic link) • Without 3 subnational estimates in Nigeria with very large

confidence interval (North Central, North West and South South)

• lower AIC

Climatic score: • PCA score: average temperature, maximum temperature in

warmest month, average rainfall • higher values indicate warmer wetter countries • (tropical/subtropical countries) • First component explains 77% of variation

Titel

9

Final model

Titel

10

Final model

Final Multivariate model coefficients (binomial), logistic scale

Model predictors Coefficient Strength

(Intercept) -3.03588

Climate score 0.16039 160

New laboratory confirmed rate 0.00812 8

BCG coverage -0.03610 -36

Predicted vs. observed (training set)

Titel

11

0

.00

2.0

04

.00

6.0

08

Pre

dic

ted p

reva

lence

0 .002 .004 .006 .008Observed prevalence

Model fit

Cross validation k=2, x5 R-sq (mean) =0.76

Cross validation k=2, x1000 R-sq (median) =0.57

Titel

12

0.1

.2.3

.4.5

Den

sity

-5 0 5Deviance residual

0

.00

2.0

04

.00

6.0

08

p_h

at

-5 0 5Deviance residual

WHO estimates vs. model predictions

Titel

13

CAF

NER

SOM

ZAF

0

100

02

00

03

00

0

Mo

de

l pre

dic

tio

ns

0 200 400 600 800 1000WHO estimate

Outliers

Titel

14

0.1

.2.3

.4

Den

sity

-4 -2 0 2Scores for component 1

0

.00

5.0

1.0

15

.02

Den

sity

0 50 100 150 200new_labconfr

Climate score (β=0.160) New lab confirmed rate (β =0.008)

0

.02

.04

.06

.08

Den

sity

20 40 60 80 100bcg

BCG (β = -0.036)

SOM

CAF

SOM

CAF

NER

NER

“Bland and Altman” plot of agreement

Model predictions greater

than WHO estimates

• mean difference=55

cases per 100.000

(exc. 3 outliers)

• random scatter around

this difference (a part

from outliers)

Titel

15

NER

ZAF

SOM

CAF

-300

0-2

00

0-1

00

0

0

100

0

Diffe

ren

ce (

WH

O e

stim

ate

- m

od

el p

red

iction

s)

0 500 1000 1500 2000Mean (WHO estimate, model predictions)

BC in adults vs. all forms all ages

Titel

16

WHO estimates: All forms all ages Model predictions: BC in adults → model predictions "too high" Solution? WHO estimates of BC in adults? → keep model "free" from WHO assumptions Crude adjustment: correct BC in adults by factor of 0.83 → ratio from TME prevalence survey dataset

“Bland and Altman” plot of agreement -

adjusted estimates

Titel

17

CAF

NER

SOM

ZAF

-200

0-1

50

0-1

00

0-5

00

0

500

Diffe

ren

ce (

WH

O e

stim

ate

- m

od

el p

red

iction

s)

0 500 1000 1500Mean (WHO estimate, model predictions)

Model predictions greater

than WHO estimates

• mean difference=3

cases per 100.000

(exc. 3 outliers)

Limitations of this

correction:

• too crude, blanket

correction for all

estimates after model

prediction

• better to compare with

WHO BC in adults

estimates

Discussion

Prevalence model successfully fitted • More datapoints with less precision vs. fewer datapoints with

more precision → sensitivity analysis • Model predictions broadly in line with WHO estimates • Model estimates heavily reliant on climatic score. Useful? • CAR and Somalia → sensitivity analyses exc. climate score

Methodological improvements • More precise estimates of adjusted BC and participants numbers • Confidence intervals, propagation of error • How to factor in time (lags, repeat surveys) • Predictions for high vs. low prevalence estimates (overestimate

/underestimate low prevalences with logistic model?) • Include survey specific variables (coverage, participation rate) as

random effects to filter out nuissance variability induced by these factors

• Consider fitting two models (Asia and Africa)

Titel

18

Data wishlist

From WHO • BC adults estimates using WHO estimation methods • Estimates from more recent prevalence surveys • China disaggregated NTP data • Reports for all India district level surveys

Note: in addition we will also include the following: • Disaggregated data for climate, population density • New data recently compiled (large cities, prevalence of high risk

groups)

Titel

19

Questions?

Comments?

Suggestions for improvement?

Titel

20

Extra slide: adjustments to prevalence

estimates The Bangladesh survey only reported SS+, so estimated BC based on the ratio

between SS+ and BC from prevalence surveys conducted WPR and SEA region in

2007 (year of Bangaldesh survey). The surveys used for the calculation were thus:

China, Cambodia, Lao People's Democratic Republic, Myanmar, Philippines, Thailand

and Viet Nam. The ratio was 0.456, so the prevalence of BC was estimated as follows:

prev_bc_100k=prev_sp_100k/0.4565.

The report from the Jabalpur survey concluded that BC estimates from the survey

should be corrected by a factor 1.7 to account for no x-ray screening, which was

done.

The confidence intervals of 3 Nigeria subnational estimates were very wide. Given the

paucity of datapoints for model 2 these were keep for modeling but their impact on

model fit was assessed after all modeling.

Titel

21

Predictive statistical modelling approach to …...The confidence intervals of 3 Nigeria subnational...

Documents

Transcript of Predictive statistical modelling approach to …...The confidence intervals of 3 Nigeria subnational...