Predictive statistical modelling approach to …...The confidence intervals of 3 Nigeria subnational...

Predictive statistical modelling approach to estimating TB prevalence Sandra Alba, Ente Rood, Masja Straetemans and Mirjam Bakker

Page 1

Predictive statistical modelling approach to estimating TB prevalence
Sandra Alba, Ente Rood, Masja Straetemans and Mirjam Bakker

Page 2

Model inputs and outputs

Dependent variable
- Bacteriologically confirmed (BC) TB prevalence
- Surveys conducted after 2007 (“Lime book” methodology)
- Subnational TB prevalence estimates

Independent variables
- TB data, programmatic factors, co-morbidities and socio-environmental predictors
- National level data: TME, WB, GHR, UNICEF, IDF
- Subnational data: NTP, DHS, MICS, CBS and other representative surveys
- Predictors only available nationally averaged out at subnational level
- Total used in univariate analyses: 37

Training set
- 30 datapoints in total

Countries to predict
- 2013 estimates
- 25 low and 49 middle income countries
- without prevalence survey
- expected prevalence <0.1% according to WHO estimates

Page 3

Total 30 data points

13 national prevalence surveys
• 2007 Philippines
• 2007 Vietnam
• 2008 Bangladesh
• 2009 Myanmar
• 2010 China
• 2011 Pakistan
• 2011 Cambodia
• 2011 Ethiopia
• 2011 Lao PDR
• 2012 Gambia
• 2012 Nigeria
• 2012 Rwanda
• 2012 Thailand

Waiting for Tanzania, Ghana, Malawi, Sudan, Zambia and Indonesia

Subnational estimates from 5 countries
• Vietnam (3 areas)
• Myanmar (2)
• China (3)
• Pakistan (6)
• Nigeria (6)

2 district level surveys in India
• 2009 Jabalpur (Madhya Pradesh)
• 2009 Bangalore Rural (Karnataka)
• 2007 Thiruvallur (Tamil Nadu): dropped - methodology?

On the lookout for reports of surveys conducted in Wardha, Agra (Jalma) and Faridabad districts

Page 4

Training set vs. predictions

Page 5

Prevalence estimates in training set, by country

[Figure: one panel per survey (2007: PHL, 2007: VNM, 2008: BGD, 2009: IND, 2009: MMR, 2010: CHN, 2011: ETH, 2011: KHM, 2011: LAO, 2011: PAK, 2012: GMB, 2012: NGA, 2012: RWA, 2012: THA), showing point estimates with 95% CI. Y-axis: rate per 100'000 (0 to 1000). X-axis: subnational area (0 to 6).]

*Subnational area=0 refers to national estimate

Page 6

Numerators and denominators

Candidate models for this task included GLMs
- numerator and denominator need to be specified explicitly

Prevalence surveys report
- numerators (BC TB)
- denominators (number of participants in survey)
- estimated prevalence resulting from models

However
1. The ratio between these two does not equal the final estimated prevalence:
   - models take into account population weighting, clustering, non-participation and missing values.
2. Subnational data: numerators and denominators sometimes not available.

Page 7

Adjusted numerators and denominators

Solution: adjusted number of BC cases and participants based on
- prevalence estimates (p) and confidence intervals (lower limit ll, upper limit ul)
- average between
  - n1=(p*(1-p))/(((ul-p)/1.96)^2)
  - n2=(p*(1-p))/(((ll-p)/1.96)^2)

Very crude method, needs to be revised at a later stage
- to adequately capture the asymmetrical nature of the CI for a proportion
- Arcsine transformation?

Note:
- Adjusted numerators and denominators are approximately half of the number of cases and participants in the survey
- Consistent with a design effect = 2
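The adjustment above can be sketched in a few lines of Python. This is only an illustration of the two formulas for n1 and n2 and their average; the function names, the example survey values, and the choice to take the adjusted numerator as p times the adjusted denominator are assumptions, not taken from the slides.

```python
Z_95 = 1.96  # normal quantile used in the formulas on the slide


def effective_n(p, bound, z=Z_95):
    """Sample size implied by one CI bound under a normal approximation."""
    se = abs(bound - p) / z          # standard error implied by the bound
    return p * (1 - p) / se ** 2     # n such that sqrt(p(1-p)/n) equals se


def adjusted_counts(p, ll, ul):
    """Return (adjusted cases, adjusted participants) for one estimate."""
    n1 = effective_n(p, ul)          # from the upper limit
    n2 = effective_n(p, ll)          # from the lower limit
    n = (n1 + n2) / 2                # average of the two, as on the slide
    return p * n, n                  # assumed: numerator = p * denominator


# Hypothetical survey: 500 per 100,000 with 95% CI 400 to 600 per 100,000
cases, participants = adjusted_counts(0.005, 0.004, 0.006)
print(round(cases), round(participants))
```

With an asymmetric interval n1 and n2 differ, which is the crudeness the slide points out: the average is a compromise rather than an exact inversion of the survey's CI method.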

Page 8

Model fitting

Two types of GLM considered
- binomial (logistic link)
- negative binomial (offset: log adjusted number of participants)
+ A random effect to account for clustering by country.

Model building strategy:
- Univariate models fitted against 37 predictor variables (complete data)
- Fit assessed by AIC
- Multivariate model: 10 cases/covariate to avoid overfitting = 3 predictors
- Variables dropped by backward elimination (significance threshold p<0.05)
- Principal components analysis for variable reduction

Page 9

Final model

Best fitting final model:
• Binomial model (logistic link)
• Without 3 subnational estimates in Nigeria with very large confidence intervals (North Central, North West and South South)
  • lower AIC

Climatic score:
• PCA score: average temperature, maximum temperature in warmest month, average rainfall
• higher values indicate warmer wetter countries (tropical/subtropical countries)
• First component explains 77% of variation
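As a rough illustration of how a climatic score of this kind can be derived, the sketch below computes the first principal component of three standardized variables by power iteration, using only the standard library. The data are made up, the function names are hypothetical, and the use of the correlation matrix (rather than the covariance matrix) is an assumption; the slides do not say which was used.

```python
import math


def standardize(column):
    """Centre a variable and scale it to unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / sd for x in column]


def first_component(columns, iterations=500):
    """First principal component of the correlation matrix (power iteration).

    Returns (loadings, share of variance explained, score per observation).
    Assumes the uniform starting vector is not orthogonal to the leading
    eigenvector, which holds when the variables are positively correlated.
    """
    z = [standardize(c) for c in columns]
    k, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                     for i in range(k))
    scores = [sum(v[i] * z[i][t] for i in range(k)) for t in range(n)]
    return v, eigenvalue / k, scores  # trace of a correlation matrix is k


# Made-up values for five hypothetical countries
avg_temp = [12.0, 18.5, 24.0, 26.5, 27.0]        # average temperature
max_temp = [22.0, 29.0, 33.0, 35.5, 34.0]        # maximum, warmest month
rainfall = [600.0, 900.0, 1500.0, 1400.0, 2100.0]  # average rainfall

loadings, explained, scores = first_component([avg_temp, max_temp, rainfall])
print([round(x, 2) for x in loadings], round(explained, 2))
```

With positively correlated inputs all loadings come out positive, so higher scores correspond to warmer, wetter settings, matching the interpretation on the slide.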

Page 10


Final model

Final multivariate model coefficients (binomial), logistic scale

Model predictors                 Coefficient   Strength
(Intercept)                      -3.03588
Climate score                     0.16039       160
New laboratory confirmed rate     0.00812         8
BCG coverage                     -0.03610       -36
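To show how coefficients on the logistic scale translate into a predicted prevalence, here is a minimal sketch that applies the inverse logit to the linear predictor. The coefficients are copied from the table; the function names, the example inputs and their units (lab confirmed rate per 100,000, BCG coverage in percent) are assumptions, and the country random effect is ignored (treated as zero).

```python
import math

# Coefficients from the final model table (logistic scale)
COEF = {
    "intercept": -3.03588,
    "climate_score": 0.16039,
    "new_lab_confirmed_rate": 0.00812,
    "bcg_coverage": -0.03610,
}


def predict_prevalence(climate_score, new_lab_confirmed_rate, bcg_coverage):
    """Predicted prevalence as a proportion, random effect assumed to be zero."""
    eta = (COEF["intercept"]
           + COEF["climate_score"] * climate_score
           + COEF["new_lab_confirmed_rate"] * new_lab_confirmed_rate
           + COEF["bcg_coverage"] * bcg_coverage)
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit


def odds_ratio(name, delta=1.0):
    """Multiplicative change in the odds for a change of `delta` in a predictor."""
    return math.exp(COEF[name] * delta)


# Hypothetical country: average climate score, 50 new lab confirmed cases
# per 100,000 and 80% BCG coverage (assumed units)
p = predict_prevalence(0.0, 50.0, 80.0)
print(f"{p * 100000:.0f} per 100,000")
print(f"odds ratio per 10 points of BCG coverage: {odds_ratio('bcg_coverage', 10):.2f}")
```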

Page 11

Predicted vs. observed (training set)

[Figure: predicted prevalence (y-axis, 0 to .008) plotted against observed prevalence (x-axis, 0 to .008).]

Page 12

Model fit

Cross validation (k=2, repeated 5 times): mean R-sq = 0.76

Cross validation (k=2, repeated 1000 times): median R-sq = 0.57
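The repeated 2-fold cross validation reported above can be sketched as follows. This is a generic illustration, not the authors' procedure: a one-predictor least-squares line stands in for the binomial GLM, R-sq is taken as 1 - SSE/SST on the pooled held-out predictions (the slides do not define it), and the data are made up.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for the real model)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def r_squared(observed, predicted):
    """1 - SSE/SST on held-out data (an assumed definition of R-sq)."""
    mean_obs = statistics.mean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - sse / sst


def repeated_cv_r2(xs, ys, k=2, repeats=5, seed=0):
    """Return one out-of-sample R-sq per repeat of k-fold cross validation."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    results = []
    for _ in range(repeats):
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]   # k roughly equal folds
        obs, pred = [], []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
            obs += [ys[i] for i in fold]
            pred += [a + b * xs[i] for i in fold]
        results.append(r_squared(obs, pred))
    return results


# Made-up data: 30 points, as in the training set size
gen = random.Random(1)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + gen.gauss(0, 8) for x in xs]

few = repeated_cv_r2(xs, ys, repeats=5)
many = repeated_cv_r2(xs, ys, repeats=1000)
print(statistics.mean(few), statistics.median(many))
```

With only 30 datapoints, each 2-fold split trains on 15, so R-sq can vary a lot between repeats; that is one plausible reason the 5-repeat mean and the 1000-repeat median on the slide differ.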

[Figure, two panels: density of deviance residuals (x-axis -5 to 5); p_hat (y-axis, 0 to .008) against deviance residual (x-axis, -5 to 5).]

Page 13

WHO estimates vs. model predictions

[Figure: model predictions (y-axis, 0 to 3000) against WHO estimate (x-axis, 0 to 1000); labelled points: CAF, NER, SOM, ZAF.]

Page 14

Outliers

[Figure, three density plots of the model predictors, with SOM, CAF and NER labelled:
- Climate score (β=0.160): scores for component 1, x-axis -4 to 2
- New lab confirmed rate (β=0.008): new_labconfr, x-axis 0 to 200
- BCG (β=-0.036): bcg, x-axis 20 to 100]

Page 15

“Bland and Altman” plot of agreement

Model predictions greater than WHO estimates
• mean difference=55 cases per 100,000 (exc. 3 outliers)
• random scatter around this difference (apart from outliers)
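For readers unfamiliar with this kind of agreement plot, the sketch below computes its ingredients: the pairwise means, the pairwise differences (WHO estimate minus model prediction, as on the plot axis) and the mean difference. The 95% limits of agreement are a standard part of the Bland and Altman method but are not reported on the slide, so they are an addition here; the input values are made up.

```python
import statistics


def bland_altman(who_estimates, model_predictions):
    """Means, differences (WHO - model), bias and 95% limits of agreement."""
    diffs = [w - m for w, m in zip(who_estimates, model_predictions)]
    means = [(w + m) / 2 for w, m in zip(who_estimates, model_predictions)]
    bias = statistics.mean(diffs)   # mean difference
    sd = statistics.stdev(diffs)    # sample SD of the differences
    return {
        "means": means,             # x-axis of the plot
        "diffs": diffs,             # y-axis of the plot
        "bias": bias,
        "lower": bias - 1.96 * sd,  # lower limit of agreement
        "upper": bias + 1.96 * sd,  # upper limit of agreement
    }


# Made-up rates per 100,000 for four hypothetical countries
who = [120.0, 250.0, 310.0, 480.0]
model = [170.0, 300.0, 380.0, 520.0]

result = bland_altman(who, model)
print(result["bias"], result["lower"], result["upper"])
```

A negative bias under this sign convention means the model predictions are, on average, higher than the WHO estimates.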

[Figure: difference (WHO estimate - model predictions), y-axis -3000 to 1000, against mean (WHO estimate, model predictions), x-axis 0 to 2000; labelled points: NER, ZAF, SOM, CAF.]

Page 16

BC in adults vs. all forms all ages

WHO estimates: All forms all ages
Model predictions: BC in adults
→ model predictions "too high"

Solution? WHO estimates of BC in adults?
→ keep model "free" from WHO assumptions

Crude adjustment: correct BC in adults by factor of 0.83
→ ratio from TME prevalence survey dataset
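A minimal sketch of the crude adjustment, assuming the factor is applied by multiplying each model prediction (BC in adults) by 0.83 to make it comparable with the WHO all forms, all ages estimate. The direction of the correction is inferred from the fact that it reduces the gap between model and WHO figures; the function name and example values are hypothetical.

```python
ADJUSTMENT_FACTOR = 0.83  # ratio from the TME prevalence survey dataset


def adjust_prediction(bc_adult_rate, factor=ADJUSTMENT_FACTOR):
    """Blanket correction applied to a model prediction (rate per 100,000)."""
    return bc_adult_rate * factor


# Made-up model predictions per 100,000
predictions = [170.0, 300.0, 380.0, 520.0]
adjusted = [adjust_prediction(p) for p in predictions]
print([round(a, 1) for a in adjusted])
```

Because the same factor is applied to every country after prediction, it cannot reflect differences in age structure or in the share of bacteriologically confirmed disease between settings.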

Page 17

“Bland and Altman” plot of agreement - adjusted estimates

[Figure: difference (WHO estimate - model predictions), y-axis -2000 to 500, against mean (WHO estimate, model predictions), x-axis 0 to 1500; labelled points: CAF, NER, SOM, ZAF.]

Model predictions greater than WHO estimates
• mean difference=3 cases per 100,000 (exc. 3 outliers)

Limitations of this correction:
• too crude, blanket correction for all estimates after model prediction
• better to compare with WHO BC in adults estimates

Page 18

Discussion

Prevalence model successfully fitted
• More datapoints with less precision vs. fewer datapoints with more precision → sensitivity analysis
• Model predictions broadly in line with WHO estimates
• Model estimates heavily reliant on climatic score. Useful?
• CAR and Somalia → sensitivity analyses exc. climate score

Methodological improvements
• More precise estimates of adjusted BC and participant numbers
• Confidence intervals, propagation of error
• How to factor in time (lags, repeat surveys)
• Predictions for high vs. low prevalence estimates (overestimate/underestimate low prevalences with logistic model?)
• Include survey specific variables (coverage, participation rate) as random effects to filter out nuisance variability induced by these factors
• Consider fitting two models (Asia and Africa)

Page 19

Data wishlist

From WHO
• BC adults estimates using WHO estimation methods
• Estimates from more recent prevalence surveys
• China disaggregated NTP data
• Reports for all India district level surveys

Note: in addition we will also include the following:
• Disaggregated data for climate, population density
• New data recently compiled (large cities, prevalence of high risk groups)

Page 20

Questions?

Comments?

Suggestions for improvement?

Page 21

Extra slide: adjustments to prevalence estimates

The Bangladesh survey only reported SS+, so BC was estimated based on the ratio between SS+ and BC from prevalence surveys conducted in the WPR and SEA regions in 2007 (year of Bangladesh survey). The surveys used for the calculation were thus: China, Cambodia, Lao People's Democratic Republic, Myanmar, Philippines, Thailand and Viet Nam. The ratio was 0.456, so the prevalence of BC was estimated as follows: prev_bc_100k=prev_sp_100k/0.4565.

The report from the Jabalpur survey concluded that BC estimates from the survey should be corrected by a factor of 1.7 to account for no x-ray screening; this correction was applied.

The confidence intervals of 3 Nigeria subnational estimates were very wide. Given the paucity of datapoints for model 2 these were kept for modeling but their impact on model fit was assessed after all modeling.
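The two survey-specific adjustments described on this slide amount to simple rescaling, sketched below. The constants are taken from the text (0.4565 from the formula, 1.7 from the Jabalpur report); the function names and the example input rates are hypothetical, and applying the Jabalpur factor by multiplication is an assumption.

```python
SS_TO_BC_RATIO = 0.4565   # ratio of SS+ to BC prevalence, as in the formula above
NO_XRAY_FACTOR = 1.7      # Jabalpur correction for the absence of x-ray screening


def bc_from_smear_positive(prev_sp_100k, ratio=SS_TO_BC_RATIO):
    """prev_bc_100k = prev_sp_100k / ratio (Bangladesh adjustment)."""
    return prev_sp_100k / ratio


def correct_for_no_xray(prev_bc_100k, factor=NO_XRAY_FACTOR):
    """Scale a BC estimate up to account for no x-ray screening (assumed multiplicative)."""
    return prev_bc_100k * factor


# Made-up input rates per 100,000
print(round(bc_from_smear_positive(80.0), 1))
print(round(correct_for_no_xray(150.0), 1))
```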
