Assessing the impact of a health intervention via user-generated Internet content
-
Upload
vasileios-lampos -
Category
Science
-
view
1.743 -
download
2
Transcript of Assessing the impact of a health intervention via user-generated Internet content
ECML PKDD 2015, Porto, Portugal
Assessing the impact of a health intervention via user-‐generated Internet data
Data Mining and Knowledge Discovery 29(5), pp. 1434–1457, 2015
Vasileios Lampos, Elad Yom-‐Tov, Richard Pebody and Ingemar J. Cox
STATUTORY NOTIFICATIONS OF INFECTIOUS DISEASES
WEEK 2015/33 week ending 16/08/2015
in ENGLAND and WALES
Table 1 Statutory notifications of infectious diseases in the past 6 weeks with totals for the current year compared with corresponding periods of the two preceding years
CONTENTS
Table 2 Statutory notifications of infectious diseases for diseases for WEEK 2015/33 by PHE Region, county, local and unitary authority including additional diseases notifiable from 6th April 2010
Registered Medical Practioner in England and Wales have a statutory duty to notify a Proper Officer of the local authority, often the CCDC (Consultant in Communicable Disease Control), of suspected cases of certain infectious diseases:
Cholera Meningococcal septicaemia Viral haemorrhagic feverBrucellosis * Measles Typhus
Diphtheria Mumps Whooping cough
Food poisoning Rabies
Enteric fever (typhoid or paratyphoid) Plague Yellow fever
Botulism * Malaria Tuberculosis
Acute infectious hepatitis Infectious bloody diarrhoea * SARS *Acute encephalitis Haemolytic uraemic syndrome * Rubella
Acute meningitis Invasive group A Streptococcal disease
Scarlet fever
Anthrax Leprosy TetanusAcute poliomyelitis Legionnaires disease * Smallpox
* Notifiable from 6th April 2010
Notifications of infectious diseases, some of which are later microbiologically confirmed, prompt local investigation and action to control the diseases. Proper officers are required every week to inform the PHE (formerly the Registrar General) anonymised details of each case of each disease that has been notified. PHE has responsibilty of collating the weekly returns from proper officers and publishing analyses of local and national trends.
All weekly data are Provisional© Public Health England - Information Management Department
NOIDs WEEKLY REPORTTurning ideas into reality at
Microsoft Research Cambridge
๏ Background and motivation
๏ Nowcasting disease rates from online text
๏ Estimating the impact of a health intervention
๏ Case study: influenza vaccination impact
๏ Conclusions & future work
1%
Assessing the impact of a health intervention via online content
Online, user-‐generated data
+ Social media, blogs, search engine query logs
+ Proxy of real-‐world (online+offline) behaviour
+ Complementary information sensors to more ‘traditional’ crowdsourcing efforts
+ Can answer questions difficult to resolve otherwise
+ Strong predictive power
Online, user-‐generated data — Applications
+ Politics • voting intention • result of an election
+ Finance • financial indices • tourism patterns
+ User profiling • age • gender • occupation (Preotiuc-‐Pietro, Lampos & Aletras, 2015)
(Burger et al., 2011)
(Rao et al., 2010)
(Bollen, Mao & Zeng, 2011)
(Choi & Varian, 2012)
(Lampos, Preotiuc-‐Pietro & Cohn, 2013)
(Tumasjan et al., 2010)
Online, user-‐generated data for health
Traditional disease surveillance - does not cover the entire population - not present everywhere (cities / countries) - not always timely
Digital disease surveillance + different or better population coverage + better geographical granularity + useful in underdeveloped parts of the world + almost instant - noisy, unstructured information
e.g. (Lampos & Cristianini, 2010 & 2012), (Lamb, Paul & Dredze, 2013), (Lampos et al., 2015)
What this work is all about
Health intervention
disease rates
(Pebody & Cox, 2015
impact ?
What this work is all about
Health intervention
disease rates
(Lampos, Yom-‐Tov, Pebody & Cox, 2015)
impact ?
✓ Background and motivation
๏ Estimating disease rates from online text
๏ Estimating the impact of a health intervention
๏ Case study: influenza vaccination impact
๏ Conclusions & future work
Assessing the impact of a health intervention via online content
15%
Estimating disease rates from online text
Ridge regression
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+
MX
j=1
w2j
1
A
Elastic net
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+ �1
MX
j=1
|wj |+ �2
MX
j=1
w2j
1
A
Variables
NMX 2 RN⇥M
y 2 RN
1
time intervalsn-‐grams
frequency of n-‐grams during the time intervals
disease rates during the time intervals
Ridge regression
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+
MX
j=1
w2j
1
A
Elastic net
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+ �1
MX
j=1
|wj |+ �2
MX
j=1
w2j
1
A
Variables
NMX 2 RN⇥M
y 2 RN
1
(Hoerl & Kennard, 1970)
Ridge regressionRidge regression
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+
MX
j=1
w2j
1
A
Elastic net
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+ �1
MX
j=1
|wj |+ �2
MX
j=1
w2j
1
A
Variables
NMX 2 RN⇥M
y 2 RN
1
(Zou & Hastie, 2005)
Elastic net
Estimating disease rates from online text
Ridge regression
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+
MX
j=1
w2j
1
A
Elastic net
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+ �1
MX
j=1
|wj |+ �2
MX
j=1
w2j
1
A
Variables
NMX 2 RN⇥M
y 2 RN
GPs can be defined as sets of random variables, any finite number of
which have a multivariate Gaussian distribution.
In GP regression, for the inputs x, x
0 2 RM(both expressing rows of
the observation matrix X) we want to learn a function f : RM ! R that is
drawn from a GP prior
f(x) ⇠ GP �µ(x) = 0, k(x,x0
)
�
kSE(x,x0) = �2
exp
✓�kx� x
0k222`2
◆
where �2describes the overall level of variance and ` is referred to as the
characteristic length-scale parameter.
An infinite sum of SE kernels with di↵erent length-scales results to an-
other well studied covariance function, the Rational Quadratic (RQ) kernel.
kRQ(x,x0) = �2
✓1 +
kx� x
0k222↵`2
◆�↵
↵ is a parameter that determines the relative weighting between small
and large-scale variations of input pairs. The RQ kernel can be used to model
functions that are expected to vary smoothly across many length-scales.
1
Gaussian Process
Ridge regression
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+
MX
j=1
w2j
1
A
Elastic net
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+ �1
MX
j=1
|wj |+ �2
MX
j=1
w2j
1
A
Variables
NMX 2 RN⇥M
y 2 RN
GPs can be defined as sets of random variables, any finite number of
which have a multivariate Gaussian distribution.
In GP regression, for the inputs x, x
0 2 RM(both expressing rows of
the observation matrix X) we want to learn a function f : RM ! R that is
drawn from a GP prior
f(x) ⇠ GP �µ(x) = 0, k(x,x0
)
�
kSE(x,x0) = �2
exp
✓�kx� x
0k222`2
◆
where �2describes the overall level of variance and ` is referred to as the
characteristic length-scale parameter.
An infinite sum of SE kernels with di↵erent length-scales results to an-
other well studied covariance function, the Rational Quadratic (RQ) kernel.
kRQ(x,x0) = �2
✓1 +
kx� x
0k222↵`2
◆�↵
↵ is a parameter that determines the relative weighting between small
and large-scale variations of input pairs. The RQ kernel can be used to model
functions that are expected to vary smoothly across many length-scales.
1
Rational Quadratic covariance function (kernel)
infinite sum of squared exponential (RBF) kernels
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
One kernel per n-‐gram category varied usage patterns, increasing semantic value
(Rasmussen & Williams, 2006)
see also (
Estimating disease rates from online text
Ridge regression
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+
MX
j=1
w2j
1
A
Elastic net
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+ �1
MX
j=1
|wj |+ �2
MX
j=1
w2j
1
A
Variables
NMX 2 RN⇥M
y 2 RN
GPs can be defined as sets of random variables, any finite number of
which have a multivariate Gaussian distribution.
In GP regression, for the inputs x, x
0 2 RM(both expressing rows of
the observation matrix X) we want to learn a function f : RM ! R that is
drawn from a GP prior
f(x) ⇠ GP �µ(x) = 0, k(x,x0
)
�
kSE(x,x0) = �2
exp
✓�kx� x
0k222`2
◆
where �2describes the overall level of variance and ` is referred to as the
characteristic length-scale parameter.
An infinite sum of SE kernels with di↵erent length-scales results to an-
other well studied covariance function, the Rational Quadratic (RQ) kernel.
kRQ(x,x0) = �2
✓1 +
kx� x
0k222↵`2
◆�↵
↵ is a parameter that determines the relative weighting between small
and large-scale variations of input pairs. The RQ kernel can be used to model
functions that are expected to vary smoothly across many length-scales.
1
Gaussian Process
Ridge regression
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+
MX
j=1
w2j
1
A
Elastic net
argmin
w,�
0
@NX
i=1
(xiw + � � yi)2+ �1
MX
j=1
|wj |+ �2
MX
j=1
w2j
1
A
Variables
NMX 2 RN⇥M
y 2 RN
GPs can be defined as sets of random variables, any finite number of
which have a multivariate Gaussian distribution.
In GP regression, for the inputs x, x
0 2 RM(both expressing rows of
the observation matrix X) we want to learn a function f : RM ! R that is
drawn from a GP prior
f(x) ⇠ GP �µ(x) = 0, k(x,x0
)
�
kSE(x,x0) = �2
exp
✓�kx� x
0k222`2
◆
where �2describes the overall level of variance and ` is referred to as the
characteristic length-scale parameter.
An infinite sum of SE kernels with di↵erent length-scales results to an-
other well studied covariance function, the Rational Quadratic (RQ) kernel.
kRQ(x,x0) = �2
✓1 +
kx� x
0k222↵`2
◆�↵
↵ is a parameter that determines the relative weighting between small
and large-scale variations of input pairs. The RQ kernel can be used to model
functions that are expected to vary smoothly across many length-scales.
1
Rational Quadratic covariance function (kernel)
infinite sum of squared exponential (RBF) kernels
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].
2
One kernel per n-‐gram category varied usage patterns, increasing semantic value
(Rasmussen & Williams, 2006)
see also (Lampos et al., 2015)
Estimating influenza-‐like illness (ILI) rates — Data
2012 2013 20140
0.01
0.02
0.03
0.04
ILI r
ate
per 1
00 p
eopl
e
ILI rates (PHE)
Bing
User-‐generated data, geolocated in England • Twitter: May 2011 to April 2014 (308 million tweets) • Bing: end of December 2012 to April 2014
ILI rates from Public Health England (PHE)
Estimating ILI rates — Feature extraction
• Start with a manually crafted list of 36 textual markers, e.g. flu, headache, doctor, cough
• Extract frequent co-‐occurring n-‐grams from a corpus of 30 million UK tweets (February & March, 2014) after removing stop-‐words
• Set of markers expanded to 205 n-‐grams (n ≤ 4) e.g. #flu, #cough, annoying cough, worst sore throat
• Relatively small set of features motivated by previous work (Culotta, 2013)
Estimating ILI rates — Experimental setup
Two time intervals based on the different temporal coverage of Twitter and Bing data • Dt1: 154 weeks (May 2011 to April 2014) • Dt2: 67 weeks (December 2012 to April 2014)
Stratified 10-‐fold cross validation
Error metrics • Pearson correlation (r) • Mean Absolute Error (MAE)
Pearso
n co
rrelation (r)
0.5
0.6
0.7
0.8
0.9
1
User-‐generated data source
Twitter (Dt1) Twitter (Dt2) Bing (Dt2)
0.9520.924
0.8450.867
0.7440.718
0.814
0.698
0.64
Ridge Regression Elastic Net Gaussian Process
Estimating ILI rates — Performance
MAE
1
1.64
2.28
2.92
3.56
4.2
User-‐generated data source
Twitter (Dt1) Twitter (Dt2) Bing (Dt2)
1.598
1.9992.196
2.564
3.198
2.828 2.963
4.084
3.074
Ridge Regression Elastic Net Gaussian Process
Estimating ILI rates — Performancex 10
3
✓ Background and motivation
✓ Estimating disease rates from online text
๏ Estimating the impact of a health intervention
๏ Case study: influenza vaccination impact
๏ Conclusions & future work
Assessing the impact of a health intervention via online content
41%
Estimating the impact of a health intervention
1. Disease intervention launched (to a set of areas)
2. Define a distinct set of control areas
3. Estimate disease rates in all areas
4. Identify pairs of areas with strong historical correlation in their disease rates
5. Use this relationship during and slightly after the intervention to infer diseases rates in the affected areas had the intervention not taken place
Estimating the impact of a health intervention
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
qv
�v = qv � q
⇤v
✓v =
qv � q
⇤v
q
⇤v
.
2
time interval(s) before the interventionlocation(s) where the intervention took placecontrol location(s)
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
qv
�v = qv � q
⇤v
✓v =
qv � q
⇤v
q
⇤v
.
2
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
qv
�v = qv � q
⇤v
✓v =
qv � q
⇤v
q
⇤v
.
2
such that
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
qv
�v = qv � q
⇤v
✓v =
qv � q
⇤v
q
⇤v
.
2
disease rate(s) in affected location
before intervention
disease rate(s) in control location
before intervention
high
Estimating the impact of a health intervention
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
qv
�v = qv � q
⇤v
✓v =
qv � q
⇤v
q
⇤v
.
2
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
qv
�v = qv � q
⇤v
✓v =
qv � q
⇤v
q
⇤v
.
2
such that
qvdisease rate(s) in affected location during/after intervention
�v = qv � q
⇤v
absolute difference
✓v =
qv � q
⇤v
q
⇤v
relative difference (impact)
(Lambert & Pregibon, 2008
estimate projected rate(s) in affected location during/after intervention
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
q
⇤v = qcw + b
qv
�v = qv � q
⇤v
2
Estimating the impact of a health intervention
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
qv
�v = qv � q
⇤v
✓v =
qv � q
⇤v
q
⇤v
.
2
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
qv
�v = qv � q
⇤v
✓v =
qv � q
⇤v
q
⇤v
.
2
such that
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
qv
�v = qv � q
⇤v
✓v =
qv � q
⇤v
q
⇤v
.
2
disease rate(s) in affected location during/after intervention
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
qv
�v = qv � q
⇤v
✓v =
qv � q
⇤v
q
⇤v
2
absolute difference
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
qv
�v = qv � q
⇤v
✓v =
qv � q
⇤v
q
⇤v
2
relative difference (impact)
(Lambert & Pregibon, 2008)
estimate projected rate(s) in affected location during/after intervention
k(x,x0) =
CX
n=1
kRQ(gn,g0n)
!+ kN(x,x
0)
where gn is used to express the features of each n-gram category, i.e.,
x = {g1, g2, g3, g4}, C is equal to the number of n-gram categories (in our
experiments, C = 4) and kN(x,x0) = �2
N ⇥ �(x,x0) models noise (� being a
Kronecker delta function).
Denoting the disease rate time series as y = (y1, . . . , yN ), the GP re-
gression objective is defined by the minimization of the following negative
log-marginal likelihood function
argmin
�1,...,�C ,`1,...,`C ,↵1,...,↵C ,�N
�(y � µ)|K�1
(y � µ) + log |K|�
where K holds the covariance function evaluations for all pairs of inputs,
i.e., (K)i,j = k(xi,xj), and µ = (µ(x1), . . . , µ(xN )).
Based on a new observation x⇤, a prediction is conducted by computing
the mean value of the posterior predictive distribution, E[y⇤|y,O,x⇤].—
⌧ = {t1, . . . , tN}vc
r(q⌧v ,q
⌧c )
f(w,�) : R ! R
argmin
w,�
NX
i=1
�qtic w + � � qtiv
�2
q
⇤v = q
⇤cw + b
q
⇤v = qcw + b
qv
�v = qv � q
⇤v
2
✓ Background and motivation
✓ Estimating disease rates from online text
✓ Estimating the impact of a health intervention
๏ Case study: influenza vaccination impact
๏ Conclusions & future work
Assessing the impact of a health intervention via online content
52%
Live Attenuated Influenza Vaccine (LAIV) campaign
2012 2013 20140
0.01
0.02
0.03
ILI r
ate
per 1
00 p
eopl
e
PHE/RCGP LAIV Post LAIV
∆tv
• LAIV programme for children (4 to 11 years) in pilot areas of England during the 2013/14 flu season
• Vaccination period (blue): Sept. 2013 to Jan. 2014 • Post-‐vaccination period (green): Feb. to April 2014
Target (vaccinated) & control areas
Vaccination impact paper
Vaccinated Locations
Bury
Cumbria
Gateshead
Leicester
East Leicestershire
Rutland
Havering
Newham
South-East Essex
Control Locations
Brighton
Bristol
Cambridge
Exeter
Leeds
Liverpool
Norwich
Nottingham
Plymouth
Sheffield
Southampton
York
Brighton • Bristol • Cambridge Exeter • Leeds • Liverpool Norwich • Nottingham • Plymouth Sheffield • Southampton • York
Control areas
Bury • Cumbria • Gateshead Leicester • East Leicestershire Rutland • South-‐East Essex Havering (London) Newham (London)
Vaccinated areas
Applying the impact estimation framework
Target vs. control areas • Use previous flu season only to establish relationships • Find the best correlated areas or supersets of them
Confidence intervals • Bootstrap sampling of the regression residuals
(mapping function of control to vaccinated areas) • Bootstrap sampling of data prior to the application of
the bootstrapped regressor • 105 bootstraps; use the .025 and .975 quantiles
Statistical significance assessment • Impact estimate (abs.) > 2σ of the bootstrap estimates
Relationship between vaccinated & control areas
Twitter — All areas
Bing — All areas
0 0.25 0.5 0.75 1
0
0.25
0.5
0.75
1
ILI r
ates
in v
acci
nate
d ar
eas
ILI rates in control areas
pre−vaccination periodduring/after LAIV
0 0.25 0.5 0.75 10
0.25
0.5
0.75
1
ILI r
ates
in v
acci
nate
d ar
eas
ILI rates in control areas
pre−vaccination periodduring/after LAIV
axes normalised from 0 to 1
r = .86
r = .87
Relationship between vaccinated & control areas
Twitter — London areas
Bing — London areas
0 0.25 0.5 0.75 10
0.25
0.5
0.75
1
ILI r
ates
in v
acci
nate
d ar
eas
ILI rates in control areas
pre−vaccination periodduring/after LAIV
0 0.25 0.5 0.75 10
0.25
0.5
0.75
1IL
I rat
es in
vac
cina
ted
area
s
ILI rates in control areas
pre−vaccination periodduring/after LAIV
axes normalised from 0 to 1
r = .74
r = .85
Impact estimation results (strongly correlated controls)
Source Target r δ x 103 θ (%)
Twitter All areas .861 -‐2.5 (-‐4.1, -‐1.0) -‐32.8 (-‐47.4, -‐15.6)
Bing All areas .866 -‐1.9 (-‐3.2, -‐0.7) -‐21.7 (-‐32.1, -‐9.10)
TwitterLondon areas .738 -‐1.7 (-‐2.5, -‐0.9) -‐30.5 (-‐41.8, -‐17.5)
BingLondon areas .848 -‐2.8 (-‐4.1, -‐1.6) -‐28.4 (-‐36.7, -‐17.9)
Impact estimation results (strongly correlated controls)
Source Target r δ x 103 θ (%)
Twitter All areas .861 -‐2.5 (-‐4.1, -‐1.0) -‐32.8 (-‐47.4, -‐15.6)
Bing All areas .866 -‐1.9 (-‐3.2, -‐0.7) -‐21.7 (-‐32.1, -‐9.10)
TwitterLondon areas .738 -‐1.7 (-‐2.5, -‐0.9) -‐30.5 (-‐41.8, -‐17.5)
BingLondon areas .848 -‐2.8 (-‐4.1, -‐1.6) -‐28.4 (-‐36.7, -‐17.9)
Source Target r δ x 103 θ (%)
Twitter All areas .861 -‐2.5 (-‐4.1, -‐1.0) -‐32.8 (-‐47.4, -‐15.6)
Bing All areas .866 -‐1.9 (-‐3.2, -‐0.7) -‐21.7 (-‐32.1, -‐9.10)
TwitterLondon areas .738 -‐1.7 (-‐2.5, -‐0.9) -‐30.5 (-‐41.8, -‐17.5)
BingLondon areas .848 -‐2.8 (-‐4.1, -‐1.6) -‐28.4 (-‐36.7, -‐17.9)
Impact estimation results (strongly correlated controls)
Source Target r δ x 103 θ (%)
Twitter All areas .861 -‐2.5 (-‐4.1, -‐1.0) -‐32.8 (-‐47.4, -‐15.6)
Bing All areas .866 -‐1.9 (-‐3.2, -‐0.7) -‐21.7 (-‐32.1, -‐9.10)
TwitterLondon areas .738 -‐1.7 (-‐2.5, -‐0.9) -‐30.5 (-‐41.8, -‐17.5)
BingLondon areas .848 -‐2.8 (-‐4.1, -‐1.6) -‐28.4 (-‐36.7, -‐17.9)
Impact estimation results (strongly correlated controls)
Impact estimation results (stat. sig.)-‐θ (%
)
0
7
14
21
28
35
All areas London areas Newham Cumbria Gateshead
30.228.7
21.7 21.1
30.430.532.8
Twitter Bing
Projected vs. inferred ILI rates in vaccinated locations
Twitter — All areas
Bing — All areas
Oct Nov Dec Jan Feb Mar Apr0
0.005
0.01
0.015
0.02
ILI r
ates
per
100
peo
ple
weeks during and after the vaccination programme
inferred ILI ratesprojected ILI rates
Oct Nov Dec Jan Feb Mar Apr0
0.005
0.01
0.015
0.02
ILI r
ates
per
100
peo
ple
weeks during and after the vaccination programme
inferred ILI ratesprojected ILI rates
Projected vs. inferred ILI rates in vaccinated locations
Twitter — London areas
Bing — London areas
Oct Nov Dec Jan Feb Mar Apr0
0.005
0.01
ILI r
ates
per
100
peo
ple
weeks during and after the vaccination programme
inferred ILI ratesprojected ILI rates
Oct Nov Dec Jan Feb Mar Apr0
0.005
0.01
0.015
ILI r
ates
per
100
peo
ple
weeks during and after the vaccination programme
inferred ILI ratesprojected ILI rates
Sensitivity of impact estimates to variable controls
• Repeat the impact estimation for the N controls (up to a 100) with r ≥ 95% of the best r —> μ(δ) and μ(θ) (%)
• Measure % of difference, Δ(θ), between θ and μ(θ)
Source Target N μ(r) μ(δ) x 103 μ(θ) (%) Δθ (%)
Twitter All areas 100 0.84 -‐2.5 (0.2) -‐32.7 (2.1) 0.10Bing All areas 46 0.85 -‐1.4 (0.4) -‐16.4 (3.6) 24.4
Twitter London areas 79 0.70 -‐1.5 (0.1) -‐27.9 (2.0) 8.32
Bing London areas 100 0.84 -‐1.4 (0.2) -‐16.9 (1.8) 40.4
Sensitivity of impact estimates to variable controls
• Repeat the impact estimation for the N controls (up to a 100) with r ≥ 95% of the best r —> μ(δ) and μ(θ) (%)
• Measure % of difference, Δ(θ), between θ and μ(θ)
Source Target N μ(r) μ(δ) x 103 μ(θ) (%) Δθ (%)
Twitter All areas 100 0.84 -‐2.5 (0.2) -‐32.7 (2.1) 0.10
Bing All areas 46 0.85 -‐1.4 (0.4) -‐16.4 (3.6) 24.4
Twitter London areas 79 0.70 -‐1.5 (0.1) -‐27.9 (2.0) 8.32
Bing London areas 100 0.84 -‐1.4 (0.2) -‐16.9 (1.8) 40.4
Sensitivity of impact estimates to variable controls
• Repeat the impact estimation for the N controls (up to a 100) with r ≥ 95% of the best r —> μ(δ) and μ(θ) (%)
• Measure % of difference, Δ(θ), between θ and μ(θ)
Source Target N μ(r) μ(δ) x 103 μ(θ) (%) Δθ (%)
Twitter All areas 100 0.84 -‐2.5 (0.2) -‐32.7 (2.1) 0.10Bing All areas 46 0.85 -‐1.4 (0.4) -‐16.4 (3.6) 24.4
Twitter London areas 79 0.70 -‐1.5 (0.1) -‐27.9 (2.0) 8.32
Bing London areas 100 0.84 -‐1.4 (0.2) -‐16.9 (1.8) 40.4
✓ Background and motivation
✓ Estimating disease rates from online text
✓ Estimating the impact of a health intervention
✓ Case study: influenza vaccination impact
๏ Conclusions & future work
Assessing the impact of a health intervention via online content
89%
Conclusions & points for discussion
• Framework for estimating the impact of a health intervention based on online content
• Access to different & larger parts of the population
Evaluation is hard, however: • PHE’s impact estimates: -‐66% based on sentinel
surveillance, -‐24% laboratory confirmed • Correlation between actual vaccination uptake and our
study’s estimated impacts
Why are Bing and Twitter estimations different? • Different user demographics (?) — this can be useful • Different temporal resolution
(Pebody et al., 2014)
Potential future work directions
• Improve supervised learning models - better natural language processing / machine
learning modelling - combination of different data sources
• Work on unsupervised techniques - inferring / understanding the demographics of the
online medium will be essential
• More rigorous evaluation
Collaborators, acknowledgements & material
Elad Yom-‐Tov, Microsoft Research Richard Pebody, Public Health England Ingemar J. Cox, UCL & University of Copenhagen
Jens Geyti, UCL (Software Engineer) Simon de Lusignan, University of Surrey & RCGP
Slides: ow.ly/RN7MZPaper: ow.ly/RN9J2
i-‐sense.org.uk
Bollen, Mao & Zeng. Twitter mood predicts the stock market. J Comp Science, 2011. Burger, Henderson, Kim & Zarrella. Discriminating Gender on Twitter. EMNLP, 2011. Choi & Varian. Predicting the Present with Google Trends. Economic Record, 2012. Culotta. Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter messages. Lang Resour Eval, 2013. Hoerl & Kennard. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 1970. Lamb, Paul & Dredze. Separating Fact from Fear: Tracking Flu Infections on Twitter. NAACL, 2013. Lambert & Pregibon. Online effects of offline ads. Data Mining & Audience Intelligence for Advertising, 2008. Lampos & Cristianini. Tracking the flu pandemic by monitoring the Social Web. CIP, 2010. Lampos & Cristianini. Nowcasting Events from the Social Web with Statistical Learning. ACM TIST, 2012. Lampos, Miller, Crossan & Stefansen. Advances in nowcasting influenza-‐like illness rates using search query logs. Sci Rep, 2015. Lampos, Yom-‐Tov, Pebody & Cox. Assessing the impact of a health intervention via user-‐generated Internet content. DMKD, 2015. Pebody et al. Uptake and impact of a new live attenuated influenza vaccine programme in England: early results of a pilot in primary school-‐age children, 2013/14 influenza season. Eurosurveillance, 2014. Preotiuc-‐Pietro, Lampos & Aletras. An analysis of the user occupational class through Twitter content. ACL, 2015. Rao, Yarowsky, Shreevats & Gupta. Classifying Latent User Attributes in Twitter. SMUC, 2010. Rasmussen & Williams. Gaussian Processes for Machine Learning. MIT Press, 2006. Tumasjan, Sprenger, Sandner & Welpe. Predicting Elections with Twitter: What 140 characters Reveal about Political Sentiment. ICWSM, 2010. Zou & Hastie. Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodol, 2005.
References