Internet Search Term Surveillance for Influenza
Philip M. Polgreen1 ([email protected]), Yiling Chen3, Forrest Nelson2, David M. Pennock3
Departments of 1Internal Medicine and 2Economics, The University of Iowa, Iowa City, IA;
3Yahoo! Research, New York, NY
Disclosures
PMP: Influenza Advisory Board member for Roche
YC and DMP were employees of Yahoo! Research
Funding: RWJF, CDC, NIH
Motivation
There are multiple surveillance system components for influenza in the U.S., including:
Mortality from Influenza and Pneumonia
Influenza-Like Illness (ILI)
Culture Data
However… they all report disease activity after it occurs
The only local (i.e., state-level) data are the weekly influenza activity reports from each state
Motivation
Influenza occurs in regular seasonal cycles, but the character and timing of each season vary
Historically, despite the seriousness of the disease and the potential benefit from advance warning, forecasts of influenza activity have not been routinely available in the U.S.
Motivation
Benefits of an influenza forecast (even a few weeks in advance) include extra time for:
Preparing for an increased number of patients admitted for influenza complications
Administering prophylactic medications to persons in high-risk groups
Vaccinating high-risk individuals and healthcare workers
Motivation
The Internet is an increasingly important source of medical information for:
Patients/Families
Medical Providers
Thus, analysis of the volume of internet search traffic may provide information about disease activity over time
There is precedent in economics: an analysis of search terms can produce accurate and useful statistics about the unemployment rate
Ettredge M, Gerdes J, Karuga G. Using web-based search data to predict macroeconomic statistics. Commun ACM 2005; 48(11):87-92.
Goals
The purpose of this project was to:
(1) determine the temporal relationship between search terms for influenza and actual disease occurrence
(2) determine if and to what extent an increase in search frequency precedes official measures of influenza activity
(3) explore the feasibility of building a search-based prediction market for infectious diseases
Methods
De-identified search query logs were obtained daily from http://search.yahoo.com starting in March 2004
Unique queries originating from the U.S. and containing influenza-related search terms were counted daily
Searches had to include either: FLU or INFLUENZA
Searches were excluded if they included BIRD, AVIAN, or PANDEMIC
We also excluded searches containing SHOT, VACCINATION, or VACCINE to avoid capturing queries about influenza vaccination
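The inclusion and exclusion rules above can be sketched as a small filter. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and whole-word, case-insensitive matching is an assumption (the slides do not say whether substrings such as "vaccines" were also matched).

```python
import re

# Terms taken from the slide; matching behavior is an assumption of this sketch.
INCLUDE_TERMS = {"flu", "influenza"}
EXCLUDE_TERMS = {"bird", "avian", "pandemic", "shot", "vaccination", "vaccine"}


def is_influenza_query(query: str) -> bool:
    """Return True if a query has an include term and no exclude term."""
    # Split the lowercased query into alphabetic words (whole-word matching).
    words = set(re.findall(r"[a-z]+", query.lower()))
    has_include = bool(words & INCLUDE_TERMS)
    has_exclude = bool(words & EXCLUDE_TERMS)
    return has_include and not has_exclude


def count_influenza_queries(queries) -> int:
    """Count the queries in an iterable that pass the filter."""
    return sum(1 for q in queries if is_influenza_query(q))
```

For example, "flu symptoms" would be counted, while "bird flu" and "flu shot near me" would not.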
Methods
Daily search counts were divided by the total number of U.S. searches to get the daily fraction of influenza-related searches
We then averaged the daily fractions within each week to obtain a weekly series
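A minimal sketch of this normalization step follows. The input layout (a dict of date to count pairs) and the function name are hypothetical, and grouping days by ISO week is an assumption; the slides do not state which week definition was used.

```python
from collections import defaultdict
from datetime import date


def weekly_mean_fraction(daily_counts):
    """Average the daily influenza search fraction within each week.

    daily_counts maps a date to (influenza_count, total_us_count).
    Returns a dict mapping (iso_year, iso_week) to the mean daily fraction.
    """
    fractions_by_week = defaultdict(list)
    for day, (flu_count, total_count) in daily_counts.items():
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        fractions_by_week[(iso[0], iso[1])].append(flu_count / total_count)
    return {
        week: sum(fractions) / len(fractions)
        for week, fractions in fractions_by_week.items()
    }
```

Averaging fractions rather than pooling counts gives each day equal weight, which matches the order of operations described on the slide.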
Methods
Influenza Surveillance Data from March 2004 to August 2007
1. Weekly Influenza Culture Data: Proportion of Positive cultures
Clinical laboratories throughout the U.S. that are either World Health Organization (WHO) Collaborating Laboratories or National Respiratory and Enteric Virus Surveillance System (NREVSS) laboratories report the total number of respiratory specimens tested and the number positive for influenza types
2. 122 Cities Mortality Reporting System:
Each week, participating cities report the total number of death certificates received and the number that list pneumonia or influenza as the underlying and/or contributing cause of death. Based on the city data, we obtained influenza mortality data for 9 U.S. census regions and the whole country
[Figure: Searches for Influenza and Positive Influenza Cultures by Week. Axes: Percentage of Positive Influenza Cultures; Percentage of Influenza-Related Searches; Week-Year (1-2005 through 1-2008). Series: Internet Searches, Positive Influenza Cultures.]
[Figure: Searches for Influenza and Mortality from Influenza and Pneumonia by Week. Axes: Mortality from Influenza and Pneumonia; Percentage of Influenza-Related Searches; Week-Year (1-2005 through 1-2008). Series: Internet Searches, Mortality.]
Search and Positive Cultures
We fit a linear model to test how well search frequency predicts the percentage of positive influenza cultures
The model includes a time trend t (measured in weeks); C_t is the rate of positive cultures in week t, and s_{t-x} is the search frequency in week t-x
To determine the appropriate lag x, we examined lags of 0 to 10 weeks
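The slide's own equation is not reproduced in this text. Under the definitions given (a time trend, the culture rate, and lagged search frequency), one plausible form of the model is sketched below. The coefficient symbols \alpha, \beta, \gamma and the error term \varepsilon_t are assumptions for illustration, not the authors' notation.

```latex
C_t = \alpha + \beta\, s_{t-x} + \gamma\, t + \varepsilon_t,
\qquad x \in \{0, 1, \dots, 10\}
```

In this reading, \beta is the coefficient on lagged search frequency, and the model is refit once for each candidate lag x.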
Searches and Mortality
Using the mortality data, we fit another linear model with the same format
Here m_t is the number of deaths during week t, and all other variables are as defined earlier
To determine the appropriate lag x, we examined lags of 0 to 10 weeks
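The lag scan described for both models can be sketched in pure Python. This is an illustration under stated assumptions: the regressors (intercept, lagged search frequency, linear time trend) follow the slide's verbal description, the function names are hypothetical, and the small normal-equations solver stands in for whatever statistical software the authors actually used.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_lagged_model(outcome, search, lag):
    """OLS of outcome[t] on [1, search[t - lag], t].

    Returns ([intercept, search_coef, trend_coef], r_squared).
    """
    weeks = range(lag, len(outcome))
    rows = [(1.0, search[t - lag], float(t)) for t in weeks]
    y = [outcome[t] for t in weeks]
    k = 3
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def scan_lags(outcome, search, max_lag=10):
    """Return {lag: R-squared} for lags 0..max_lag."""
    return {
        lag: fit_lagged_model(outcome, search, lag)[1]
        for lag in range(max_lag + 1)
    }
```

The same functions apply to either outcome series: pass weekly culture rates or weekly death counts as `outcome` and the weekly search fraction as `search`.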
[Figure: Predicted Values for Positive Influenza Cultures Based on Searches and Actual Values by Week. X-axis: Week-Year (1-2005 through 1-2008). Series: Predicted Positive Cultures, Positive Influenza Cultures.]
[Figure: Predicted Values for Mortality from Influenza and Pneumonia Based on Searches and Actual Values by Week. X-axis: Week-Year (1-2005 through 1-2008). Series: Predicted Mortality, Mortality.]
Culture Results
Positive Influenza Culture Regression Results
x (lag in weeks)   Coefficient on s_{t-x}   Std. Error   t       P > |t|   R2
0                  239636.2                 18301.99     13.09   <0.001    0.4672
1                  242579.5                 18218.11     13.32   <0.001    0.4723
2                  239568.6                 18487.33     12.96   <0.001    0.4568
3                  234749.1                 18848.97     12.45   <0.001    0.4356
4                  229446.4                 19225.16     11.93   <0.001    0.4134
5                  223257.3                 19628.85     11.37   <0.001    0.3890
6                  215900.2                 20064.8      10.76   <0.001    0.3618
7                  206683.5                 20565.4      10.05   <0.001    0.3300
8                  195520.6                 21118.44     9.26    <0.001    0.2943
9                  184502.1                 21619.25     8.53    <0.001    0.2610
10                 173491.3                 22164.1      7.83    <0.001    0.2305
Mortality Results
Influenza Mortality Regression Results
x (lag in weeks)   Coefficient on s_{t-x}   Std. Error   t       P > |t|   R2
0                  3300788                  436385.8     7.56    <0.001    0.2075
1                  3810620                  415148.2     9.18    <0.001    0.2787
2                  4194847                  394455.2     10.63   <0.001    0.3418
3                  4445665                  378633.3     11.74   <0.001    0.3882
4                  4604043                  367573.4     12.53   <0.001    0.4198
5                  4625652                  368166.3     12.56   <0.001    0.4229
6                  4461079                  379889.1     11.74   <0.001    0.3919
7                  4314867                  390405       11.05   <0.001    0.3649
8                  4248610                  396362.5     10.72   <0.001    0.3523
9                  3992864                  410770.2     9.72    <0.001    0.3111
10                 3767351                  422055.3     8.93    <0.001    0.2765
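One way to read the two regression tables is to pick, for each outcome, the lag with the highest R2. The sketch below does that using the R2 values reported on the slides. Choosing by maximum R2 is an assumption of this sketch, not necessarily the criterion the authors used, and the names are hypothetical.

```python
# R-squared by lag (weeks), copied from the two regression tables.
CULTURE_R2 = {
    0: 0.4672, 1: 0.4723, 2: 0.4568, 3: 0.4356, 4: 0.4134, 5: 0.3890,
    6: 0.3618, 7: 0.3300, 8: 0.2943, 9: 0.2610, 10: 0.2305,
}
MORTALITY_R2 = {
    0: 0.2075, 1: 0.2787, 2: 0.3418, 3: 0.3882, 4: 0.4198, 5: 0.4229,
    6: 0.3919, 7: 0.3649, 8: 0.3523, 9: 0.3111, 10: 0.2765,
}


def best_lag(r2_by_lag):
    """Return the lag whose model has the highest R-squared."""
    return max(r2_by_lag, key=r2_by_lag.get)


print(best_lag(CULTURE_R2))    # prints 1
print(best_lag(MORTALITY_R2))  # prints 5
```

By this criterion the fit peaks at a 1-week lag for positive cultures and a 5-week lag for mortality, with R2 declining gradually on either side of the peak.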
Limitations
With only four years of data, the inferential conclusions that we can make are limited
Some proportion of searches may be generated by news reports and not actual disease activity (celebrity effect)
Other searches might concern related topics that do not reflect influenza activity (e.g., influenza vaccination)
Limitations
Two U.S. influenza search fraction series: one that excludes vaccination-related terms and one that does not.
Limitations
These data are not generally available to researchers because of privacy and proprietary concerns
The geographic data gleaned from search terms is extracted from IP addresses and may not always represent actual geographic location
We could reproduce our results at a census region level
There is a lack of generally available surveillance data against which to compare search data
Summary & Conclusions
A temporal association exists between search term frequency and influenza disease activity
Influenza-related search term activity seems to precede an increase in influenza culture data by at least 4 weeks, and deaths from pneumonia and influenza by at least 7 weeks
“Search-term surveillance” may provide an inexpensive supplement to more traditional disease-surveillance systems
Future Work
Search term surveillance is not limited to influenza
It could also be used for emerging infectious diseases, re-emerging infectious diseases and also to detect changes in phenomena related to chronic diseases
Search term surveillance of symptom-based searches (e.g., diarrhea) may help detect outbreaks if search levels rise above an established baseline
Search-Based Prediction Markets (how this experiment started)
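The baseline idea for symptom-based searches can be sketched as a simple threshold rule. This is a hypothetical illustration: the slides do not define how a baseline would be established, so the mean plus k standard deviations rule, the default k = 2, and the function name are all assumptions.

```python
from statistics import mean, stdev


def weeks_above_baseline(baseline, recent, k=2.0):
    """Return indices of `recent` values that exceed the baseline threshold.

    The threshold is mean(baseline) + k * sample standard deviation of
    baseline. Both the rule and the default k are assumptions of this sketch.
    """
    threshold = mean(baseline) + k * stdev(baseline)
    return [i for i, value in enumerate(recent) if value > threshold]
```

For example, with a baseline of weekly search levels `[10, 12, 11, 9, 10, 11, 10, 9]`, the recent values `[10, 11, 13, 18, 12]` would be flagged at indices 2 and 3. A real system would also need to account for seasonality and news-driven spikes, which the Limitations slides raise.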
Future Directions (Search Markets)
Experimental markets called prediction (or decision) markets are created for the sole purpose of making forecasts and have been used successfully in a number of contexts
In situations involving uncertainty regarding future events, markets can be used to aggregate information from various individuals to predict future events (i.e., information can be extracted from the prices derived in experimental markets)
Future Directions
The Iowa Electronic Market (the first prediction market) has a consistent track record of making more accurate forecasts of political elections than any national poll. For 6 presidential elections, the average prediction error has been under 1.5%, while opinion polls for those same elections have had an average error of 2.5%.
HEWLETT-PACKARD has used experimental markets to forecast the sales of its printers more accurately than its statisticians.
ELI LILLY has designed markets to predict which developmental drugs have the best chance of advancing through clinical trials.
GOOGLE has used markets (based on IEM research) to successfully forecast product launch dates, new office openings, and other events of strategic importance.
The Iowa Influenza Prediction Market has predicted influenza activity 2-4 weeks in advance.
ProMED-mail Iowa H5N1 Market has predicted the number of human cases of avian influenza months in advance.
Search-Based Prediction Markets for Health Topics
Yahoo Tech Buzz Game: a fantasy (i.e., not real money) prediction market for high-tech products, concepts and trends.
The participants' goal was to predict how popular various technologies would be in the future. Popularity, or buzz, was measured by Yahoo! Search frequency over time.
Predictions were made by buying stock in the products or technologies you believe will succeed, and selling stock in the technologies you think will flop.
In other words, you “put your fantasy dollars where your mouth is.”
Thus, our original (and current) goal is to build a search market for diseases