Computational Techniques for Public Health Surveillance

Scott H. Burton

Ph.D. Dissertation Proposal

Department of Computer Science

Brigham Young University

April 26, 2012

Overview

• Problem overview

• Research area overview

▫ Health research in social media

▫ Data mining

Social network analysis

Collective classification

Text mining

• Dissertation proposal

Health is Important

• U.S. 2010 total health expenditures:

▫ $2.6 trillion (17.9% of GDP)

• Millions of lives affected each year

National Health Expenditures 2010 Highlights.

http://www.cms.gov/NationalHealthExpendData/downloads/highlights.pdf

Image: http://health-ins.us/

Public Health Surveillance

“Public health surveillance is the continuous, systematic collection, analysis and interpretation of health-related data needed for the planning, implementation, and evaluation of public health practice.”

– World Health Organization

• Epidemiology

• Health promotion

• Substance abuse prevention

• Public policy

World Health Organization

http://www.who.int/topics/public_health_surveillance/en/

Traditional Methods

• Health Department Labs

• Focus Groups

• Questionnaires

• Clinical Trials

Limitations of Traditional Methods

Traditional Methods

• Cost

• Delay

• Isolated individuals

• Reported vs. actual behavior

• Often small samples

Social Media Opportunities

Traditional Methods

• Cost

• Delay

• Isolated individuals

• Reported vs. actual behavior

• Often small samples

Online Social Media

• Inexpensive

• Real-time posting

• Near real-time analysis

• Relational data / social structures

• True feelings and behaviors

• Large samples

• Geo-located

• Reach under-represented countries and groups

Computational Health Science

“Developing computational techniques to build systems or applications to understand and influence individual health and measure relevant outcomes.”

[Diagram: Computer Science, Sociology, Health Science]

The CHS Difference

• Community identification

• Data set size

• Relational classification

• Inductive models

• Text mining and automated analysis

Search Query Monitoring

• Influenza outbreak detection

Polgreen, P., Chen, Y., Pennock, D., Nelson, F., and Weinstein, R.

Using Internet Searches for Influenza Surveillance

Clinical Infectious Diseases, 47(11):1443-1448, 2008.

More Outbreak Detection

• Influenza outbreak detection (Ginsberg, et al.)

• 2009 H1N1 Influenza (Brownstein, et al.)

• Listeriosis (Wilson and Brownstein)

• Gastroenteritis and Chickenpox (Pelat, et al.)

Ginsberg, J., Mohebbi, M., Patel, R., Brammer, L., Smolinski, M., and Brilliant, L.

Detecting Influenza Epidemics using Search Engine Query Data.

Nature, 457(7232):1012-1014, 2008.

Brownstein, J. S., et al.

Information Technology and Global Surveillance of Cases of 2009 H1N1 Influenza

New England Journal of Medicine, 362(18):1731-1735, 2010.

Wilson, K. and Brownstein, J.

Early Detection of Disease Outbreaks using the Internet.

Canadian Medical Association Journal, 180(8):829, 2009.

Pelat, C., Turbelin, C., Bar-Hen, A., Flahault, A., and Valleron, A.

More Diseases Tracked by using Google Trends.

Emerging Infectious Diseases, 15(8):1327, 2009.

Health on YouTube

• Immunizations (N=153) (Keelan, et al.)

• Tanning Bed Use (N=72) (Hossler and Conroy)

• Tobacco (N=50) (Freeman and Chapman)

• Stop Smoking (N=191) (Backinger, et al.)

Keelan, J., Pavri-Garcia, V., Tomlinson, G., and Wilson, K.

YouTube as a Source of Information on Immunization: A Content Analysis.

Journal of the American Medical Association, 298(21):2482, 2007.

Hossler, E. and Conroy, M.

YouTube as a Source of Information on Tanning Bed Use.

Archives of Dermatology, 144(10):1395-1396, 2008.

Freeman, B. and Chapman, S.

Is “YouTube” Telling or Selling you Something? Tobacco Content on the YouTube Video-sharing Website.

Tobacco Control, 16(3):207, 2007.

Backinger, C. L., Pilsner, A. M., Augustson, E. M., Frydl, A., Phillips, T., and Rowden, J.

YouTube as a Source of Quitting Smoking Information.

Tobacco Control, 20(2):119-122, 2011.

Health on Facebook

• General Non-Communicable Disease Groups (N=757)

▫ Farmer, et al.

• Diabetes Groups (N=15)

▫ Greene, et al.

• Ethical Issues (N=202)

▫ Moubarak, et al.

Greene, J., Choudhry, N., Kilabuk, E., and Shrank, W.

Online Social Networking by Patients with Diabetes: A Qualitative Evaluation of Communication with Facebook.

Journal of General Internal Medicine, 26:287-292, 2011.

Moubarak, G., Guiot, A., Benhamou, Y., Benhamou, A., and Hariri, S.

Facebook Activity of Residents and Fellows and its Impact on the Doctor-Patient Relationship.

Journal of Medical Ethics, 37(2):101-104, 2011.

Farmer, A. D., Bruckner Holt, C. E. M., Cook, M. J., and Hearing, S. D.

Social Networking Sites: A Novel Portal for Communication.

Postgraduate Medical Journal, 85:455-459, 2009.

Health on Blogs

• Health-related Blogs (N=951)

▫ Miller and Pole

• Breastfeeding and Blogging (32 blogs, 354 posts, 881 comments)

▫ West et al.

Miller, E. and Pole, A.

Diagnosis Blog: Checking up on Health Blogs in the Blogosphere.

American Journal of Public Health, 100(8):1514-1519, 2010.

West, J., Hall, P., Hanson, C., Thackeray, R., Barnes, M., Neiger, B., and McIntyre, E.

Breastfeeding and Blogging: Exploring the Utility of Blogs to Promote Breastfeeding.

American Journal of Health Education, 42(2):106-115, 2011.

Health on Twitter

• Dental Pain (N=772)

▫ Heaivilin, et al.

• Tobacco (N=5.9 million tweets, 5,000 tobacco-related)

▫ Prier, et al.

• Problem Drinking (N=5.5 million tweets, 21,000 alcohol-related)

▫ West et al.

Heaivilin, N., Gerbert, B., Page, J., and Gibbs, J.

Public Health Surveillance of Dental Pain via Twitter.

Journal of Dental Research, 90(9):1047-1051, 2011.

Prier, K. W., Smith, M. S., Giraud-Carrier, C., and Hanson, C. L.

Identifying Health-Related Topics on Twitter: An Exploration of Tobacco-related Tweets as a Test Topic.

In Proceedings of the 4th International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, pages 18-25. 2011.

West, J., Hall, P., Prier, K., Hanson, C., Giraud-Carrier, C., Neeley, S., Barnes, M.

Temporal Variability of Problem Drinking on Twitter

Open Journal of Preventive Medicine, 2(1):43-48. 2012.

Geo-Location in Twitter

• Pew Institute reports:

▫ 14% of users said they used automatic GPS tagging

• In our study, GPS tags appeared on:

▫ 2.0% of tweets

▫ 2.7% of unique users

K. Zickuhr and A. Smith.

28% of American Adults Use Mobile and Social Location-based Services.

http://pewinternet.org/~/media//Files/Reports/2011/PIP_Locationbased-services.pdf, 2011.

Burton, S. H., Tanner, K. W., Giraud-Carrier, C. G., West, J. H., and Barnes, M. D.

Right Time, Right Place Health Communication in Twitter: How Good Is Location Information?

In Submission.

Tweets Around the World

Data Mining

• “the process of discovering interesting and useful patterns and relationships in large volumes of data” – Christopher Clifton

• Algorithms

▫ Supervised

▫ Unsupervised

• Types of data

▫ Tabular

▫ Relational

▫ Text

Clifton, C.

Encyclopedia Britannica: Data Mining

http://www.britannica.com/EBchecked/topic/1056150/data-mining

Social Network Analysis

• Relational data

• Not just networks of “people”

Wasserman, S. and Faust, K.

Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.

Scott, J.

Social Network Analysis: A Handbook. Sage Publications, Second Edition, 2000.

Community Mining

• “Dense subnetwork within a larger network”

Newman, M. E. J.

Communities, Modules and Large-scale Structure in Networks.

Nature Physics, 8:25-31. 2012

Community Mining Techniques

• Label Propagation

▫ Cordasco and Gargano

• Random Walks

▫ Rosvall and Bergstrom

• Rolling k-Cliques

▫ Palla et al.

Cordasco, G. and Gargano, L.

Community Detection via Semi-Synchronous Label Propagation Algorithms

IEEE International Workshop on Business Applications of Social Network Analysis, 2010

Rosvall, M. and Bergstrom, C. T.

Maps of Random Walks on Complex Networks Reveal Community Structure

Proceedings of the National Academy of Sciences 105(4):1118-1123. 2008

Palla, G., Derényi, I., Farkas, I., and Vicsek, T.

Uncovering the Overlapping Community Structure of Complex Networks in Nature and Society

Nature, 435(7043):814-818, 2005.

Modularity

• Actual edges minus expected

• Undirected

• Requires complete graph

Newman, M. E. J. and Girvan, M.

Finding and evaluating community structure in networks.

Physical Review E, 69(2):026113, Feb 2004.
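
"Actual edges minus expected" can be made concrete with a small sketch. The code below computes the standard Newman-Girvan modularity for an undirected, unweighted graph, summing over communities the fraction of edges inside the community minus the fraction expected from node degrees alone. The function name, the edge-list input format, and the toy graph are assumptions made for illustration; they do not come from the slides.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)    # edges with both ends in one community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    degree_sum = defaultdict(int)  # total degree per community
    for node, k in degree.items():
        degree_sum[community[node]] += k
    # Q = sum over communities of (actual fraction - expected fraction)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {n: "a" for n in range(6)}

print(modularity(edges, split))   # about 0.357 (5/14)
print(modularity(edges, merged))  # 0.0
```

Note that the whole edge list is needed to compute the expected term, which is why the measure requires the complete graph.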

Modularity Challenges

• Algorithm efficiency

• Varying sizes

• Overlapping

• Directed graphs

• Local discovery

Directed Community Mining

• Lost information by ignoring direction

• Directed Modularity

▫ Leicht and Newman

• Random Walks

▫ Kim, et al.

Leicht, E. A. and Newman, M. E. J.

Community Structure in Directed Networks.

Physical Review Letters, 100(11):118703, 2008.

Kim, Y., Son, S.-W., Jeong, H.

Finding Communities in Directed Networks

Physical Review E, 81(1):016103, 2010.
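
For reference, the directed modularity of Leicht and Newman is usually written as follows. The notation is an assumption of this note rather than something shown on the slides: A_ij is 1 if there is an edge from j to i, m is the total number of edges, and c_i is the community of node i.

```latex
Q = \frac{1}{m} \sum_{i,j} \left[ A_{ij} - \frac{k_i^{\mathrm{in}} \, k_j^{\mathrm{out}}}{m} \right] \delta(c_i, c_j)
```

The expected-edge term now depends on the in-degree of the receiving node and the out-degree of the sending node, which is the information lost when direction is ignored.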

Clauset’s Local Modularity

• Steepness of boundary

• Greedily add nodes

Clauset, A.

Finding Local Community Structure in Networks.

Physical Review E, 72(2):026132, Aug 2005.
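
A minimal sketch of the two bullets above, assuming an undirected graph stored as a dictionary of neighbor sets. Local modularity R is taken as I / T, where T counts the edges touching the community's boundary nodes and I counts those that stay inside the community. The stopping rule (a target size), the tie-breaking order, and the value returned when there is no boundary are assumptions of this sketch.

```python
def local_modularity(adj, community):
    """Clauset-style R = I / T for the node set `community`.

    adj: dict mapping node -> set of neighbors (undirected graph).
    """
    # Boundary: community members with at least one neighbor outside.
    boundary = {u for u in community
                if any(v not in community for v in adj[u])}
    # T: edges with at least one endpoint on the boundary.
    edges = {frozenset((u, v)) for u in boundary for v in adj[u] if u != v}
    if not edges:
        return 1.0  # assumption: no boundary means nothing leaks out
    # I: those edges whose endpoints both lie inside the community.
    inside = sum(1 for e in edges if e <= community)
    return inside / len(edges)

def grow_community(adj, seed, size):
    """Greedily add the neighboring node that gives the highest R."""
    community = {seed}
    while len(community) < size:
        frontier = {v for u in community for v in adj[u]} - community
        if not frontier:
            break
        best = max(sorted(frontier),
                   key=lambda v: local_modularity(adj, community | {v}))
        community.add(best)
    return community

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {n: set() for n in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(grow_community(adj, seed=0, size=3))  # {0, 1, 2}
```

Only the neighborhood of the growing community is inspected, so the complete graph is never needed.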

Collective Classification

• “Typical” classification

▫ Internal attributes

• Relational classification

▫ Neighbor classes

• Collective classification

▫ Both

Sen, P., Namata, G., Bilgic, M., Getoor, L., Gallagher, B., and Eliassi-Rad, T.

Collective Classification in Network Data.

AI Magazine, 29(3):93, 2008.

Jensen, D., Neville, J., and Gallagher, B.

Why Collective Inference Improves Relational Classification.

In Proceedings of the International Conference on Knowledge Discovery and Data Mining, 2004.
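
The "both" case can be sketched as an iterative scheme: start from labels predicted from internal attributes, then repeatedly re-label each unknown node from a blend of its own attribute score and its neighbors' current labels. This is a simplified illustration in the spirit of iterative classification, not the specific algorithms of the cited papers. The 50/50 weighting, the 0.5 threshold, and all node names and scores are assumptions.

```python
def collective_classify(local_score, adj, known, weight=0.5, rounds=10):
    """Blend internal attributes with neighbor labels until stable.

    local_score: node -> probability of class 1 from internal attributes.
    adj: node -> list of neighbors.
    known: node -> 0 or 1 for nodes whose class is already labeled.
    """
    # Bootstrap: known labels, otherwise the attribute-only prediction.
    label = {n: known[n] if n in known else int(local_score[n] >= 0.5)
             for n in adj}
    for _ in range(rounds):
        changed = False
        for n in sorted(adj):
            if n in known:
                continue
            nbrs = adj[n]
            # Relational evidence: share of neighbors currently in class 1.
            nbr_frac = (sum(label[v] for v in nbrs) / len(nbrs)
                        if nbrs else local_score[n])
            combined = (1 - weight) * local_score[n] + weight * nbr_frac
            new_label = int(combined >= 0.5)
            if new_label != label[n]:
                label[n] = new_label
                changed = True
        if not changed:
            break
    return label

# Hypothetical example: a and d are labeled, b and c are not.
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
local_score = {"a": 1.0, "b": 0.45, "c": 0.30, "d": 1.0}
known = {"a": 1, "d": 1}

print(collective_classify(local_score, adj, known))
```

On attributes alone, b and c would both be assigned class 0. With neighbor labels included, b flips to class 1 and that change then propagates to c.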

Inferring Properties from Friends

• Location

▫ Backstrom, et al.

• Private information (politics, religion, etc.)

▫ Lindamood, et al.

Backstrom, L., Sun, E., and Marlow, C.

Find Me if You Can: Improving Geographical Prediction with Social and Spatial Proximity.

In Proceedings of the 19th International World Wide Web Conference, pages 61-70. 2010.

Lindamood, J., Heatherly, R., Kantarcioglu, M., and Thuraisingham, B.

Inferring private information using social network data.

In Proceedings of the 18th International World Wide Web Conference, pages 1145-1146. 2009.

Text Classification

• Different classes of documents

• Learn patterns from the words in each class

Sebastiani, F.

Machine Learning in Automated Text Categorization.

ACM Computing Surveys, 34(1):1-47, 2002.

Text Classification Algorithms

• Naïve Bayes

▫ McCallum and Nigam, 1998

• k-Nearest Neighbor

▫ Yang, 1999

• Support Vector Machines

▫ Joachims, 1998

• Rule-learning

▫ Cohen and Singer, 1996

• Maximum Entropy

▫ Nigam, et al., 1999
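
As an illustration of learning patterns from the words in each class, here is a minimal multinomial Naïve Bayes sketch with add-one smoothing. The whitespace tokenization, the handling of unseen words, and the tiny training set are assumptions for illustration; the training sentences are invented, not data from any study cited here.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label) pairs. Returns the counts used to predict."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)   # class prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed word likelihood
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training documents, for illustration only.
docs = [
    ("my tooth hurts so much", "health"),
    ("need a dentist for this toothache", "health"),
    ("great game last night", "other"),
    ("new phone arrives today", "other"),
]
model = train_nb(docs)
print(predict_nb(model, "dentist fixed my tooth"))  # health
print(predict_nb(model, "game night"))              # other
```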

Topic Modeling

• Latent Dirichlet allocation (LDA)

▫ User chooses a topic (z)

▫ Given the topic, user chooses a word

Blei, D. M., Ng, A. Y., and Jordan, M. I.

Latent Dirichlet Allocation.

Journal of Machine Learning Research, 3:993-1022, March 2003.
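
The two-step generative story above can be sketched directly: draw a topic mixture for the document, then for each word position choose a topic and, given that topic, choose a word. In full LDA the per-topic word distributions are themselves drawn from a Dirichlet prior. Here they are fixed by hand, and the topics, vocabulary, and parameter values are all invented for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, alpha, length, rng):
    """topic_word: list of dicts (word -> probability), one per topic."""
    theta = sample_dirichlet(alpha, rng)   # per-document topic mixture
    words = []
    for _ in range(length):
        # Choose a topic z for this word position.
        z = rng.choices(range(len(topic_word)), weights=theta)[0]
        # Given the topic, choose a word.
        vocab, probs = zip(*topic_word[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

# Hypothetical topics, for illustration only.
topic_word = [
    {"tooth": 0.5, "dentist": 0.3, "floss": 0.2},
    {"game": 0.6, "team": 0.4},
]
rng = random.Random(0)
theta, words = generate_document(topic_word, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta, words)
```

Inference runs this story in reverse, estimating the topic mixtures and word distributions from observed documents.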

Labeled LDA

• Supervised LDA

• Incorporates a document label

Ramage, D., Hall, D., Nallapati, R., and Manning, C.

Labeled LDA: A Supervised Topic Model for Credit Attribution in Multi-Labeled Corpora

In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 248-256

“Author-LDA”

• Small document challenges for LDA

• One approach:

▫ Combine all of an author’s tweets

Hong, L. and Davison, B.

Empirical Study of Topic Modeling in Twitter.

In Proceedings of the First Workshop on Social Media Analytics, pages 80-88. 2010.

Zhao, W., Jiang, J., Weng, J., He, J., Lim, E., Yan, H., and Li, X.

Comparing Twitter and Traditional Media using Topic Models.

In Proceedings of the 33rd European Conference on Advances in Information Retrieval, pages 338-349. 2011.
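
Combining all of an author's tweets amounts to a pooling step before topic modeling. A minimal sketch, assuming tweets arrive as (author, text) pairs; the account names and texts are invented.

```python
from collections import defaultdict

def pool_by_author(tweets):
    """Merge each author's tweets into one pseudo-document.

    tweets: iterable of (author, text) pairs.
    Returns a dict mapping author -> concatenated text.
    """
    pooled = defaultdict(list)
    for author, text in tweets:
        pooled[author].append(text)
    return {author: " ".join(texts) for author, texts in pooled.items()}

# Invented tweets, for illustration only.
tweets = [
    ("user_a", "good day today"),
    ("user_b", "gotta hit the gym"),
    ("user_a", "happy time tonight"),
]
print(pool_by_author(tweets))
```

Each pooled document can then be passed to a topic model in place of the individual tweets, which gives the model more words per document to work with.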

Ailment Topic Aspect Model (ATAM)

• Looking for specific health ailments in Twitter

• For each ailment:

▫ General words

▫ Symptoms

▫ Treatments

Paul, M. and Dredze, M.

You are what you Tweet: Analyzing Twitter for Public Health.

In International AAAI Conference on Weblogs and Social Media (ICWSM), 2011.

Identifying Questions in Micro-Text

• Survey (N=624), question characterization

▫ Morris, et al.

• I wonder, I’d like to know, etc.

▫ Efron and Winget

• Part of Speech Tagging

▫ Dent and Paul

Dent, K. and Paul, S.

Through the Twitter Glass: Detecting Questions in Micro-text.

In Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.

Efron, M. and Winget, M.

Questions are Content: A Taxonomy of Questions in a Microblogging Environment.

In Proceedings of the American Society for Information Science and Technology, 47(1):1-10, 2010.

Morris, M. R., Teevan, J., and Panovich, K.

What do People Ask their Social Networks, and Why?: A Survey Study of Status Message Q&A Behavior.

In Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI), pages 1739-1748, 2010.

Questions in Twitter

• Finding questions

▫ Look for “?”

▫ Use Mechanical Turk service

• 1152 Questions

▫ 18% Response rate

Paul, S., Hong, L., and Chi, E.

Is Twitter a Good Place for Asking Questions? A Characterization Study.

In Proceedings of the 5th International Conference on Weblogs and Social Media, pages 578-581, 2011.

Research Area Overview

• Health research in social media

• Data mining

▫ Social network analysis

▫ Collective classification

▫ Text mining

Dissertation Proposal

• Develop and improve computational techniques to better enable public health surveillance in online social media

Public Health Surveillance in Social Media

Observe

Predict

Discover

Social Media Space

Micro-blogs

Video-sharing

Full-length blogs

Mining Communities

• People in their social structures

• Complete graph not feasible

• Direction matters

Community Mining

• “Dense subnetwork within a larger network”

Newman, M. E. J.

Communities, Modules and Large-scale Structure in Networks.

Nature Physics 8:25-31. 2012

Does Direction Really Matter?

Implications of Discovery

Local, Directed Modularity

Edge type | Complete Graph | Local Discovery
Undirected | Modularity (Newman and Girvan, 2004) | Local Modularity (Clauset, 2005)
Directed | Directed Modularity (Leicht and Newman, 2008) | Local, Directed Modularity

Clauset’s Local Modularity

• Steepness of boundary

• Greedily add nodes

Clauset, A.

Finding Local Community Structure in Networks.

Physical Review E, 72(2):026132, Aug 2005.

Degrees of Freedom

• Expanding new nodes

▫ Which outside nodes are considered?

• Calculation of local modularity

▫ Which edges to outside nodes count?

▫ Which edges to core nodes count?

Conclusions

• Edge direction is important

• Algorithm extension requires assumptions

• Different assumptions lead to different communities

Public Health Surveillance in YouTube

• What are people:

▫ Sharing?

▫ Seeing?

▫ Saying?

• Implications for communication

YouTube Communities

• Users

▫ Friends

▫ Author – Subscribers

▫ Author – Commenters

▫ Co-commenters

• Videos

▫ Similar titles/keywords

▫ YouTube’s “related videos”

▫ Videos commented on by common users

▫ Videos “in-response-to” others

Burton, S., et al.

Public Health Community Mining in YouTube

In Proceedings of the ACM International Health Informatics Symposium, pages 81-90, 2012.

Anti-smoking Communities in YouTube

• “Tobacco Free Florida – Kid Tossing Ball”

http://www.youtube.com/watch?v=Ow-D9gCp-UA

Beam Search

• Quickly diverges to other topics

• Depth 4: as many sex-related videos as tobacco

Depth | Unique Videos | Smoking-related | Sex-related
0 | 1 | 1 | 0
1 | 5 | 4 | 1
2 | 19 | 9 | 5
3 | 70 | 18 | 17
4 | 268 | 41 | 42
Total | 363 | 73 | 65

Multiple Sub-Community Expansion (MSCE) Algorithm

1. Given initial start video

2. Build sub-community

a. Add the video that most increases local modularity

b. Continue until no increase

3. Choose next start video based on:

a. Links to existing community

b. Keyword matching

4. Repeat 2-3, until sufficient community built

• MSCE videos are more related to the topic than Beam Search videos (70% vs. 20%)
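
The four steps above can be sketched as follows. This is a rough illustration under several assumptions, not the published implementation: links between videos are treated as undirected, local modularity is the I / T measure computed on the current sub-community only, the next start video must match at least one keyword and is ranked by keyword hits and then by links, and "sufficient" means a maximum size. All function names, the toy graph, and the video titles are hypothetical.

```python
def local_modularity(adj, nodes):
    """R = I / T over edges touching the boundary of `nodes`."""
    boundary = {u for u in nodes if any(v not in nodes for v in adj[u])}
    edges = {frozenset((u, v)) for u in boundary for v in adj[u] if u != v}
    if not edges:
        return 1.0  # assumption: nothing leaves the set
    return sum(1 for e in edges if e <= nodes) / len(edges)

def build_sub_community(adj, start, taken):
    """Step 2: add the best video until local modularity stops increasing."""
    sub = {start}
    score = local_modularity(adj, sub)
    while True:
        frontier = {v for u in sub for v in adj[u]} - sub - taken
        best, best_score = None, score
        for v in sorted(frontier):
            candidate_score = local_modularity(adj, sub | {v})
            if candidate_score > best_score:
                best, best_score = v, candidate_score
        if best is None:
            return sub
        sub.add(best)
        score = best_score

def next_start(adj, titles, keywords, community):
    """Step 3: pick a linked video, ranked by keyword hits, then links."""
    candidates = {v for u in community for v in adj[u]} - community
    if not candidates:
        return None
    def score(v):
        hits = sum(1 for k in keywords if k in titles[v].lower())
        links = sum(1 for u in adj[v] if u in community)
        return (hits, links)
    best = max(sorted(candidates), key=score)
    return best if score(best)[0] > 0 else None

def msce(adj, titles, keywords, start, max_size):
    """Steps 1-4: expand sub-communities until large enough or stuck."""
    community = set()
    while start is not None and len(community) < max_size:
        community |= build_sub_community(adj, start, community)
        start = next_start(adj, titles, keywords, community)
    return community

# Hypothetical graph: two on-topic triangles and one off-topic triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
         (6, 7), (7, 8), (6, 8), (2, 3), (2, 6)]
adj = {n: set() for n in range(9)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
titles = {0: "Anti smoking ad", 1: "Quit smoking today", 2: "Smoking PSA",
          3: "How to quit smoking", 4: "Smoking facts", 5: "Stop smoking now",
          6: "Funny cat", 7: "Cat jumps", 8: "Cat compilation"}

print(msce(adj, titles, ["smoking"], start=0, max_size=10))
```

In this toy run the expansion covers both on-topic triangles and stops before the off-topic one, because the only remaining linked video matches no keyword.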

MSCE: Anti-Smoking Video Community

A. “Graphic Australian Anti-Smoking Ad”

▫ 2.5 million views

B. “How to quit smoking”

▫ Bridge between 3 sub-communities

C. Superhero sub-community

D. Superhero bridge videos

▫ “Star Wars Anti Smoking Ad”

▫ “Anti-Smoking : Superman versus Nick O’Teen (1981)”

Sampling on YouTube

• Current work:

▫ Search terms

▫ First N results

▫ YouTube limit of 1,000

• Typical users don’t page through search lists

iProspect.com. iProspect Search Engine User Behavior.

Technical report, iProspect.com, Inc., 2006.

Conclusions and Public Health Implications

Conclusion | Implication
Users leave health topics within a few clicks | One chance to communicate message
Influential authors are involved in the community | Simply posting a video is not sufficient
Users with affinities to the topic can be found | Surveillance and communication are possible
Communities can be used for sampling | Keyword-based approaches can be augmented

Horizontal Health Communication

Abroms, L. and Lefebvre, R. C.

Obama's Wired Campaign: Lessons for Public Health Communication.

Journal of Health Communication, 14(5):415-423, 2009

1. Dissemination

2. Feedback

Comparison of Communities and Information Dissemination

• What health topics are discussed?

• How do they spread?

Public Health Surveillance in the Blogosphere

• Everyone is a publisher

• Link to other blogs

• Establish credibility

Image: http://datamining.typepad.com/gallery/blog-map-gallery.html

Mommy-Blogs

• Mothers are highly influential in health decisions (Daniel 2009)

• Blog communities influence social norms (Wei 2004)

Daniel, K.

The Power of Mom in Communicating Health.

American Journal of Public Health, 99(12):2119, 2009.

Wei, C.

Formation of Norms in a Blog Community.

Into the Blogosphere: Rhetoric, Community, and Culture in Weblogs. 2004.

Health Topics on Mommy-Blogs

• Community of 450 blogs

Topic | Count | Percent
Autism | 113 | 0.34
CMV | 1 | 0.00
Down Syndrome | 31 | 0.09
FAS | 2 | 0.01
SIDS | 17 | 0.05
Pregnancy | 1,008 | 3.01
All Entries | 33,527 | 100.00

Parallel Mommy-verses

• Build mommy-communities in Twitter and the Blogosphere

• Evaluate differences

▫ Network structure

▫ Health topics frequency

▫ Likelihood of reiterating

Image: http://www.psychedelicjunction.com/2011/04/what-are-parallel-universes.html

Implications for Health Communication

• Know what is being said

• Identify influential users

▫ Popular/respected

▫ Bridge nodes

• How to best get messages “passed along”

Surveillance of Health Advice

• Do people seek health advice?

• Are they receiving answers?

Is Health Data Too Private?

• Would you post that online?

• Our hypothesis:

▫ People are asking questions and receiving answers

▫ More social capital = Better leverage for advice

Burton, S. H., Tanner, K. W., and Giraud-Carrier, C. G.

Leveraging Social Networks for Anytime-Anyplace Health Information.

In Submission.

Benefits of Social Media

• No search result list

• Personalization

• Versatility

• Credibility

Our Study

• Platform: Twitter

▫ Public data

• Health topic: Dental advice

▫ Everyone manages dental health

▫ Not too private

▫ Easy vocabulary

Mining Dental Advice – Step 1

• Identify dental tweets

▫ Observe all tweets

▫ Filter by:

Tooth, teeth, dental, dentist, gums, molar, moler, floss, toothache

“Ugh I have the worst tooth ache every…#CantDeal” [sic.]

“I got a massive sweet tooth”
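
A minimal sketch of the Step 1 keyword filter, using the term list from the slide. Whether the original study matched substrings or whole tokens is not stated, so the case-insensitive substring match here is an assumption.

```python
DENTAL_TERMS = ("tooth", "teeth", "dental", "dentist", "gums",
                "molar", "moler", "floss", "toothache")

def is_dental(tweet):
    """True if the tweet contains any dental keyword (substring match)."""
    text = tweet.lower()
    return any(term in text for term in DENTAL_TERMS)

examples = [
    "Ugh I have the worst tooth ache every…#CantDeal",
    "I got a massive sweet tooth",
    "Heading to the gym",
]
print([is_dental(t) for t in examples])  # [True, True, False]
```

Both example tweets from the slide pass the filter, including the "sweet tooth" one, so a keyword match alone does not establish that a tweet concerns dental health.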

Mining Dental Advice – Step 2

• Identify advice-seeking questions

▫ Look for: “anybody”, “anyone”, “any1” and “?”

▫ Human raters fine-tune

“Can anyone suggest some home remedies for a #toothache?”

“does anyone know how long it takes for swelling on your mouth to go down after getting teeth out?”
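
The Step 2 pattern can be sketched as a simple check: one of the cue words plus a question mark. The whole-word, case-insensitive matching is an assumption. Human review would still be needed afterward, as the slide notes.

```python
import re

# Cue words from the slide, matched as whole words.
ASKING = re.compile(r"\b(anybody|anyone|any1)\b", re.IGNORECASE)

def looks_like_advice_seeking(tweet):
    """True if the tweet has a cue word and a question mark."""
    return "?" in tweet and ASKING.search(tweet) is not None

examples = [
    "Can anyone suggest some home remedies for a #toothache?",
    "does anyone know how long it takes for swelling on your mouth "
    "to go down after getting teeth out?",
    "I got a massive sweet tooth",
    "Anyone else hate the dentist",
]
print([looks_like_advice_seeking(t) for t in examples])
# [True, True, False, False]
```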

Mining Dental Advice – Step 3

• Identify answers

▫ Search for: @user-name

▫ Within 48 hours

▫ Verify “in-reply-to” original tweet

“@Dray_Z try gurgling with warmm salt water or put a tea bag btween the ones that hurt” [sic.]
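
The three Step 3 checks can be sketched as one filter over candidate tweets. The `Tweet` record and its field names are hypothetical stand-ins for whatever the data source provides, and the timestamps, IDs, and the replying account names are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Tweet:
    tweet_id: int
    user: str
    text: str
    created_at: datetime
    in_reply_to_id: Optional[int] = None  # hypothetical field name

def find_answers(question, candidates, window=timedelta(hours=48)):
    """Return replies that mention the asker, arrive within the window,
    and are marked as replies to the original question tweet."""
    mention = "@" + question.user.lower()
    answers = []
    for tweet in candidates:
        delay = tweet.created_at - question.created_at
        if (mention in tweet.text.lower()
                and timedelta(0) <= delay <= window
                and tweet.in_reply_to_id == question.tweet_id):
            answers.append(tweet)
    return sorted(answers, key=lambda t: t.created_at)

asked = datetime(2012, 1, 1, 12, 0)  # invented timestamp
question = Tweet(1, "Dray_Z", "anyone know a fix for a toothache?", asked)
candidates = [
    Tweet(2, "friend_1", "@Dray_Z try gurgling with warmm salt water",
          asked + timedelta(minutes=5), in_reply_to_id=1),
    Tweet(3, "friend_2", "@Dray_Z see you at the game",
          asked + timedelta(hours=1), in_reply_to_id=None),
    Tweet(4, "friend_3", "@Dray_Z did it stop hurting",
          asked + timedelta(hours=72), in_reply_to_id=1),
]
print([t.tweet_id for t in find_answers(question, candidates)])  # [2]
```

The second candidate mentions the asker but is not a reply to the question, and the third arrives after the 48-hour window, so only the first counts as an answer.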

Results

• 2 weeks of tweets

▫ 1 million dental tweets (74,000 per day)

▫ 2,035 likely advice seeking (anyone … ?)

▫ 432 genuine advice-seeking

▫ 140 (32%) received at least one response

▫ 5.5 minutes to response (median)

Benefits of Social Capital

• More likely to receive a response

• Receive responses faster

Who is Answering?

• Answers come from people you know

Relationship | Percent
No relation | 6.6
Responder following asker | 93.0
Asker following responder | 70.0
Mutual following and follower | 69.5

Conclusions

• People are seeking dental advice in Twitter

• Answers come frequently and quickly

• Users with more social capital are more likely to receive answers

Predicting Substance Abuse

• Identifying Trends

▫ Content of tweets

▫ Social network

Do People Tweet About That?

“So my family knows I smoke weed. The only one that doesn't really care or seem to concern is my pops” [sic.]

“if u dont like that i smoke weed then u dont like me... Weed is BIG part of my laugh. now pass me the blunt” [sic.]

“No wonder I smoke weed. Stupid people stress me out.”

Mining Process

A. Collect Marijuana Users

B. Collect Non-Marijuana Users

C. Build User Profiles

D. Induce Predictive Model

E. Analyze Model

Intervention (Future Work)

F. Predict Likely Users

Collecting Users

• Keyword filters

• Pilot study: “I smoke weed” (50 users)

▫ 36% - Definitely marijuana users

▫ 25% - Explicitly said it, but possibly joking

▫ 19% - At least positive sentiment

▫ 78% - These three combined

• Non-marijuana users

Building User Profiles

• Complete tweet history (up to 3200)

• Follower List

• Following List

• User-supplied description

Feature Extraction

• Author-LDA

▫ “day today good time tonight happy”

▫ “real tho man gotta life twitter yo hit”

• Personal pronouns

▫ “My step-mom…”

▫ Bootstrap training set

• Traits from theoretical models

Hawkins, J., Catalano, R., and Miller, J.

Risk and Protective Factors for Alcohol and Other Drug Problems in Adolescence and

Early Adulthood: Implications for Substance Abuse Prevention.

Psychological Bulletin, 112(1):64, 1992.

The Predictive Model

• Comprehensibility

• Collective classification

▫ Predict personal traits

▫ Predict traits of friends

▫ Weighted, directed edges

Analysis and Validation

• Compare to theory

▫ “Risk and protective factors”

• Subjective validation

• Objective validation of easily-labeled traits

Future Work

• Personalized communication

• Intervention

• Communication with family/friends

Proposed Schedule

Sec. | Topic | Venue | Target
2 | Public Health Community Mining in YouTube | ACM International Health Informatics Symposium (IHI) | Published
4 | Leveraging Social Networks for Anytime-Anyplace Health Information | Network Modeling Analysis in Health Informatics and Bioinformatics (NetMAHIB) | In Submission
1 | Local Community Mining in Directed Graphs | Journal of Social Network Analysis and Mining (SNAM) | June 2012
3 | Mining the Spread of Health Content in Social Media | International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction (SBP) | August 2012
5 | Mining Social Media for Trends among Substance Abusers | ACM Transactions on Knowledge Discovery from Data (TKDD) | February 2013

Contributions

• Computational techniques

▫ Local, directed community mining

▫ Community mining for sampling

▫ Mining rare and meaningful traits in short text

▫ Combination of text mining and social network analysis for prediction

• Implications for Health Surveillance

▫ YouTube as a source of communities

▫ Health differences across platforms

▫ Health advice in social media

▫ Prediction of high risk individuals

Questions