sdm2008 user behavior - mathcs.emory.edueugene/talks/sdm2008_user_behavior.pdfrules, test/modify...
-
1
Mining User Behavior
Eugene Agichtein
Mathematics & Computer Science
Emory University
-
2
The Big Picture: Intelligent Information Access
-
3
Text Mining for Patient Medical Care with E. V. Garcia (Emory SoM) and A. Ram (Georgia Tech)
• Rule Discovery from Medical Literature (MERLIN project):
– Identify articles containing useful clinical knowledge
– Extract new expert system rules, test/modify based on patient DB
• Personalized diagnosis and care (PRETEX project):
– Extract clinical variables from text in patient records
– Personalize expert system rules for a given patient or population
– Automatically identify harmful drug interactions and side effects
-
4
Mining Textual Data in Patient Electronic Medical Records
-
5
More info: Archana Bhattarai et al., poster at reception this evening
-
6
From Medical Literature to Structured Clinical Knowledge
Example rule: IF LV_stress_perfusion_is_abnormal THEN STRONG POSITIVE EVIDENCE THAT Diseased_coronary_is(LAD)
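To make "test/modify based on patient DB" concrete, here is a minimal Python sketch. It assumes the example rule is encoded as a hypothetical `Rule` record and each patient as a dictionary of already-extracted boolean clinical variables. Support and confidence are borrowed from association-rule mining as one plausible way to check a rule against patient data; they are an assumption, not the MERLIN project's method, and the toy records are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical patient record: clinical variable name -> True/False,
# e.g. variables previously extracted from text in patient records.
Patient = Dict[str, bool]


@dataclass(frozen=True)
class Rule:
    """IF <condition> THEN <strength> EVIDENCE THAT <conclusion> (hypothetical encoding)."""
    condition: str
    strength: str
    conclusion: str

    def fires(self, patient: Patient) -> bool:
        # A variable missing from the record is treated as False (an assumption).
        return patient.get(self.condition, False)


def rule_support(rule: Rule, patients: List[Patient]) -> float:
    """Fraction of all patients for whom the rule's condition holds."""
    if not patients:
        return 0.0
    return sum(rule.fires(p) for p in patients) / len(patients)


def rule_confidence(rule: Rule, patients: List[Patient]) -> float:
    """Among patients where the rule fires, the fraction that also have the conclusion."""
    fired = [p for p in patients if rule.fires(p)]
    if not fired:
        return 0.0
    return sum(p.get(rule.conclusion, False) for p in fired) / len(fired)


rule = Rule(
    condition="LV_stress_perfusion_is_abnormal",
    strength="STRONG POSITIVE",
    conclusion="Diseased_coronary_is(LAD)",
)
# Made-up toy records, for illustration only.
patients = [
    {"LV_stress_perfusion_is_abnormal": True, "Diseased_coronary_is(LAD)": True},
    {"LV_stress_perfusion_is_abnormal": True, "Diseased_coronary_is(LAD)": False},
    {"LV_stress_perfusion_is_abnormal": True, "Diseased_coronary_is(LAD)": True},
    {"LV_stress_perfusion_is_abnormal": False, "Diseased_coronary_is(LAD)": True},
]
print(rule_support(rule, patients), rule_confidence(rule, patients))  # support 0.75, confidence 2/3
```

A rule whose confidence on the patient DB falls well below what its evidence label suggests would be a candidate for modification.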
-
7
Baoli Li et al., poster at reception this evening
-
8
This study claims WHAT?!?
• If it's printed, it must be true
– Published studies are never disproven
– Experimental study data is never massaged
• Big Pharma funding → overstated claims
R. Smith, 2005: Medical journals are an extension of the marketing arm of pharmaceutical companies, PLoS Medicine
• How to evaluate quality/soundness of (medical) scientific literature?
-
9
www.falsemed.org
-
10
Challenges
• Authority and trust of contributions, ratings, etc.
• Indicate authority while protecting privacy of contributors
• Many dimensions of quality (biomedical literature)
– Equipment sensitivity
– Recency (studies grow obsolete)
– Size of the clinical trial
– Correlational vs. controlled
– Cohort randomization
– …
• Work in progress
-
11
The Big Picture: Intelligent Information Access
-
12
Social media: Planetary-scale human behavior experiment
• Real information needs and subjective relevance judgments
• Traces of many interactions recorded
• Allows shared, reproducible experiments
• Some semantic organization (tags, categories)
-
13
Some Examples
-
14
Traditional vs. social media
-
15
-
16
-
17
-
18
-
19
-
20
-
21
-
22
-
23
-
24
Community
-
25
-
26
-
27
-
28
-
29
-
30
-
31
-
32
How to find relevant and high-quality content in social media?
-
33
Learning-based Approach
Content features
Community interaction features
Relevance
Quality
Unified Ranking Function
-
34
Ranking Algorithm – GBrank [Zheng 2007]
• Training data S: preference pairs $\langle x_i, y_i \rangle$, where $x_i$ should be ranked above $y_i$
• Start with an initial guess $h_0$; for $k = 1, 2, \ldots$
• Using $h_{k-1}$ as the current approximation of $h$, separate $S$ into two disjoint sets:
$$S^{+} = \{\langle x_i, y_i \rangle \in S \mid h_{k-1}(x_i) \ge h_{k-1}(y_i) + \tau\}$$
$$S^{-} = \{\langle x_i, y_i \rangle \in S \mid h_{k-1}(x_i) < h_{k-1}(y_i) + \tau\}$$
• Fit a regression function $g_k(x)$ using Gradient Boosting Tree [Friedman 2001] and the following training data:
$$\{(x_i,\ h_{k-1}(y_i) + \tau),\ (y_i,\ h_{k-1}(x_i) - \tau) \mid \langle x_i, y_i \rangle \in S^{-}\}$$
• Form the new ranking function as
$$h_k(x) = \frac{k\, h_{k-1}(x) + \eta\, g_k(x)}{k + 1}$$
where $\tau$ is a fixed margin and $\eta$ is a shrinkage factor.
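The GBrank loop above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions: documents are plain feature tuples; the Gradient Boosting Tree of [Friedman 2001] is swapped for a small least-squares boosting of one-split regression stumps; and the function names and the defaults for `n_iter`, `tau`, and `eta` are hypothetical, not taken from the slide or the paper.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], float]


def fit_stump(X: List[Vector], t: List[float]) -> Model:
    """Least-squares regression stump: the single best threshold split on one feature."""
    best: Optional[Tuple[float, int, float, float, float]] = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [ti for xi, ti in zip(X, t) if xi[j] <= thr]
            right = [ti for xi, ti in zip(X, t) if xi[j] > thr]
            if not left or not right:
                continue  # a split must put data on both sides
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:  # every feature is constant: predict the mean target
        mean = sum(t) / len(t)
        return lambda x: mean
    _, feat, cut, lo, hi = best
    return lambda x: lo if x[feat] <= cut else hi


def fit_boosted_stumps(X: List[Vector], t: List[float],
                       n_stumps: int = 20, lr: float = 0.5) -> Model:
    """Stand-in for GBT (an assumption): each stump is fit to the current residuals."""
    stumps: List[Model] = []
    resid = list(t)
    for _ in range(n_stumps):
        stump = fit_stump(X, resid)
        stumps.append(stump)
        resid = [r - lr * stump(x) for r, x in zip(resid, X)]
    return lambda x: sum(lr * s(x) for s in stumps)


def _combine(h_prev: Model, g_k: Model, k: int, eta: float) -> Model:
    """h_k(x) = (k * h_{k-1}(x) + eta * g_k(x)) / (k + 1)."""
    return lambda x: (k * h_prev(x) + eta * g_k(x)) / (k + 1)


def gbrank(pairs: List[Tuple[Vector, Vector]], n_iter: int = 10,
           tau: float = 1.0, eta: float = 1.0) -> Model:
    """Each pair (x, y) means x should be ranked above y."""
    h: Model = lambda x: 0.0  # initial guess h_0
    for k in range(1, n_iter + 1):
        # S^-: pairs that h_{k-1} does not yet order with margin tau.
        s_minus = [(x, y) for x, y in pairs if h(x) < h(y) + tau]
        if not s_minus:
            break  # every pair satisfies the margin; nothing left to fit
        X: List[Vector] = []
        t: List[float] = []
        for x, y in s_minus:
            hx, hy = h(x), h(y)
            X.extend([x, y])
            t.extend([hy + tau, hx - tau])  # regression targets built from S^-
        g = fit_boosted_stumps(X, t)
        h = _combine(h, g, k, eta)
    return h


a, b = (1.0,), (0.0,)
h = gbrank([(a, b)])
print(h(a) > h(b))  # True
```

Only pairs in $S^{-}$ generate training data, so each round concentrates on the preferences the current ranking function still gets wrong or orders too narrowly; the averaging in `_combine` keeps early rounds from being overwritten.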
-
35
Experimental Results
[Figure: results for GBrank, Baseline, Removing textual features, and Removing community interaction features]
-
36
You've Got Answers! Predicting asker satisfaction
• Predict user satisfaction with the answers
• Derive additional features for the task, in particular prior asker history
• Can predict with about 75% accuracy (forthcoming, Liu, Bian, Agichtein, SIGIR 2008)
• Satisfaction is subjective and personal; even simple personalization models are very helpful (forthcoming, Liu and Agichtein, ACL 2008)
-
37
Intelligent Information Access
-
38
User Behavior: The 3rd Dimension of the Web
• Amount exceeds web content and structure
– Published: 4 Gb/day; Social media: 10 Gb/day
– Page views: 100 Gb/day
[Andrew Tomkins, Yahoo! Search, 2007]
-
39
Clickthrough for Queries with Known Position of Top Relevant Result
[Figure: relative click frequency by result position (1 to 10); series: All queries, PTR=1, PTR=3]
Relative clickthrough for queries with known relevant results in position 1 and 3 respectively
Higher clickthrough at top non-relevant than at top relevant document
E. Agichtein, E. Brill, and S. Dumais, SIGIR 2006
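The curves on this slide can be approximated from a click log with a short Python sketch. It assumes "relative click frequency" means the share of all clicks, pooled over a group of queries, that land at each result position, and that PTR (position of the top relevant result) is known per query from relevance judgments. The log format, function names, and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def relative_click_frequency(clicked_positions: Iterable[int],
                             max_pos: int = 10) -> Dict[int, float]:
    """Share of clicks landing at each result position 1..max_pos."""
    counts = Counter(p for p in clicked_positions if 1 <= p <= max_pos)
    total = sum(counts.values())
    return {p: (counts[p] / total if total else 0.0) for p in range(1, max_pos + 1)}


def clicks_for_ptr(click_log: Dict[str, List[int]], ptr: Dict[str, int],
                   position: int) -> List[int]:
    """Pool clicked positions over queries whose top relevant result (PTR) is at `position`."""
    return [p for q, clicks in click_log.items() if ptr.get(q) == position for p in clicks]


# Made-up toy log: query -> clicked result positions; ptr: query -> PTR from judgments.
click_log = {"q1": [1, 1, 2], "q2": [1, 3], "q3": [3, 3, 1]}
ptr = {"q1": 1, "q2": 1, "q3": 3}
ptr1 = relative_click_frequency(clicks_for_ptr(click_log, ptr, 1))
ptr3 = relative_click_frequency(clicks_for_ptr(click_log, ptr, 3))
print(ptr1[1], ptr3[3])
```

Comparing the PTR=3 distribution against the all-queries one is what exposes the effect noted on the slide: clicks on positions 1 and 2 even when the top relevant result sits lower.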
-
40
Full Search Engine, User Behavior: NDCG, MAP
Method      MAP     Gain
RN          0.270
RN+ALL      0.321   0.052 (19.13%)
BM25        0.236
BM25+ALL    0.292   0.056 (23.71%)
[Figure: NDCG at K (K = 1 to 10) for RN, Rerank-All, and RN+All]
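For readers who want to compute the two metrics on this slide, here is a minimal Python sketch of NDCG@K and MAP. It uses one common NDCG variant (gain 2^rel − 1, discount log2(rank + 1)) and normalizes by the ideal reordering of the same judged list; both choices are assumptions, and the paper's exact formulation may differ. Relevance labels are per-rank lists, highest-ranked first.

```python
import math
from typing import List, Optional, Sequence


def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """DCG@k with gain 2^rel - 1 and discount log2(rank + 1) (one common variant)."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))


def ndcg_at_k(rels: Sequence[float], k: int) -> float:
    """Normalize by the DCG of the ideal reordering of the same judged list (an assumption)."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


def average_precision(rels: Sequence[int], n_relevant: Optional[int] = None) -> float:
    """Mean of precision@i over ranks i that hold a relevant result.

    By default the denominator is the number of relevant results found in the
    list; pass n_relevant to use the full count of judged relevant documents.
    """
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / i
    denom = n_relevant if n_relevant is not None else hits
    return total / denom if denom else 0.0


def mean_average_precision(runs: List[Sequence[int]]) -> float:
    """MAP: average precision averaged over queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


print(ndcg_at_k([0, 1], 2), mean_average_precision([[1, 0, 1], [0, 1]]))
```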
-
41
User Behavior Complements Content and Web Topology
[Figure: precision at K (K = 1, 3, 5, 10) for RN, RN+All, BM25, and BM25+All]
Method                      P@1     Gain
RN (Content + Links)        0.632
RN + All (User Behavior)    0.693   0.061 (10%)
BM25                        0.525
BM25+All                    0.687   0.162 (31%)
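The Gain column is the absolute difference from the baseline, with the relative improvement in parentheses. A small Python sketch of P@K and of that arithmetic follows. The function names are hypothetical, and P@K here divides by K even when fewer than K results are judged (missing ranks count as non-relevant), which is one common convention.

```python
from typing import Sequence, Tuple


def precision_at_k(rels: Sequence[int], k: int) -> float:
    """P@k: fraction of the top k results judged relevant (missing ranks count as non-relevant)."""
    return sum(1 for r in rels[:k] if r) / k


def relative_gain(baseline: float, improved: float) -> Tuple[float, float]:
    """Absolute and relative improvement of `improved` over `baseline`."""
    delta = improved - baseline
    return delta, delta / baseline


# P@1 values from the table above.
for name, base, new in [("BM25", 0.525, 0.687), ("RN", 0.632, 0.693)]:
    delta, rel = relative_gain(base, new)
    print(f"{name}+All: {delta:.3f} ({rel:.0%})")
```

Applied to the rounded P@1 values in the table, this reproduces the reported gains of 0.162 (31%) for BM25+All and 0.061 (10%) for RN + All.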
-
42
Fine-grained behavior analysis
-
43
[Figure: eye-tracking data with numbered gaze points]
Data captured with Tobii eye tracker, courtesy Andy Edmonds, http://www.alwaysbetesting.com/
-
44
Preliminary results on using mouse trajectories to infer user intent
Q. Guo and E. Agichtein, to appear in SIGIR 2008
-
45
Summary
http://www.ir.mathcs.emory.edu/