Post on 20-Jan-2016
1
Mining User Behavior
Eugene Agichtein
Mathematics & Computer Science
Emory University
2
The Big Picture: Intelligent Information Access and Question Answering
Patterns in Text (Author Behavior)
Patterns in Search (Searcher Behavior)
Structuring Information in Bio- and Medical text
Discovering Implicit Networks: Entity, Relation, and Event Extraction
Content Creation and Discovery in Social Media
Understanding Searcher Inference and Decision Process
Question Answering
3
Text Mining for Patient Medical Care
with E. V. Garcia (Emory SoM) and A. Ram (Georgia Tech)
Rule Discovery from Medical Literature (MERLIN project):
– Identify articles containing useful clinical knowledge
– Extract new expert system rules, test/modify based on patient DB
Personalized diagnosis and care (PRETEX project):
– Extract relevant clinical variables from text in patient records
– Personalize expert system rules for a given patient or population
– Automatically identify harmful drug interactions and side effects
4
Mining Textual Data in Patient Electronic Medical Records
5
More info: Archana Bhattarai et al., poster at reception this evening
6
From Medical Literature to Structured Clinical Knowledge
Example rule:
IF LV_stress_perfusion_is_abnormal
THEN STRONG POSITIVE EVIDENCE
THAT Diseased_coronary_is(LAD)
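One way to picture an extracted rule such as the LAD example is as a small structured record that can be checked against a patient's findings. The sketch below is illustrative only: the `Rule` class, its field names, and the `fire` helper are hypothetical and do not come from the MERLIN project; only the three strings in `EXAMPLE_RULE` are taken from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for one IF/THEN/THAT expert-system rule."""
    condition: str   # name of a boolean clinical finding (the IF part)
    evidence: str    # qualitative strength of evidence (the THEN part)
    conclusion: str  # hypothesis being supported (the THAT part)


# The example rule from the slide, encoded with the hypothetical Rule class.
EXAMPLE_RULE = Rule(
    condition="LV_stress_perfusion_is_abnormal",
    evidence="STRONG POSITIVE EVIDENCE",
    conclusion="Diseased_coronary_is(LAD)",
)


def fire(rules: List[Rule], findings: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Return (conclusion, evidence) for every rule whose condition holds.

    Findings missing from the dictionary are treated as not observed.
    """
    return [
        (rule.conclusion, rule.evidence)
        for rule in rules
        if findings.get(rule.condition, False)
    ]


if __name__ == "__main__":
    patient = {"LV_stress_perfusion_is_abnormal": True}  # made-up patient record
    print(fire([EXAMPLE_RULE], patient))
```

A flat record like this makes it straightforward to test or modify a rule against a patient database, which is the use the slides describe; a real expert system would need richer conditions than a single boolean finding.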
7
Baoli Li et al., poster at reception this evening
8
This study claims WHAT?!?
If it’s printed, must be true
– Published studies are never disproven
– Experimental study data is never massaged
Big Pharma funding, overstated claims
R. Smith, 2005: Medical journals are an extension of the marketing arm of pharmaceutical companies, PLoS Medicine
How to evaluate quality/soundness of literature?
9
www.falsemed.org
10
Challenges
Authority and trust
Privacy of contributors vs. authority
Many dimensions of quality
– Equipment sensitivity
– Recency (studies grow obsolete)
– Size of the clinical trial
– Correlational vs. controlled
– Randomization
– …
Work in progress
11
The Big Picture: Intelligent Information Access and Question Answering
12
Social media: Planetary-scale user behavior experiment
Real information needs and subjective relevance judgments
Traces of many interactions recorded
Allows shared, reproducible experiments
Some semantic organization (tags, categories)
13
Social Media (emerging)
14
Traditional vs. social media
26
Community
34
How to find relevant and high-quality content in social media?
35
Learning-based Approach
Content features
Community interaction features
Relevance
Quality
Unified Ranking Function
36
Ranking Algorithm – GBrank [Zheng 2007]
Start with an initial guess $h_0$; for $k = 1, 2, \ldots$
– Using $h_{k-1}$ as the current approximation of $h$, we separate $S$ into two disjoint sets:
\[ S^{+} = \{ \langle x_i, y_i \rangle \in S \mid h_{k-1}(x_i) \ge h_{k-1}(y_i) + \tau \} \]
\[ S^{-} = \{ \langle x_i, y_i \rangle \in S \mid h_{k-1}(x_i) < h_{k-1}(y_i) + \tau \} \]
– Fit a regression function $g_k(x)$ using Gradient Boosting Tree [Friedman 2001] and the following training data:
\[ \{ (x_i,\; h_{k-1}(y_i) + \tau),\; (y_i,\; h_{k-1}(x_i) - \tau) \mid \langle x_i, y_i \rangle \in S^{-} \} \]
– Form the new ranking function as
\[ h_k(x) = \frac{k\, h_{k-1}(x) + \eta\, g_k(x)}{k + 1} \]
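The GBrank iteration can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind the reported results: the margin `tau` and shrinkage `eta` follow the formulation in Zheng et al., the function names are hypothetical, and the regressor `fit_linear` is a plain least-squares stand-in rather than gradient boosted trees, chosen only so the sketch runs without external libraries. Any regressor with the same call shape could be passed as `fit` instead.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]
Pair = Tuple[Vector, Vector]  # (x, y) means x should be ranked above y


def fit_linear(data: List[Tuple[Vector, float]],
               epochs: int = 500, lr: float = 0.05) -> Scorer:
    """Stand-in regressor: linear least squares by full-batch gradient descent.

    NOT gradient boosted trees; the step size suits small, low-valued toy
    features and is not tuned for general data.
    """
    dim, n = len(data[0][0]), len(data)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, target in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - target
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    weights = tuple(w)
    return lambda x: sum(wi * xi for wi, xi in zip(weights, x)) + b


def gbrank(pairs: List[Pair], fit: Callable[..., Scorer] = fit_linear,
           rounds: int = 10, tau: float = 1.0, eta: float = 0.5) -> Scorer:
    """Iteratively build a ranking function h from preference pairs."""
    h: Scorer = lambda x: 0.0  # initial guess h_0
    for k in range(1, rounds + 1):
        # S^-: pairs the current h fails to order with margin tau.
        wrong = [(x, y) for x, y in pairs if h(x) < h(y) + tau]
        if not wrong:
            break
        # Regression targets push x up to h(y) + tau and y down to h(x) - tau.
        data = []
        for x, y in wrong:
            data.append((x, h(y) + tau))
            data.append((y, h(x) - tau))
        g = fit(data)
        # h_k(x) = (k * h_{k-1}(x) + eta * g_k(x)) / (k + 1)
        h = (lambda x, prev=h, g=g, k=k: (k * prev(x) + eta * g(x)) / (k + 1))
    return h


if __name__ == "__main__":
    # Made-up one-dimensional preference pairs: a larger feature should rank higher.
    toy_pairs = [([3.0], [1.0]), ([2.0], [0.0]), ([4.0], [2.0])]
    h = gbrank(toy_pairs)
    print([(h(x), h(y)) for x, y in toy_pairs])
```

Only the mis-ordered pairs contribute training data in each round, so the regressor concentrates on the preferences the current function still gets wrong.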
37
Experimental Results
Removing textual features
Removing community interaction features
Baseline
GBrank
38
Intelligent Information Access and Question Answering
39
User Behavior: The 3rd Dimension of the Web
Amount exceeds web content and structure
– Published: 4 Gb/day; Social Media: 10 Gb/day
– Page views: 100 Gb/day
[Andrew Tomkins, Yahoo! Search, 2007]
40
Clickthrough for Queries with Known Position of Top Relevant Result
[Figure: Relative clickthrough for queries with known relevant results in position 1 and 3 respectively. x-axis: Result Position (1, 2, 3, 5, 10); y-axis: Relative Click Frequency; series: All queries, PTR=1, PTR=3]
Higher clickthrough at top non-relevant than at top relevant document
E. Agichtein, E. Brill, and S. Dumais, SIGIR 2006
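A click-frequency curve of this kind can be computed from a click log by counting clicks per result position, either over all queries or over the subset whose top relevant result sits at a given position (PTR). The sketch below assumes a hypothetical log format of `(query_id, clicked_position)` records and a hypothetical `ptr_of_query` mapping; it is not the paper's exact normalization, and the sample log is made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Click = Tuple[str, int]  # (query_id, clicked result position), hypothetical format


def click_frequency_by_position(clicks: Iterable[Click]) -> Dict[int, float]:
    """Fraction of all clicks that landed on each result position."""
    counts = Counter(position for _, position in clicks)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pos: n / total for pos, n in sorted(counts.items())}


def clicks_for_ptr(clicks: Iterable[Click],
                   ptr_of_query: Dict[str, int], ptr: int) -> List[Click]:
    """Keep only clicks from queries whose top relevant result is at `ptr`."""
    return [(q, pos) for q, pos in clicks if ptr_of_query.get(q) == ptr]


if __name__ == "__main__":
    # Made-up log: q1 and q2 have their top relevant result at position 1, q3 at 3.
    log = [("q1", 1), ("q1", 3), ("q2", 1), ("q3", 2)]
    ptr_of_query = {"q1": 1, "q2": 1, "q3": 3}
    print("All queries:", click_frequency_by_position(log))
    print("PTR=1:", click_frequency_by_position(clicks_for_ptr(log, ptr_of_query, 1)))
```

Comparing the per-position curve of a PTR subset with the curve over all queries is what exposes position bias: clicks concentrate near the top even when the relevant result is lower.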
41
Full Search Engine, User Behavior: NDCG, MAP

Method     MAP    Gain
RN         0.270
RN+ALL     0.321  0.052 (19.13%)
BM25       0.236
BM25+ALL   0.292  0.056 (23.71%)

[Figure: NDCG at K for K = 1 to 10 (y-axis range 0.56 to 0.74); series: RN, Rerank-All, RN+All]
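For readers unfamiliar with the two metrics, the following sketch shows one common way to compute NDCG at K and MAP. The exact variants used for the reported numbers are not stated on the slide, so the choices here are assumptions: gain of `2**grade - 1`, a `log2(1 + rank)` discount, an ideal ordering taken from the retrieved list itself, and average precision normalized by the number of relevant results retrieved.

```python
import math
from typing import List, Sequence


def dcg_at_k(grades: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top-k graded results."""
    return sum(
        (2 ** grade - 1) / math.log2(1 + rank)
        for rank, grade in enumerate(grades[:k], start=1)
    )


def ndcg_at_k(grades: Sequence[int], k: int) -> float:
    """DCG normalized by the DCG of the best reordering of the same list."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


def average_precision(relevant: Sequence[bool]) -> float:
    """Mean of precision values taken at each relevant result's rank."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: List[Sequence[bool]]) -> float:
    """MAP: average precision averaged over queries."""
    return sum(average_precision(r) for r in runs) / len(runs) if runs else 0.0


if __name__ == "__main__":
    # Made-up relevance grades and judgments for illustration.
    print(ndcg_at_k([1, 3, 0, 2], k=3))
    print(mean_average_precision([[True, False, True], [False, True]]))
```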
42
User Behavior Complements Content and Web Topology
[Figure: Precision at K for K = 1, 3, 5, 10 (y-axis range 0.45 to 0.7); series: RN, RN+All, BM25, BM25+All]

Method                     P@1    Gain
RN (Content + Links)       0.632
RN + All (User Behavior)   0.693  0.061 (10%)
BM25                       0.525
BM25+All                   0.687  0.162 (31%)
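The Gain column is the absolute difference in P@1 followed by the difference relative to the baseline. A short sketch, with hypothetical function names, shows both precision at K and the gain arithmetic applied to the P@1 values from the table.

```python
from typing import Sequence, Tuple


def precision_at_k(relevant: Sequence[bool], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in relevant[:k] if r) / k


def gain(baseline: float, improved: float) -> Tuple[float, float]:
    """Return (absolute gain, gain relative to the baseline)."""
    absolute = improved - baseline
    return absolute, absolute / baseline


if __name__ == "__main__":
    # P@1 values taken from the table.
    for name, base, new in [("RN", 0.632, 0.693), ("BM25", 0.525, 0.687)]:
        absolute, relative = gain(base, new)
        print(f"{name}: +{absolute:.3f} ({relative:.0%})")
```

Adding the user behavior features moves BM25 from well below RN to nearly the same P@1, which is why its relative gain is about three times larger.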
43
Fine grained behavior analysis
44
Data captured with Tobii eye tracker, courtesy Andy Edmonds, http://www.alwaysbetesting.com/
45
Preliminary results on using mouse trajectories to infer user intent
Q. Guo and E. Agichtein, to appear in SIGIR 2008
46
Summary
Intelligent Information Access and Question Answering
Patterns in Text (Author Behavior)
Patterns in Search (Searcher Behavior)
Structuring Information in Bio- and Medical text
Discovering Implicit Networks: Entity, Relation, and Event Extraction
Content Creation and Discovery in Social Media
Understanding Searcher Inference and Decision Process
Question Answering
http://www.ir.mathcs.emory.edu/