Page 1:

Mining User Behavior

Eugene Agichtein
Mathematics & Computer Science
Emory University

Page 2:

The Big Picture: Intelligent Information Access and Question Answering

Patterns in Text (Author Behavior)
Patterns in Search (Searcher Behavior)
Structuring Information in Bio- and Medical text
Discovering Implicit Networks: Entity, Relation, and Event Extraction
Content Creation and Discovery in Social Media
Understanding Searcher Inference and Decision Process
Question Answering

Page 3:

Text Mining for Patient Medical Care
with E. V. Garcia (Emory SoM) and A. Ram (Georgia Tech)

Rule Discovery from Medical Literature (MERLIN project):
– Identify articles containing useful clinical knowledge
– Extract new expert system rules, test/modify based on patient DB

Personalized diagnosis and care (PRETEX project):
– Extract relevant clinical variables from text in patient records
– Personalize expert system rules for a given patient or population
– Automatically identify harmful drug interactions and side effects

Page 4:

Mining Textual Data in Patient Electronic Medical Records

Page 5:

More info: Archana Bhattarai et al., poster at reception this evening

Page 6:

From Medical Literature to Structured Clinical Knowledge

Example rule:
IF LV_stress_perfusion_is_abnormal
THEN STRONG POSITIVE EVIDENCE
THAT Diseased_coronary_is(LAD)
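One way to picture a rule of this IF / THEN / THAT shape as data is sketched below. This is an illustrative assumption only: the `Rule` class, its field names, and the `fire` method are hypothetical and are not taken from the MERLIN system, whose internal rule format the slides do not show.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for one extracted expert-system rule."""
    condition: str   # name of a boolean clinical finding (the IF part)
    strength: str    # evidence strength label (the THEN part)
    conclusion: str  # hypothesis the evidence bears on (the THAT part)

    def fire(self, findings: Dict[str, bool]) -> Optional[str]:
        """Return the evidence statement if the condition holds, else None."""
        if findings.get(self.condition, False):
            return f"{self.strength} EVIDENCE THAT {self.conclusion}"
        return None


# The example rule from the slide, encoded with the assumed structure.
rule = Rule(
    condition="LV_stress_perfusion_is_abnormal",
    strength="STRONG POSITIVE",
    conclusion="Diseased_coronary_is(LAD)",
)
```

Calling `rule.fire(...)` with a dictionary of findings from a patient record yields the evidence statement only when the named finding is present and true; a missing finding is treated as not satisfied.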

Page 7:

Baoli Li et al., poster at reception this evening

Page 8:

This study claims WHAT?!?

If it’s printed, must be true
– Published studies are never disproven
– Experimental study data is never massaged

Big Pharma funding overstated claims
R. Smith, 2005: Medical journals are an extension of the marketing arm of pharmaceutical companies, PLoS Medicine

How to evaluate quality/soundness of literature?

Page 9:

www.falsemed.org

Page 10:

Challenges

Authority and trust
Privacy of contributors vs. authority
Many dimensions of quality
– Equipment sensitivity
– Recency (studies grow obsolete)
– Size of the clinical trial
– Correlational vs. controlled
– Randomization
– …

Work in progress

Page 11:

The Big Picture: Intelligent Information Access and Question Answering

Patterns in Text (Author Behavior)
Patterns in Search (Searcher Behavior)
Structuring Information in Bio- and Medical text
Discovering Implicit Networks: Entity, Relation, and Event Extraction
Content Creation and Discovery in Social Media
Understanding Searcher Inference and Decision Process
Question Answering

Page 12:

Social media: Planetary-scale user behavior experiment

Real information needs and subjective relevance judgments
Traces of many interactions recorded
Allows shared, reproducible experiments
Some semantic organization (tags, categories)

Page 13:

Social Media (emerging)

Page 14:

Traditional vs. social media

Page 26:

Community

Page 34:

How to find relevant and high-quality content in social media?

Page 35:

Learning-based Approach

[Diagram labels: Content features; Community interaction features; Relevance; Quality; Unified Ranking Function]

Page 36:

Ranking Algorithm – GBrank [Zheng 2007]

Start with an initial guess $h_0$; for $k = 1, 2, \ldots$

Using $h_{k-1}$ as the current approximation of $h$, we separate $S$ into two disjoint sets:

$$S^+ = \{\langle x_i, y_i \rangle \in S \mid h_{k-1}(x_i) \ge h_{k-1}(y_i) + \tau\}$$

$$S^- = \{\langle x_i, y_i \rangle \in S \mid h_{k-1}(x_i) < h_{k-1}(y_i) + \tau\}$$

Fit a regression function $g_k(x)$ using Gradient Boosting Tree [Friedman 2001] and the following training data:

$$\{(x_i,\ h_{k-1}(y_i) + \tau),\ (y_i,\ h_{k-1}(x_i) - \tau) \mid \langle x_i, y_i \rangle \in S^-\}$$

Form the new ranking function as

$$h_k(x) = \frac{k\, h_{k-1}(x) + \eta\, g_k(x)}{k + 1}$$
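The iteration on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from [Zheng 2007]: items are reduced to a single numeric feature, the margin `tau` and shrinkage `eta` default to 1.0, and the regression step uses a least-squares line (`fit_line`) as a stand-in for gradient boosted trees. All function names are hypothetical.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[float], float]
Pair = Tuple[float, float]  # (x, y): item x is preferred over item y


def violated_targets(pairs: List[Pair], h_prev: Scorer,
                     tau: float) -> List[Tuple[float, float]]:
    """Regression targets from pairs h_prev fails to separate by margin tau."""
    data = []
    for x, y in pairs:
        if h_prev(x) < h_prev(y) + tau:
            data.append((x, h_prev(y) + tau))  # ask for a higher score on x
            data.append((y, h_prev(x) - tau))  # ask for a lower score on y
    return data


def fit_line(data: List[Tuple[float, float]]) -> Scorer:
    """Stand-in regressor: least-squares line, NOT gradient boosted trees."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    mt = sum(t for _, t in data) / n
    var = sum((x - mx) ** 2 for x, _ in data)
    slope = sum((x - mx) * (t - mt) for x, t in data) / var if var else 0.0
    return lambda x: mt + slope * (x - mx)


def blend(h_prev: Scorer, g_k: Scorer, k: int, eta: float) -> Scorer:
    """h_k(x) = (k * h_{k-1}(x) + eta * g_k(x)) / (k + 1)."""
    return lambda x: (k * h_prev(x) + eta * g_k(x)) / (k + 1)


def gbrank(pairs: List[Pair], rounds: int = 5,
           tau: float = 1.0, eta: float = 1.0) -> Scorer:
    h: Scorer = lambda x: 0.0  # initial guess h_0
    for k in range(1, rounds + 1):
        data = violated_targets(pairs, h, tau)
        if not data:  # every preference already holds with margin tau
            break
        h = blend(h, fit_line(data), k, eta)
    return h


# Toy preferences (made up): the item with the larger feature is preferred.
pairs = [(3.0, 1.0), (2.0, 0.0), (5.0, 4.0)]
h = gbrank(pairs)
```

On this toy input the learned scorer `h` ranks the preferred item of each pair higher. Swapping `fit_line` for a tree-based regressor would bring the sketch closer to the method named on the slide.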

Page 37:

Experimental Results

[Figure legend: Removing textual features; Removing community interaction features; Baseline; GBrank]

Page 38:

Intelligent Information Access and Question Answering

Patterns in Text (Author Behavior)
Patterns in Search (Searcher Behavior)
Structuring Information in Bio- and Medical text
Discovering Implicit Networks: Entity, Relation, and Event Extraction
Content Creation and Discovery in Social Media
Understanding Searcher Inference and Decision Process
Question Answering

Page 39:

User Behavior: The 3rd Dimension of the Web

Amount exceeds web content and structure
– Published: 4 GB/day; Social Media: 10 GB/day
– Page views: 100 GB/day

[Andrew Tomkins, Yahoo! Search, 2007]

Page 40:

Clickthrough for Queries with Known Position of Top Relevant Result

[Figure: Relative clickthrough for queries with known relevant results in position 1 and 3 respectively. X-axis: Result Position (1, 2, 3, 5, 10); y-axis: Relative Click Frequency; series: All queries, PTR=1, PTR=3]

Higher clickthrough at top non-relevant than at top relevant document

E. Agichtein, E. Brill, and S. Dumais, SIGIR 2006

Page 41:

Full Search Engine, User Behavior: NDCG, MAP

Method     MAP     Gain
RN         0.270
RN+ALL     0.321   0.052 (19.13%)
BM25       0.236
BM25+ALL   0.292   0.056 (23.71%)

[Figure: NDCG at K for K = 1 to 10 (y-axis range 0.56 to 0.74); series: RN, Rerank-All, RN+All]

Page 42:

User Behavior Complements Content and Web Topology

[Figure: Precision at K for K = 1, 3, 5, 10 (y-axis range 0.45 to 0.7); series: RN, RN+All, BM25, BM25+All]

Method                     P@1     Gain
RN (Content + Links)       0.632
RN + All (User Behavior)   0.693   0.061 (10%)
BM25                       0.525
BM25+All                   0.687   0.162 (31%)
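For readers unfamiliar with the metrics reported in these results (precision at K, MAP, NDCG at K), the following is a minimal sketch of their textbook definitions. Two choices here are assumptions rather than details from the slides: NDCG uses the exponential gain `2**g - 1` with a log2 position discount, and average precision is normalized by the number of relevant results that appear in the ranked list. The talk's exact variants are not stated.

```python
import math
from typing import Sequence


def precision_at_k(rels: Sequence[int], k: int) -> float:
    """Fraction of the top-k results that are relevant (rels are 0/1 labels)."""
    return sum(1 for r in rels[:k] if r > 0) / k


def average_precision(rels: Sequence[int]) -> float:
    """Mean of precision@i over the ranks i that hold a relevant result."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """MAP: average precision averaged over queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def dcg_at_k(gains: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with gain 2^g - 1 and log2(rank + 1) discount."""
    return sum((2 ** g - 1) / math.log2(i + 1)
               for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains: Sequence[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

Each function takes the relevance labels of one ranked result list in rank order; `mean_average_precision` takes one such list per query.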

Page 43:

Fine grained behavior analysis

Page 44:

[Figure: eye-tracking recording with numbered markers]

Data captured with Tobii eye tracker, courtesy Andy Edmonds, http://www.alwaysbetesting.com/

Page 45:

Preliminary results on using mouse trajectories to infer user intent

Q. Guo and E. Agichtein, to appear in SIGIR 2008

Page 46:

Summary

Patterns in Text (Author Behavior)
Patterns in Search (Searcher Behavior)
Structuring Information in Bio- and Medical text
Discovering Implicit Networks: Entity, Relation, and Event Extraction
Content Creation and Discovery in Social Media
Understanding Searcher Inference and Decision Process
Question Answering

http://www.ir.mathcs.emory.edu/