Web Information retrieval (Web IR)
Autumn 2011 1
Web Information Retrieval (Web IR)
Handout #13: Ranking Based on User Behavior
Ali Mohammad Zareh Bidoki, ECE Department, Yazd University
Finding the Ranking Function

• R = f(query, user behavior, web graph & content features)
• How can we use user behavior?
– Explicit feedback
– Implicit feedback
• About 80% of user clicks are related to the query
– Click-through data
– Collected from search-engine logs
Click-through Data (by Joachims)

• Click-through data: a triple (q, r, c)
– q = the query
– r = the ranked list of results
– c = the set of clicked documents
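The triples above can be extracted directly from raw log records. A minimal sketch, assuming an illustrative record layout (real search-engine logs differ in field names and structure):

```python
# One log record per query impression: the query, the ranked result list
# shown, and the documents the user clicked. Field names are illustrative.
log = [
    {"query": "web ir", "results": ["d1", "d2", "d3"], "clicks": ["d2"]},
    {"query": "pagerank", "results": ["d4", "d1"], "clicks": ["d4", "d1"]},
]

def to_triples(log):
    """Turn raw log records into Joachims-style (q, r, c) triples."""
    return [(rec["query"], rec["results"], set(rec["clicks"])) for rec in log]

triples = to_triples(log)
```

Representing c as a set reflects that only membership (clicked or not) matters here, not click order.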
Benefits of Using Click-through Data

• Democracy on the Web
• Fills the gap between user needs and search results
• User clicks are more valuable than page content (search-engine precision is judged by users, not by page creators)
• The degree of relevance between queries and documents increases (click metadata is added to documents)
Web Entities

[Figure: the entities involved in web ranking and their relations — words (1..w), documents (1..n), the web graph over the documents, users, and queries]
Document Expansion Using Click-through Data

• Google was the first to use anchor text as part of a document's content
– Anchor text is a view of a document from another document
Long-term Incremental Learning

• D_i is the vector of a document at the ith iteration
• Q is the vector of the query for which this document was clicked
• Alpha is the learning rate
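The update rule itself appears only in the slide image. A common form of such an incremental update, consistent with the definitions above but assumed here rather than taken from the slide, moves the document vector a fraction alpha toward the clicked query's vector:

```python
def update_document_vector(d, q, alpha=0.1):
    """One incremental-learning step: D_{i+1} = (1 - alpha) * D_i + alpha * Q.

    This exact rule is an assumption; the slide's formula is not in the
    transcript. d and q are term-weight vectors over the same vocabulary.
    """
    return [(1 - alpha) * di + alpha * qi for di, qi in zip(d, q)]

# After one click for a query weighted entirely on the second term,
# the document vector shifts slightly toward that term.
d_next = update_document_vector([1.0, 0.0], [0.0, 1.0], alpha=0.1)
```

A small alpha makes the learning long-term: a single click nudges the document representation only slightly, so metadata accumulates over many sessions.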
Naïve Method (NM)

• A bipartite graph between documents and queries
• M_ij is the number of clicks on document j for query i
Naïve Method (Cont.)

• The weight between query q_j and document d_i:
• The metadata for document i is:

d_i = w_1i q_1 + w_2i q_2 + … + w_mi q_m
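The document expansion above can be sketched from the click matrix M. The per-query normalization used for the weights is an assumption, since the slide's exact weighting formula is not in the transcript:

```python
from collections import defaultdict

# Click records: (query, document, click count) — the M_ij values above.
clicks = [("q1", "d1", 5), ("q1", "d2", 1), ("q2", "d1", 3)]

M = defaultdict(lambda: defaultdict(int))
for q, d, n in clicks:
    M[q][d] += n

def document_metadata(M):
    """Expand each document with the queries that led to clicks on it.

    The weight is taken here as the click count normalized per query;
    this normalization is an assumption, not the slide's formula.
    """
    meta = defaultdict(dict)
    for q, docs in M.items():
        total = sum(docs.values())
        for d, n in docs.items():
            meta[d][q] = n / total  # weight of query q in d's metadata
    return meta

meta = document_metadata(M)
```

Each document then carries the terms of the queries that led users to it, exactly as anchor text carries the terms other authors used to describe it.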
Co-visited Method

• If two pages are clicked for the same query, they are called co-visited.
• The similarity between two documents d_i and d_j is computed from visited(d_i), the number of clicks on d_i, and visited(d_i, d_j), the number of queries for which both are clicked:
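The similarity formula itself appears only in the slide image. A Jaccard-style normalization of the co-visit counts is one plausible form, assumed here rather than taken from the slide:

```python
def covisited_similarity(visited_i, visited_j, covisited):
    """Similarity of two documents from click counts.

    visited_i / visited_j: clicks on each document; covisited: number of
    queries for which both were clicked. The Jaccard-style normalization
    below is an assumption; the slide's exact formula is not in the
    transcript.
    """
    denom = visited_i + visited_j - covisited
    return covisited / denom if denom else 0.0

sim = covisited_similarity(10, 6, 4)  # 4 / (10 + 6 - 4)
```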
Co-visited Disadvantages

• It only considers document similarity (not query similarity)
• Since users mostly click within the top 10 results, click data are sparse (about 1.5 queries per page)
– So the similarity estimate is not precise
Iterative Method (IM)

• O(q): the set of pages clicked for query q
• O_i(q): the ith page clicked for query q
• I(d): the set of queries for which document d is clicked
• I_i(d): the ith query for which document d is clicked
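The sets defined above can be built directly from (query, clicked-document) pairs extracted from the log; the pair data here is illustrative:

```python
from collections import defaultdict

# (query, clicked document) pairs extracted from a click log (illustrative).
pairs = [("q1", "d1"), ("q1", "d2"), ("q2", "d1")]

# O[q]: documents clicked for q;  I[d]: queries for which d was clicked.
O = defaultdict(list)
I = defaultdict(list)
for q, d in pairs:
    O[q].append(d)   # O_i(q) is then O[q][i]
    I[d].append(q)   # I_i(d) is then I[d][i]
```

O and I are the two adjacency views of the same bipartite click graph; the iterative method alternates between them to propagate similarity between queries and documents.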
Experimental Results

• Experiments on a large real click-through log (the MSN query log) indicate that the proposed algorithm outperforms
– the baseline search system by 157%,
– naïve query-log mining by 17%, and
– the co-visited algorithm by 17%
• in precision at the top 20 results.