Web Information retrieval (Web IR)
Autumn 2011 1
Web Information Retrieval (Web IR)
Handout #13: Ranking Based on User Behavior
Ali Mohammad Zareh Bidoki, ECE Department, Yazd University
Finding the Ranking Function

• R = f(query, user behavior, web graph & content features)
• How can we use user behavior?
– Explicit feedback
– Implicit feedback
• About 80% of user clicks are related to the query
– Click-through data
– Collected from search-engine logs
Click-through Data (by Joachims)

• Click-through data: a triple (q, r, c)
– q = the query
– r = the ranked list of results
– c = the set of clicked documents
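The triples above can be extracted directly from raw log records. A minimal sketch, assuming an illustrative record layout (real search-engine logs differ in field names and structure):

```python
# One log record per query impression: the query, the ranked result list
# shown, and the documents the user clicked. Field names are illustrative.
log = [
    {"query": "web ir", "results": ["d1", "d2", "d3"], "clicks": ["d2"]},
    {"query": "pagerank", "results": ["d4", "d1"], "clicks": ["d4", "d1"]},
]

def to_triples(log):
    """Turn raw log records into Joachims-style (q, r, c) triples."""
    return [(rec["query"], rec["results"], set(rec["clicks"])) for rec in log]

triples = to_triples(log)
```

Representing c as a set reflects that only membership (clicked or not) matters here, not click order.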
Benefits of Using Click-through Data

• Democracy on the Web
• Fills the gap between user needs and search results
• User clicks are more valuable than page content (search-engine precision is judged by users, not by page creators)
• The degree of relevance between queries and documents increases (click metadata is added to documents)
Web Entities

[Figure: the entities involved in web ranking and their relations — words (1..w), documents (1..n), the web graph over the documents, users, and queries]
Document Expansion Using Click-through Data

• Google was the first to use anchor text as part of a document's content
– Anchor text is a view of a document from another document
Long-term Incremental Learning

• D_i is the vector of a document at the ith iteration
• Q is the vector of the query for which this document was clicked
• Alpha is the learning rate
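The update rule itself appears only in the slide image. A common form of such an incremental update, consistent with the definitions above but assumed here rather than taken from the slide, moves the document vector a fraction alpha toward the clicked query's vector:

```python
def update_document_vector(d, q, alpha=0.1):
    """One incremental-learning step: D_{i+1} = (1 - alpha) * D_i + alpha * Q.

    This exact rule is an assumption; the slide's formula is not in the
    transcript. d and q are term-weight vectors over the same vocabulary.
    """
    return [(1 - alpha) * di + alpha * qi for di, qi in zip(d, q)]

# After one click for a query weighted entirely on the second term,
# the document vector shifts slightly toward that term.
d_next = update_document_vector([1.0, 0.0], [0.0, 1.0], alpha=0.1)
```

A small alpha makes the learning long-term: a single click nudges the document representation only slightly, so metadata accumulates over many sessions.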
Naïve Method (NM)

• A bipartite graph between documents and queries
• M_ij is the number of clicks on document j for query i
Naïve Method (Cont.)

• The weight between query q_j and document d_i:
• The metadata for document i is:

d_i = w_1i q_1 + w_2i q_2 + … + w_mi q_m
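The document expansion above can be sketched from the click matrix M. The per-query normalization used for the weights is an assumption, since the slide's exact weighting formula is not in the transcript:

```python
from collections import defaultdict

# Click records: (query, document, click count) — the M_ij values above.
clicks = [("q1", "d1", 5), ("q1", "d2", 1), ("q2", "d1", 3)]

M = defaultdict(lambda: defaultdict(int))
for q, d, n in clicks:
    M[q][d] += n

def document_metadata(M):
    """Expand each document with the queries that led to clicks on it.

    The weight is taken here as the click count normalized per query;
    this normalization is an assumption, not the slide's formula.
    """
    meta = defaultdict(dict)
    for q, docs in M.items():
        total = sum(docs.values())
        for d, n in docs.items():
            meta[d][q] = n / total  # weight of query q in d's metadata
    return meta

meta = document_metadata(M)
```

Each document then carries the terms of the queries that led users to it, exactly as anchor text carries the terms other authors used to describe it.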
Co-visited Method

• If two pages are clicked for the same query, they are called co-visited.
• The similarity between two documents d_i and d_j is computed from visited(d_i), the number of clicks on d_i, and visited(d_i, d_j), the number of queries for which both are clicked:
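The similarity formula itself appears only in the slide image. A Jaccard-style normalization of the co-visit counts is one plausible form, assumed here rather than taken from the slide:

```python
def covisited_similarity(visited_i, visited_j, covisited):
    """Similarity of two documents from click counts.

    visited_i / visited_j: clicks on each document; covisited: number of
    queries for which both were clicked. The Jaccard-style normalization
    below is an assumption; the slide's exact formula is not in the
    transcript.
    """
    denom = visited_i + visited_j - covisited
    return covisited / denom if denom else 0.0

sim = covisited_similarity(10, 6, 4)  # 4 / (10 + 6 - 4)
```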
Co-visited Disadvantages

• It only considers document similarity (not query similarity)
• Since users mostly click within the top 10 results, click data are sparse (about 1.5 queries per page)
– So the similarity estimate is not precise
Iterative Method (IM)

• O(q): the set of pages clicked for query q
• O_i(q): the ith page clicked for query q
• I(d): the set of queries for which document d is clicked
• I_i(d): the ith query for which document d is clicked
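The sets defined above can be built directly from (query, clicked-document) pairs extracted from the log; the pair data here is illustrative:

```python
from collections import defaultdict

# (query, clicked document) pairs extracted from a click log (illustrative).
pairs = [("q1", "d1"), ("q1", "d2"), ("q2", "d1")]

# O[q]: documents clicked for q;  I[d]: queries for which d was clicked.
O = defaultdict(list)
I = defaultdict(list)
for q, d in pairs:
    O[q].append(d)   # O_i(q) is then O[q][i]
    I[d].append(q)   # I_i(d) is then I[d][i]
```

O and I are the two adjacency views of the same bipartite click graph; the iterative method alternates between them to propagate similarity between queries and documents.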
Experimental Results

• Experiments on a large real click-through log (the MSN query log) indicate that the proposed algorithm outperforms
– the baseline search system by 157%,
– naïve query-log mining by 17%, and
– the co-visited algorithm by 17%
• in precision at the top 20 results.