Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.
-
Upload
marybeth-palmer -
Category
Documents
-
view
213 -
download
0
Transcript of Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.
![Page 1: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/1.jpg)
Web Search
Module 6
INST 734
Doug Oard
![Page 2: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/2.jpg)
Agenda
• The Web
• Crawling
Web search
![Page 3: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/3.jpg)
Query Statistics
Pass, et al., “A Picture of Search,” 2007
![Page 4: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/4.jpg)
Periodic Daily Variation
Pass, et al., “A Picture of Search,” 2007
![Page 5: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/5.jpg)
Pass, et al., “A Picture of Search,” 2007
![Page 6: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/6.jpg)
Pass, et al., “A Picture of Search,” 2007
![Page 7: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/7.jpg)
http://www.google.com/trends/
BurstinessQuery: Earthquake
![Page 8: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/8.jpg)
28% of Queries are Reformulations
Pass, et al., “A Picture of Search,” 2007
![Page 9: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/9.jpg)
Indexing Anchor Text
• A type of “document expansion”– Terms near links describe content of the target
• Works even when you can’t index content– Image retrieval, uncrawled links, …
![Page 10: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/10.jpg)
Estimating Authority from Links
Authority
Authority
Hub
![Page 11: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/11.jpg)
Aggregated Search
![Page 12: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/12.jpg)
Computational Advertizing• Variant of a search problem
– Jointly optimize relevance and revenue
• Triangulating evidence– Queries– Clicks– Page visits
• Obtaining evidence– Advertising networks– Online auctions
![Page 13: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/13.jpg)
Adversarial IR
• Search is user-controlled suppression– Everything is known to the search system– Goal: avoid showing things the user doesn’t want
• Other stakeholders have different goals– Authors risk little by wasting your time– Marketers may hope for serendipitous interest
![Page 14: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/14.jpg)
Influencing the Ranking
• Search Engine Optimization (SEO)– Alter your content to better match user queries– Create partnerships to drive traffic (and links)
• Index Spam– Create bogus user-assigned metadata– Add invisible text (font in background color, …)– Cloaking– Link exchange
![Page 15: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/15.jpg)
Result Filtering (“Safe Search”)
• Source analysis– Whitelists, blacklists
• Content analysis– Text, image analysis, …
• Contextual analysis– Links, user activity, …
![Page 16: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/16.jpg)
Internet Archive
![Page 17: Web Search Module 6 INST 734 Doug Oard. Agenda The Web Crawling Web search.](https://reader036.fdocuments.in/reader036/viewer/2022062722/56649f285503460f94c4118c/html5/thumbnails/17.jpg)
Web Search Summary
• Different content– Links– User-generated– Crawling
• Different searches– Different goals (transactions, advertising, …)– Temporal variations and burstiness– Different interaction designs