Pete Bohman Adam Kunk. Real-Time Search Definition: A search mechanism capable of finding...
TI: AN EFFICIENT INDEXING MECHANISM FOR REAL-TIME SEARCH ON TWEETS
SIGMOD ’11
C. CHEN ET AL.
Pete Bohman
Adam Kunk
Real-Time Search
Definition: A search mechanism capable of finding information in an online fashion as it is produced.
Technology belonging to the real-time web that enables users to receive information as soon as it is published.
Real-Time Search
In terms of real-time search, what does “online” mean?
Online means that a constant stream of input data is handled as it enters the system, in contrast to batch processing.
Bing Social Search
Real-Time Search Input Data
Example of the kind of input data considered by real-time search systems:
TwitterVision
Real-Time Content
Microblogging: an entirely new type of data
1. Short temporal life span
2. Little to no context
3. Simple ideas, fast reporting of events
4. Metadata: time, location, social links
5. Less factual, more opinionated
6. Static posts
7. Furious input rate
8. Often no hyperlink structure, few traditional ranking factors
Current search engines don’t take full advantage of this new data type
Real-Time vs. Conventional Search
Conventional search ranking
  ○ Relevance
  ○ Authority
Real-time search ranking
  ○ Relevance
  ○ Temporal immediacy
  ○ Popularity
Real-Time vs. Conventional Search
Conventional search input
  ○ Crawl the web periodically and update the index
    ○ Web documents evolve
  ○ Incapable of crawling and indexing the entire web in real time
Real-time search input
  ○ Stream of data
  ○ No need to poll since the posts are static
What can we do with real-time search engines?
User Query Analysis
Collecta real-time search engine
  ○ Analyzed ~1 million queries
Continuous queries
  ○ Monitor events by frequently resubmitting the same query
Different query categories

Conventional     Real-Time
Shopping         Commerce
Entertainment    Travel
Adult            Economy
Crowdsourcing Real-Time Data
Crowdsourcing of first-hand reports
Value of Real-Time Search
The estimated value of real-time search is around $33 million
Value derived from the types of queries entered in real-time search systems
Used AdWords to determine the worth of keywords appearing in queries
Applications of Real-Time Search
TwitterStand: real-time news reports
  ○ Example: coverage of MJ’s death
Applications of Real-Time Search
Real-time alert systems
  ○ Leverage tweet metadata (time, location) to raise alerts
  ○ Earthquake localization based on tweets
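A time-based alert of this kind can be sketched as a sliding-window counter. This is only an illustration, not the method of any of the cited systems: the class name `KeywordBurstAlert`, the keyword match, the window length, and the threshold are all assumptions, and location metadata is left out for brevity. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class KeywordBurstAlert:
    """Hypothetical alert: fires when a keyword is seen often enough in a time window."""

    def __init__(self, keyword, window_seconds, threshold):
        self.keyword = keyword.lower()
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.timestamps = deque()  # arrival times of matching tweets

    def observe(self, timestamp, text):
        """Process one tweet; return True if this tweet triggers the alert."""
        if self.keyword not in text.lower():
            return False
        self.timestamps.append(timestamp)
        # Drop matches that have fallen out of the sliding window.
        while self.timestamps and self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.threshold
```

Because each tweet is handled as it arrives, the check is online in the sense used earlier in the slides: no batch pass over stored tweets is needed.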
Twitter Real-Time Alerts
USGS Twitter Earthquake Detector
Difficulties of Real-Time Search
Two factors:
  ○ Efficient indexing in order to provide fast results
  ○ Effective ranking in order to return relevant results
Indexing: RDBMS
RDBMS indexing
  ○ Indexes built on columns commonly used in queries
  ○ Improves the speed of retrieval operations
Indexing: Conventional Search
Conventional search (inverted) indexing
  ○ Unstructured data
  ○ If a document does not exist in the index, it will not appear in query results
Indexing: Real-Time Search
Index a stream of data
  ○ Map keywords to the tweets containing those keywords
Challenge
  ○ Processing the stream in a timely manner
    ○ 5,000 tweets per second
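The keyword-to-tweet mapping is an inverted index. A minimal in-memory sketch follows, assuming whitespace tokenization and lowercase matching; the class and method names are illustrative and not taken from the TI paper.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each keyword to the IDs of the tweets that contain it."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> tweet IDs in arrival order

    def add(self, tweet_id, text):
        # Index each distinct term once per tweet.
        for term in set(text.lower().split()):
            self.postings[term].append(tweet_id)

    def search(self, term):
        # Return a copy so callers cannot mutate the index.
        return list(self.postings.get(term.lower(), []))
```

Usage: after `add(1, "earthquake in Tokyo")` and `add(2, "Tokyo weather")`, `search("tokyo")` returns both tweet IDs in arrival order.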
TI Indexing
Not feasible to index every incoming tweet immediately
Selective indexing based on which tweets are most likely to appear in query results
  ○ Distinguished tweets indexed in real time
  ○ Noisy tweets indexed by a batch process
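The split between the real-time path and the batch path could look like the sketch below. It assumes the distinguished/noisy decision is supplied as a predicate; `SelectiveIndexer`, `ingest`, and `run_batch` are hypothetical names, and the paper's actual data structures are not reproduced here.

```python
from collections import defaultdict, deque


class SelectiveIndexer:
    """Indexes distinguished tweets immediately and defers noisy ones to a batch."""

    def __init__(self, is_distinguished):
        self.is_distinguished = is_distinguished  # predicate on tweet text (assumed)
        self.index = defaultdict(list)            # term -> tweet IDs
        self.pending = deque()                    # noisy tweets awaiting the batch run

    def _index(self, tweet_id, text):
        for term in set(text.lower().split()):
            self.index[term].append(tweet_id)

    def ingest(self, tweet_id, text):
        if self.is_distinguished(text):
            self._index(tweet_id, text)           # real-time path
        else:
            self.pending.append((tweet_id, text)) # deferred path

    def run_batch(self):
        # Periodic job: index everything that was deferred.
        while self.pending:
            self._index(*self.pending.popleft())
```

Noisy tweets stay unsearchable until `run_batch` is called, which is the cost accepted in exchange for keeping the real-time path cheap.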
TI Tweet Classification
Observation
  ○ Users are only interested in the top-K results for a query
Distinguished tweet
  ○ A tweet that belongs to the top-K result set of a previous query
Noisy tweet
  ○ A tweet that does not appear in the top-K results of any of the system’s previous queries
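One way to picture the classification: keep the current top-K scores for each tracked query and call an incoming tweet distinguished if it enters any of them. The sketch below makes that concrete under loud assumptions: `term_overlap` is a stand-in scoring function rather than TI's ranking function, and `TopKClassifier` is a hypothetical name.

```python
import heapq


def term_overlap(query, text):
    """Stand-in score (assumption): number of query terms present in the tweet."""
    return len(set(query.lower().split()) & set(text.lower().split()))


class TopKClassifier:
    """Labels a tweet distinguished if it enters the top-K of any tracked query."""

    def __init__(self, queries, k):
        self.k = k
        # One min-heap of (score, tweet_id) per query; heap[0] is the weakest top-K entry.
        self.top = {query: [] for query in queries}

    def is_distinguished(self, tweet_id, text):
        distinguished = False
        for query, heap in self.top.items():
            score = term_overlap(query, text)
            if score == 0:
                continue  # tweet is irrelevant to this query
            if len(heap) < self.k:
                heapq.heappush(heap, (score, tweet_id))
                distinguished = True
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, tweet_id))
                distinguished = True
        return distinguished
```

The cost per tweet grows with the number of tracked queries, which is why the size of the query set has to be limited.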
TI Indexing
Must limit the size of the query set
  ○ 1.6 billion Twitter queries per day
Query set optimization
Observation
  ○ 20% of queries represent 80% of user requests
Therefore
  ○ Zipf’s distribution is used to statistically limit the number of queries that tweets are compared against
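The slides do not spell out the selection procedure, so the following is only a sketch of the idea: under a skewed (Zipf-like) query distribution, a small set of frequent queries covers most requests. The function name `select_query_set` and the `coverage` parameter are assumptions.

```python
from collections import Counter


def select_query_set(query_log, coverage=0.8):
    """Return the most frequent queries that together cover `coverage` of all requests."""
    counts = Counter(query_log)
    total = sum(counts.values())
    selected, covered = [], 0
    for query, count in counts.most_common():
        if covered >= coverage * total:
            break  # target share of request volume reached
        selected.append(query)
        covered += count
    return selected
```

With a log in which one query accounts for 8 of 10 requests, a coverage target of 0.8 keeps just that one query, so incoming tweets would be compared against one query instead of three.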
Real-Time Search Ranking
How does ranking differ from traditional web ranking?
  ○ Typical web search engines rank based on links to a site and links from a site (PageRank)
  ○ Microblogging data contains social networking links
    ○ Followers
    ○ Friends
    ○ Re-tweets
Real-Time Search Ranking
Ranking is not necessary in RDBMS systems
  ○ In an RDBMS, data is strictly defined, including the algebraic operators
  ○ Results are complete, not subjective
TI Ranking
Ranking function comprised of:
1) User’s PageRank
  ○ Combination of user weight (defaulted to 1) and how many followers they have (popularity)
2) Timestamp (self-explanatory)
3) Similarity between the tweet and the query
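The slides do not say how the similarity in item 3 is computed. A common choice for short texts is cosine similarity over term counts, sketched below as an assumption rather than as TI's definition; tokenization is plain whitespace splitting.

```python
import math
from collections import Counter


def cosine_similarity(query, tweet):
    """Cosine similarity between the term-count vectors of two texts."""
    q = Counter(query.lower().split())
    t = Counter(tweet.lower().split())
    dot = sum(q[term] * t[term] for term in q)  # missing terms count as 0
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in t.values()))
    return dot / norm if norm else 0.0
```

Identical texts score close to 1, texts with no shared terms score 0, and an empty query or tweet is treated as having no similarity.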
TI Ranking
Ranking function also comprised of:
4) Popularity of the topic
  ○ Determined by large tweet trees
  ○ Popularity of a tree is equal to the sum of the U-PageRank values of all tweets in the tree
Tweet Tree Structure
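The tree popularity rule translates directly into a traversal that sums U-PageRank over every tweet in the tree. The second function shows one hypothetical way the four components could be combined; the weights and the age-based decay are assumptions for illustration and are not the formula from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class TweetNode:
    tweet_id: int
    u_pagerank: float                            # U-PageRank of the tweet's author
    replies: list = field(default_factory=list)  # child tweets (replies, re-tweets)


def tree_popularity(root):
    """Sum of the U-PageRank values of all tweets in the tree rooted at `root`."""
    total, stack = 0.0, [root]
    while stack:
        node = stack.pop()
        total += node.u_pagerank
        stack.extend(node.replies)
    return total


def combined_score(u_pagerank, similarity, popularity, age_seconds, weights=(1.0, 1.0, 1.0)):
    """Hypothetical combination: weighted sum of components, discounted by tweet age."""
    w_user, w_sim, w_pop = weights
    relevance = w_user * u_pagerank + w_sim * similarity + w_pop * popularity
    return relevance / (1.0 + age_seconds)       # older tweets score lower
```

An iterative traversal is used so that very deep reply chains do not hit Python's recursion limit.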
TI Ranking Comparison
TI Rank vs. Time Rank
What are others doing?
What are others doing?
Facebook
  ○ Real-Time Feed
Implications
New type of data not currently searchable through existing search engines
New search tools developed for new data
New user search behavior
  ○ Continuous search results (non-static)
Advertisers
  ○ Chance for more targeted advertisements
Conclusion
TI makes use of two concepts in its real-time search of Twitter:
Selective indexing
  ○ A form of partial indexing; the system cannot afford to index every incoming tweet due to the large volume of input
Ranking
  ○ Ranking is a known technique, but microblogging applications give rise to new ranking algorithms
Conclusion
Real-time search engines must provide:
  ○ Online algorithms to handle constant input
  ○ Relevant search results
References
TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets
  http://www.comp.nus.edu.sg/~ooibc/sigmod11ti.pdf
Real Time Search User Behavior
  http://faculty.ist.psu.edu/jjansen/academic/jansen_real_time_search.pdf
TwitterRank: Finding Topic-Sensitive Influential Twitterers
  http://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1503&context=sis_research
Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors
  http://ymatsuo.com/papers/www2010.pdf
TwitterStand: News in Tweets
  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.148.1477&rep=rep1&type=pdf
Learning Effective Ranking Functions for Newsgroup Search
  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.5556&rep=rep1&type=pdf
TwitterSearch: A Comparison of Microblog Search and Web Search
  http://www.stanford.edu/~dramage/papers/twitter-wsdm11.pdf
TwitterVision
  http://twittervision.com/
Bing Social
  http://www.bing.com/social
Real time search on the web: Queries, topics, and economic value
  http://collecta.com/RealTimeSearch.pdf
Discussion Questions
1) What do you think is the most innovative technique in the TI approach that led to real-time microblog search results?
Discussion Questions
2) Given the partial indexing optimization provided in the paper, how do you think Google could optimize their indexing algorithm in order to capture the newest content on the web?
Discussion Questions
3) TI makes use of a ranking function in order to select tweets based on various user characteristics. What would you change about the ranking function, if anything?