IR 670 Youtube Search Optimization
Spring 2010 - Prof. Caverlee
Spencer Huang, Jue Yin, Chia-Chun Lin
• A Youtube search, whether or not the user is logged in, does not take the user’s preferences into consideration (this also applies to Google video search).
• Even if such a tool exists, there is no way to know what’s in the box.
Notations
• VideoEntity(v) - (t in v.title) + (t in v.description) + (t in v.category) + (t in v.tag)
• UserEntity(u) - (t in u.fav) + (t in u.upload) + (t in u’ for each u’ in u.subscription), where the terms of u’ are the terms in u’.fav and u’.upload; u could subscribe to multiple u’(s).
• The user inputs at least a Youtube Id and a query, which enables the generation of the entities above.
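The two entities above can be read as bags of terms. The following is a minimal sketch of that reading, not the authors' code: the dictionary keys (`title`, `description`, `category`, `tags`, `fav`, `upload`), the `terms` tokenizer, and the idea that favorites and uploads are lists of video records are all assumptions made for illustration.

```python
import re

def terms(text):
    """Lowercase the text and split it into alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def video_entity(video):
    """VideoEntity(v): terms from title, description, category and tags."""
    bag = []
    for field in ("title", "description", "category"):
        bag += terms(video.get(field, ""))
    for tag in video.get("tags", []):
        bag += terms(tag)
    return bag

def user_entity(user, subscriptions=()):
    """UserEntity(u): terms from u's favorites and uploads, plus the
    favorites and uploads of every subscribed user u'."""
    bag = []
    for profile in (user, *subscriptions):
        for video in profile.get("fav", []) + profile.get("upload", []):
            bag += video_entity(video)
    return bag
```

Passing an empty `subscriptions` list gives the profile variant that ignores subscriptions.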
Approach and Feature
• Reorder results via the Vector-Space model, with stop-word removal and with all original results retained.
• Support the features of the original Youtube/Google video search.
• Add new features such as search by author and enabling/disabling restricted content.
• Crawl UserEntity from Youtube Id and VideoEntity of query
• Stop words removal, construct tf-idf for U and V, cache U for re-use
• Cosine-Similarity and reorder
• Display optimized result
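The pipeline steps above (stop-word removal, tf-idf for U and V, cosine similarity, reorder) might be sketched as follows. This is an illustration under stated assumptions rather than the deployed implementation: the slides do not say which tf-idf weighting was used, so a smoothed idf is assumed here, the stop-word list is a tiny placeholder, and caching of U is left out.

```python
import math
from collections import Counter

STOP_WORDS = {"a", "an", "the", "of", "and", "in", "to", "is"}  # placeholder list

def remove_stop_words(terms):
    return [t for t in terms if t not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per document (a document is a list of terms)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed smoothed idf: a term found in every document keeps a non-zero weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def reorder(user_terms, videos):
    """videos: list of (video_id, terms) in the original result order.
    Returns every video id, most similar to the user profile first."""
    docs = [remove_stop_words(user_terms)] + [remove_stop_words(t) for _, t in videos]
    vecs = tfidf_vectors(docs)
    u, vs = vecs[0], vecs[1:]
    scored = [(cosine(u, v), i) for i, v in enumerate(vs)]
    # Descending similarity; ties keep the original order and no result is dropped.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [videos[i][0] for _, i in scored]
```

Because the sort only permutes the input list, every result returned by the original search is retained, and videos with no overlap with the profile fall back to their original relative order.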
Results
• Interface
Ex: Say my Youtube Id has 2 favorite videos…
Evaluation and Analysis
• Conduct a user study: ask each user to issue a query and to rank the results by relevance as the baseline. Compare Youtube’s order of relevance vs. our Vector approach.
[Figure: NDCG (y-axis, 0.1 to 0.7) at NDCG@50, NDCG@20 and NDCG@100 for three rankings: Youtube, Ours w/ subs, Ours w/o subs.]
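For reference, NDCG@k for one ranking can be computed as in the sketch below. The slides do not state which gain formula was used, so linear gain with a log2 position discount is assumed, and the relevance list (one user-assigned grade per result, in ranked order) is a hypothetical input format.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded results (linear gain assumed)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering of the same grades."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0
```

A ranking that already lists results in descending order of user-assigned relevance scores 1.0; comparing the value for Youtube's order against the value for the reordered list, at the same k, gives one data point per query.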
Conclusion and Problems encountered
• The optimization effect is dependent on the query terms and the profile size.
• Youtube caps the quota for the deployed GAE (Google App Engine) app and would reject a burst of requests. There is no problem when running on a local GAE.
Possible To-Do Features and Q&A
• Support for Asian languages
• Interface for feature descriptor: no Youtube account needed.
• Try it out: http://irytso.appspot.com/
End