Contextual Ranking of Keywords Using Click Data
Utku Irmak, Vadim von Brzeski, Reiner Kraft
Yahoo! Inc
ICDE '09, Data Mining session
2010. 04. 09.
Summarized and presented by Park, Sung Eun, IDS Lab., Seoul National University
Copyright 2008 by CEBT
Contents
Introduction
Contextual Shortcuts
Concept Ranking Method
Feature Space
Interestingness and Relevance of a Concept
Evaluation
Cross Validation Approach, Editorial Evaluation, Real World Results
Conclusion
Introduction
Determining and ranking the key concepts in a document
Goal
Given the candidate set of entities, learn a ranking function which orders the entities by their interestingness and relevance
Applications
Contextual advertising systems
Text summarization
User centric entity detection systems
– Detect entities and concepts within text
– Transform the detected entities into actionable items such as “intelligent hyperlinks”
Contextual Shortcut
A concept vector
Concepts : A piece of text that refers to an abstract thought or idea. Ex) car insurance, justice
Generating concept vector
– Term vector : TF/IDF from documents in Yahoo! Search
– Unit vector : all units found in the document
Units are constructed from query logs in an iterative statistical approach using the frequencies of the distinct queries
– Concept vector : the term vector and the unit vector are merged
Mutual information of two terms x and y:
$$I(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}$$
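A minimal sketch of the mutual-information score I(x, y) = log(p(x, y) / (p(x) p(y))) computed from a query log. Estimating each probability as the fraction of queries containing the term (or both terms) is an assumption made for illustration; the slides do not say how the probabilities are estimated, and the sample queries are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(queries):
    """Return I(x, y) for every term pair that co-occurs in at least one query."""
    n = len(queries)
    term_count = Counter()
    pair_count = Counter()
    for q in queries:
        terms = sorted(set(q.split()))  # count each term once per query
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))
    scores = {}
    for (x, y), c in pair_count.items():
        p_xy = c / n
        p_x = term_count[x] / n
        p_y = term_count[y] / n
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return scores

# Hypothetical query log
queries = ["car insurance", "car insurance quote", "car rental", "health insurance"]
mi = mutual_information(queries)
```

Pairs with a positive score co-occur more often than independence would predict, which is the kind of signal a unit-construction step could use to decide which terms belong together.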
Previous Concept Ranking Method
AG(TF,Unit)
1. A term appears in the term vector, but not in the unit vector
– punish its term vector weight
2. A term appears in the unit vector, but not in the term vector
– add this term to the concept vector with its unit weight
3. A term appears in both the term vector and the unit vector
– sum its term vector and unit vector weights
| Concept | AG(TF,Unit) Score | Ranking |
|---|---|---|
| President bush | 1.1549 | 1 |
| Iraq war | 1.1833 | 2 |
| Political parties | 0.6147 | 3 |
| … | | |
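The AG(TF,Unit) merging rules can be sketched as follows. This is an illustration under two assumptions: the weights are summed only when a term appears in both vectors, and "punish" is modeled as multiplication by a hypothetical `penalty` factor, since the slides give no value. All weights below are made up.

```python
def merge_vectors(term_vec, unit_vec, penalty=0.5):
    """Merge a term vector and a unit vector into a concept vector.

    penalty is a hypothetical multiplier; the slides only say that the
    term-vector weight is "punished" when the term is not a unit.
    """
    concept_vec = {}
    for term in term_vec.keys() | unit_vec.keys():
        in_term = term in term_vec
        in_unit = term in unit_vec
        if in_term and in_unit:
            concept_vec[term] = term_vec[term] + unit_vec[term]  # in both: sum
        elif in_term:
            concept_vec[term] = term_vec[term] * penalty  # term vector only: punish
        else:
            concept_vec[term] = unit_vec[term]  # unit vector only: unit weight
    return concept_vec

# Hypothetical weights
term_vec = {"iraq war": 0.7, "president": 0.4}
unit_vec = {"iraq war": 0.5, "political parties": 0.6}
concept_vec = merge_vectors(term_vec, unit_vec)
ranked = sorted(concept_vec, key=concept_vec.get, reverse=True)
```

Sorting the merged vector by weight gives the ranking that this earlier method would produce for the document.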
Proposed Concept Ranking Method
Ranking Function : SVM(Support Vector Machine)
SVMlight : an open source library for ranking SVM
Interestingness : 9 Features of a concept
Relevance: pre-mined terms of the concept
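[Figure: for each concept, its interestingness features and its relevant terms (Term 1, Term 2, …) are passed to SVMlight, which produces the ranking]

| Concept | Interestingness | Relevance | Ranking |
|---|---|---|---|
| Concept1 | I1 | R1 | 1 |
| Concept2 | I2 | R2 | 2 |
| Concept3 | I3 | R3 | 3 |
| … | … | … | … |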
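As a rough illustration of how such data could be handed to SVMlight, the sketch below writes one document's concepts in SVMlight's ranking input format, `<target> qid:<id> <index>:<value> ...`. Using the observed click-through rate as the target and the particular feature values are assumptions for this example; the slides do not show the authors' actual training files.

```python
def to_svmlight_rank_lines(doc_id, concepts):
    """Format the concepts of one document as SVMlight ranking examples.

    concepts: list of (target, features) pairs. Targets are only compared
    among lines that share a qid, so each document forms one query group.
    """
    lines = []
    for target, features in concepts:
        # Feature indices start at 1 and must be written in increasing order
        feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
        lines.append(f"{target:g} qid:{doc_id} {feats}")
    return lines

# Hypothetical values: target = CTR, features = [two interestingness features, relevance]
doc = [(0.031, [120.0, 2.0, 0.8]), (0.004, [15.0, 1.0, 0.1])]
lines = to_svmlight_rank_lines(1, doc)
```

A file built from such lines can be passed to `svm_learn` with the `-z p` option, which selects preference ranking.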
Interestingness of a concept
| Category | Feature | Details |
|---|---|---|
| Search engine query logs | Freq exact | Number of queries received that are exactly the same as the concept |
| | Freq phrase contained | Number of queries that contain the concept as a phrase |
| | Unit score | The score in the unit vector |
| Search engine result pages | Search engine phrase | Number of pages returned when the concept is issued as a query |
| Text-based features | Concept size | Number of terms in the concept |
| | Number of characters | Number of characters in the concept |
| | Subconcepts | Number of subconcepts contained in the concept |
| Taxonomy | High level type | If the concept exists in one of the editorially maintained lists, use it as a feature |
| Others | Wiki word count | Length of the concept's Wikipedia article |
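The three text-based features are simple enough to sketch directly. The definition of a subconcept used here (a shorter contiguous run of terms that is itself a known concept) and counting spaces as characters are assumptions; `known_concepts` is a hypothetical lookup set.

```python
def text_features(concept, known_concepts):
    """Compute concept size, number of characters, and number of subconcepts."""
    terms = concept.split()
    subconcepts = 0
    # Check every contiguous run of terms that is shorter than the concept itself
    for size in range(1, len(terms)):
        for start in range(len(terms) - size + 1):
            if " ".join(terms[start:start + size]) in known_concepts:
                subconcepts += 1
    return {
        "concept_size": len(terms),
        "num_chars": len(concept),  # includes spaces
        "subconcepts": subconcepts,
    }

features = text_features(
    "car insurance quote",
    {"car", "car insurance", "insurance quote"},
)
```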
Relevance of a Concept in a Context
A mining approach to obtain a good relevance scoring mechanism
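Use pre-mined keywords for each concept in $C = \{c_1, c_2, \ldots, c_n\}$
Relevant terms of concept $c_i$, where each term $t_j$ carries a score $s_j$:
$$relevantTerms_i = \{(t_1, s_1), (t_2, s_2), \ldots, (t_k, s_k)\}$$
The relevance of a concept can be computed from the co-occurrence of its pre-mined keywords $t_j$ with the context.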
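One simple way to turn co-occurrence into a number is to add up the scores of the pre-mined terms that appear in the context. The slides do not state the exact aggregation, so the sum below is an assumption, as are the terms, scores, and the crude substring matching.

```python
def relevance(relevant_terms, context):
    """Sum the scores of a concept's pre-mined terms that occur in the context.

    relevant_terms: list of (term, score) pairs mined in advance for one concept.
    Matching is a plain case-insensitive substring test, which is only a sketch.
    """
    text = context.lower()
    return sum(score for term, score in relevant_terms if term.lower() in text)

# Hypothetical pre-mined terms for the concept "Iraq war"
relevant_terms_iraq_war = [("baghdad", 2.0), ("troops", 1.5), ("box office", 0.5)]
context = "US troops began leaving Baghdad on Tuesday."
r = relevance(relevant_terms_iraq_war, context)
```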
Relevance of a Concept in a Context
Relevant term scoring
1. Search engine snippets
– Using Yahoo! Developer Network API
– Treat the returned snippets as a single document and score each term by tf*idf
– Keep the top m=100 terms based on the score
2. Prisma query refinement tool
– Prisma helps users augment or replace their queries by suggesting feedback terms. It derives them from the top 50 documents in a large collection, using factors such as the count and position of the terms, document rank, and the occurrence of query terms within the input phrase.
– Construct a single document from the concepts returned by Prisma for concept ci and compute the score based on the tf*idf values
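The snippet-based scoring can be sketched as below: concatenate the snippets, count term frequencies, weight by idf, and keep the top m terms. The smoothed idf variant and the `doc_freq` and `num_docs` inputs are assumptions, since the slides only say "tf*idf"; the same routine would apply to the single document built from Prisma's output.

```python
import math
from collections import Counter

def top_terms(snippets, doc_freq, num_docs, m=100):
    """Score terms of the concatenated snippets by tf*idf and keep the top m."""
    tf = Counter(" ".join(snippets).lower().split())
    scores = {
        # Smoothed idf; terms missing from doc_freq are treated as df = 0
        term: count * math.log(num_docs / (1 + doc_freq.get(term, 0)))
        for term, count in tf.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:m]

# Hypothetical snippets and background document frequencies
snippets = ["Iraq war news", "the Iraq war timeline", "news on the war"]
doc_freq = {"the": 900, "on": 800, "war": 50, "iraq": 20, "news": 100, "timeline": 10}
ranked = top_terms(snippets, doc_freq, num_docs=1000, m=3)
```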
Relevance of a Concept in a Context
Relevant term scoring
3. Related query suggestions
– Using Yahoo! Developer Network API
– 300 suggestions and the query frequencies of the suggestions
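– Let k be the number of suggestions in which the term appears; the term is scored as
$$Score(term) = \sum_{i=1}^{k} \ln(query\_freq_i) \cdot idf(term)$$
[Figure: the three sources of relevant terms: snippets, Prisma, and query suggestions]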
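The query-suggestion score, Score(term) = sum over the k suggestions containing the term of ln(query_freq_i), times idf(term), can be sketched as follows. The suggestion list, frequencies, and idf values are hypothetical, and whitespace tokenization is an assumption.

```python
import math

def suggestion_score(term, suggestions, idf):
    """Score a term from related query suggestions.

    suggestions: list of (suggested_query, query_frequency) pairs.
    Only suggestions that contain the term contribute ln(query_frequency).
    """
    total = 0.0
    for query, freq in suggestions:
        if term in query.split():
            total += math.log(freq)
    return total * idf[term]

# Hypothetical suggestions for the concept "iraq war"
suggestions = [
    ("iraq war timeline", 1000),
    ("iraq war casualties", 100),
    ("iraq map", 10),
]
idf = {"war": 2.0, "iraq": 1.0}
score_war = suggestion_score("war", suggestions, idf)
score_iraq = suggestion_score("iraq", suggestions, idf)
```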
Intuition of Query Suggestion and Prisma
Evaluation
Cross Validation Approach
Data
– Randomly sampled news stories that were annotated by Contextual Shortcuts
– The number of times these stories were viewed and the number of clicks received by each concept that was detected in the stories
– 870 stories and 6420 concepts, with 16549 sample clicks
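Error Rate
$$\text{Error Rate} = \frac{|\text{Mistakenly Predicted Pairs}|}{|\text{All Pairs}|}$$
Weighted Error Rate
$$\text{Weighted Error Rate} = \frac{\sum_{i=1}^{|\text{mistakes}|} \text{CTR difference}_i}{\sum_{i=1}^{|\text{all pairs}|} \text{CTR difference}_i}$$
Where Click-through-rate (CTR) = (the number of clicks) / (the number of views)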
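Both error measures can be computed in one pass over the concept pairs of a document, as in the sketch below. Skipping pairs with equal CTR and counting tied predictions as mistakes are assumptions made for illustration, and the data is hypothetical.

```python
from itertools import combinations

def error_rates(concepts):
    """Return (error_rate, weighted_error_rate) for one document.

    concepts: list of (predicted_score, ctr) pairs. A pair is a mistake when
    the predicted order disagrees with the order given by CTR.
    """
    pairs = mistakes = 0
    total_diff = mistake_diff = 0.0
    for (s1, c1), (s2, c2) in combinations(concepts, 2):
        if c1 == c2:
            continue  # no preferred order, so the pair is skipped
        diff = abs(c1 - c2)
        pairs += 1
        total_diff += diff
        if (s1 - s2) * (c1 - c2) <= 0:
            mistakes += 1
            mistake_diff += diff
    return mistakes / pairs, mistake_diff / total_diff

# Hypothetical data: (predicted score, CTR = clicks / views)
concepts = [(0.9, 0.05), (0.4, 0.01), (0.6, 0.02), (0.5, 0.03)]
err, weighted_err = error_rates(concepts)
```

Here the only mistaken pair has a small CTR difference, so the weighted error rate comes out lower than the plain error rate: swapping two concepts with similar CTRs is penalized less than swapping a popular concept with an unpopular one.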
Evaluation
NDCG(Normalized discounted cumulative gain measure)
– A valuable metric for those applications that require high precision at top ranks
– Score for a sorted list of k concepts on document_i:
$$NDCG_{document_i} = N_i \sum_{j=1}^{k} \frac{2^{score(j)} - 1}{\log(1 + j)}$$
– Where score(j)=bucketNo(CTR(j)/100), bucketNo() returns a bucket number between 0 and 1000 considering all the CTR values observed in the system in increasing order.
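A short sketch of the NDCG computation, summing (2^score(j) - 1) / log(1 + j) over the ranked concepts. The slides do not define the normalization constant N_i; taking it as the reciprocal of the DCG of the ideal (descending) ordering is the usual convention and is an assumption here. The integer scores stand in for the bucketed CTR values.

```python
import math

def dcg(scores):
    """Sum of (2^score(j) - 1) / log(1 + j) over ranks j = 1..k."""
    return sum((2 ** s - 1) / math.log(1 + j) for j, s in enumerate(scores, start=1))

def ndcg(scores_in_predicted_order):
    """Normalize by the DCG of the ideal ordering (assumed definition of N_i)."""
    ideal = dcg(sorted(scores_in_predicted_order, reverse=True))
    return dcg(scores_in_predicted_order) / ideal if ideal > 0 else 0.0

perfect = ndcg([3, 2, 1])  # concepts already in the best order
swapped = ndcg([2, 3, 1])  # top two concepts swapped
```

The logarithm base does not matter once the sum is normalized, because the same base appears in both the numerator and the ideal DCG.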
Evaluation
Interestingness features
Evaluation
Relevance score
Evaluation
Interestingness Features and Relevance Score
Evaluation
Editorial Evaluation
1. A processed set of documents is presented to the judges.
2. A judge is asked to select a document from the pool.
3. The judge is asked to read the document and rate each entity or concept highlighted in it in terms of its interestingness and relevance.
Contributions
We propose to use implicit user feedback in the form of click data to determine the most interesting and relevant concepts in a context via a machine learning approach.
We describe a feature space pertinent to the interestingness of a concept, and present algorithms to identify relevance of a concept in a given context.
We evaluate the proposed techniques extensively using click data, an editorial study, and an analysis of a production system. The results show significant improvements.
We provide a detailed description of a framework that enables efficient implementation of the proposed techniques in a production system.
Discussion
No theoretical basis for their feature selection assumptions.
No references or base theory at all.
Depends on technology already developed in previous studies.
Huge advantage from having a valuable dataset.
Q&A
Thank you