Harnessing social signals to enhance a search
Ismaïl BADACHE, Mohand BOUGHANEM
IRIT, Toulouse University, France
{badache, boughanem}@irit.fr
Warsaw, Poland
Presentation Plan
1. Introduction
2. Related Work
3. Approach of Social Information Retrieval
4. Experimental Results
5. Conclusion
1.1 Emergence of social Web
[Charts: number of active users (2013); number of Internet users; yearly values shown for 2011 to 2014: 1.2, 1.4, 1.7, 2.4]
Social content per minute (Facebook):
• 41,000 publications
• 1.8 million likes
• ~350 GB of data
Source: blogdumoderateur.com, quantcast.com, semiocast.com
1. Introduction 2. Related Work
5. Conclusion
3. Approach of SIR
4. Experimental Results
Fig 1. Global presentation of our work
• Web resources: video, photo, Web page, ...
• Interaction on social networks: bookmark, comment, share/recommend, motion/vote, like/+1
• Social signals (source of evidence): extraction and quantification of social properties (popularity, reputation, freshness)
• Integration into the information retrieval model (ranking), which takes a query and returns results
1.2 Example of Social Signals
1.3 Research Issues
1. What happens when a user clicks a like or dislike button or posts a comment on a resource, say a Web page, photo or video?
2. Can these social data help search systems guide users to better-quality or more relevant content?
3. How effective is each individual social signal for ranking resources for a given query? What ranking correlations do these social data produce?
4. How can these social data be combined in the form of social properties? Which of them are the most useful to take into account in a search model?
2.1 Related Work
Sources of evidence (social features), with the associated properties, models and authors:
• Number of clicks, votes, records and recommendations. Properties: popularity, importance. Model: linear combination (Karweg et al., 2011).
• Number of likes, dislikes and comments on YouTube; the playcount (number of times a user listens to a track on lastfm). Property: importance. Models: machine learning and linear combination (Chelaru et al., 2012; Khodaei et al., 2012).
• Presence of a URL in a tweet (Alonso et al., 2010); number of retweets; number of annotations (tags). Property: popularity. Model: machine learning (Yang et al., 2012; Hong et al., 2011; Pantel et al., 2012).
3.1 A Modular Approach for Social IR
• Our IR approach exploits various heterogeneous social signals from different social networks to define social properties that are taken into account in the retrieval model. We associate with each Web resource an a priori relevance based on these social properties. This relevance is then combined with a classical topical relevance.
3.2 Social Signals and Social Properties
• We assume that a resource r can be represented both by a set of textual keywords $r_w = \{w_1, w_2, \dots, w_n\}$ and by a set of social actions (signals) performed on this resource, $r_a = \{a_1, a_2, \dots, a_m\}$.
• We consider a set X = {Popularity, Reputation, Freshness} of 3 social properties that characterize a resource r. Each property is quantified by a specific group of actions. These properties are considered as a priori knowledge about a resource.
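As a minimal sketch of this representation, a resource can be modeled as a record holding its keywords and the social actions performed on it. The class and field names (`Resource`, `SocialAction`, `keywords`, `actions`) and the sample values are illustrative assumptions, not names or data from the original work.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class SocialAction:
    """One social action a_i performed on a resource (hypothetical structure)."""
    kind: str                        # e.g. "like", "share", "comment"
    network: str                     # e.g. "facebook"
    when: Optional[datetime] = None  # date of the action, if known


@dataclass
class Resource:
    """A resource r seen as keywords r_w plus social actions r_a."""
    keywords: List[str] = field(default_factory=list)
    actions: List[SocialAction] = field(default_factory=list)

    def count(self, kind: str) -> int:
        """Number of actions of the given kind performed on this resource."""
        return sum(1 for a in self.actions if a.kind == kind)


# Hypothetical resource with two likes and one share.
r = Resource(
    keywords=["space", "adventure"],
    actions=[
        SocialAction("like", "facebook"),
        SocialAction("like", "facebook"),
        SocialAction("share", "linkedin"),
    ],
)
print(r.count("like"))  # 2
```

Keeping the date on each action leaves room for a freshness estimate, while the counts serve the popularity and reputation properties.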
Web resource: textual keywords and social signals (like, +1, share, comment, dates of actions). The social signals quantify three properties: reputation, popularity and freshness.
3.3 Estimation of Popularity and Reputation
• Popularity: The popularity of a resource can be estimated from the rate at which it is shared on social networks.
• Reputation: The reputation of a resource can be estimated from social activities that have a positive meaning, such as the Facebook like. Indeed, resource reputation depends on the degree of users' appreciation on social networks.
The general formula is the following:
$$f_x(r, G) = \sum_{i=1,\ a_i^x \in A}^{m} Count(a_i^x, r, G) \qquad (1)$$
The normalized score is:
$$f_x(r, G)_{Norm} = \frac{f_x(r, G) - MIN(f_x(r, G))}{MAX(f_x(r, G)) - MIN(f_x(r, G))} \qquad (2)$$
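The count-based score and its min-max normalization can be sketched as follows. This is an illustration under stated assumptions: MIN and MAX are taken over all resources in the collection, the action group and the counts are invented, and the function names are hypothetical.

```python
from typing import Dict, List


def property_score(counts: Dict[str, int], actions: List[str]) -> int:
    """Equation (1): sum the counts of the actions that quantify property x."""
    return sum(counts.get(a, 0) for a in actions)


def min_max_normalize(scores: List[float]) -> List[float]:
    """Equation (2): rescale scores to [0, 1] over the whole collection."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: when all resources tie, map every score to 0.0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical action counts for three resources.
collection = [
    {"share": 10, "comment": 5},
    {"share": 0, "comment": 0},
    {"share": 40, "comment": 5},
]
popularity_actions = ["share", "comment"]  # assumed action group for popularity

raw = [property_score(c, popularity_actions) for c in collection]
normalized = min_max_normalize(raw)
print(raw)  # [15, 0, 45]
print([round(v, 3) for v in normalized])  # [0.333, 0.0, 1.0]
```

Normalizing puts properties built from very different count ranges on a comparable scale before they are combined.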
3.4 Estimation of Freshness
• Freshness: We assume that a resource is fresh if recent social signals are associated with it. For that purpose, we define freshness as follows:
"a date of each social action (e.g., date of comment, date of share) performed on a resource on social networks can be exploited to measure the recency of these social actions, hence the freshness of information".
• Let $T_{a_i} = \{t_{1,a_i}, t_{2,a_i}, \dots, t_{k,a_i}\}$ be a set of k moments (dates) at which action $a_i$ was produced. A moment t represents the datetime of each action a of the same type.
Its formula is the following:
$$f_F(r, G) = \frac{1}{\frac{1}{m}\sum_{i=1}^{m}\left(\frac{1}{k}\sum_{j=1}^{k} Time(t_{j,a_i}, r, G)\right)} \qquad (3)$$
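A possible reading of the freshness formula is sketched below. The slides do not define the Time function, so this sketch assumes it returns the age of an action in days relative to a reference date `now`; the function name, the handling of empty inputs and the sample dates are also assumptions.

```python
from datetime import datetime
from typing import Dict, List


def freshness(dates_by_action: Dict[str, List[datetime]], now: datetime) -> float:
    """Equation (3): inverse of the mean, over action types, of the mean action age.

    Assumption: Time(t) is the age of an action in days, measured from `now`.
    """
    per_action_means = []
    for dates in dates_by_action.values():
        if not dates:
            continue  # assumption: action types without dates are skipped
        ages = [(now - d).total_seconds() / 86400.0 for d in dates]
        per_action_means.append(sum(ages) / len(ages))
    if not per_action_means:
        return 0.0  # assumption: no dated signal means no freshness evidence
    mean_age = sum(per_action_means) / len(per_action_means)
    # Assumption: guard against a zero age to avoid a division by zero.
    return 1.0 / max(mean_age, 1e-9)


# Hypothetical action dates for one resource.
now = datetime(2014, 1, 11)
dates = {
    "comment": [datetime(2014, 1, 1), datetime(2014, 1, 9)],  # ages 10 and 2 days
    "share": [datetime(2014, 1, 7)],                          # age 4 days
}
score = freshness(dates, now)
print(round(score, 3))  # 0.2
```

Under this reading, a resource whose actions are on average 5 days old gets a score of 1/5, and the score grows as the actions become more recent.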
3.5 First Method: Linear Combination
• The combination of topical relevance with social relevance is given by the following formula:
$$Rel(q, r, G) = \alpha \cdot Rel_T(q, r) + (1 - \alpha) \cdot Rel_S(q, r, G) \qquad (4)$$
where $Rel_T(q, r)$ is the score of topical relevance and $Rel_S(q, r, G)$ is the score of social relevance.
• Social Score: The social score $Rel_S(q, r, G)$ takes into account the social properties, in the form of three normalized factors that are combined linearly by the following formula:
$$Rel_S(q, r, G) = \beta \cdot f_F(r, G) + \lambda \cdot f_P(r, G) + \delta \cdot f_R(r, G) \qquad (5)$$
where $f_F$, $f_P$ and $f_R$ are the freshness, popularity and reputation factors.
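The two linear combinations can be sketched in a few lines. The weight values below are arbitrary placeholders (the slides give no values for α, β, λ or δ), the function names are hypothetical, and the sketch assumes that the topical and social scores are already on comparable scales.

```python
def social_score(f_fresh: float, f_pop: float, f_rep: float,
                 beta: float, lam: float, delta: float) -> float:
    """Equation (5): linear combination of the three normalized factors."""
    return beta * f_fresh + lam * f_pop + delta * f_rep


def combined_score(rel_topical: float, rel_social: float, alpha: float) -> float:
    """Equation (4): mix topical and social relevance with weight alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * rel_topical + (1.0 - alpha) * rel_social


# Placeholder factor values and weights, for illustration only.
rel_s = social_score(f_fresh=0.5, f_pop=1.0, f_rep=0.25,
                     beta=0.2, lam=0.4, delta=0.4)
rel = combined_score(rel_topical=0.8, rel_social=rel_s, alpha=0.5)
print(f"{rel_s:.2f} {rel:.2f}")  # 0.60 0.70
```

With α = 1 the ranking falls back to the topical model alone, which makes α a convenient knob for comparing against a content-only baseline.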
3.6 Second Method: Machine Learning Models
Fig 2. Machine Learning Process
• Input: topical model results for all topics, forming the original dataset and the training dataset.
• Attribute selection algorithms: WrapperSubsetEval (1), CfsSubsetEval (1), ReliefFAttributeEval (2), SVMAttributeEval (3).
• Learning algorithms: Naïve Bayes (1), J48 (2), SVM (3).
• Cross-fold evaluation: repeated 5 times for 5-fold cross-validation.
Matching numbers indicate which attribute selection algorithm is paired with which learning algorithm.
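The evaluation protocol named in Fig 2 (5-fold cross-validation repeated 5 times) can be sketched without any learning library. The component names in the figure appear to be Weka names; the code below does not reproduce that toolkit and only shows, under assumptions, how the repeated train/test splits could be generated. The function name, the seed and the sample size are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def repeated_k_fold(n_samples: int, k: int = 5, repeats: int = 5,
                    seed: int = 0) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repetition, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repetition
        folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
        for i in range(k):
            test = folds[i]
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield rep, train, test


splits = list(repeated_k_fold(n_samples=20))
print(len(splits))  # 25 train/test splits: 5 repetitions x 5 folds
```

Each split would then be used to fit a classifier on the training indices and score it on the held-out fold, with the results averaged over all splits.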
4.1 Experimental Evaluation
• Objectives
1. Studying the impact of integrating each social signal individually on the performance of the retrieval process.
2. Studying the impact of combining these social signals as social properties.
3. Studying the ranking correlation between social signals and relevance.
• Evaluation challenges
1. Absence of a standard evaluation framework for social IR.
2. Collecting social signals from 5 social networks and setting up the experiments.
4.2 Description of DataSet
• Textual Content: 32706 film documents in English extracted from IMDb.
- Indexed fields: Title, Year, Released, Runtime, Genre, Director, Writer, Actors, Plot.
- Fields not indexed: ID, Poster, url.
• Social Content: 8 social data from 5 social networks.
- Facebook: Like, Share, Comment, Date of last action.
- Twitter: Tweet.
- Google+: +1.
- LinkedIn: Share.
- Delicious: Bookmark.
4.3 Quantification of Social Properties
• Each social property is quantified based on social signals according to their nature and meaning.
• Popularity (P)
- C1: number of « Comment » (Facebook)
- C2: number of « Tweet » (Twitter)
- C3: number of « Share » (LinkedIn)
- C4: number of « Share » (Facebook)
• Reputation (R)
- C5: number of « Like » (Facebook)
- C6: number of « +1 » (Google+)
- C7: number of « Bookmark » (Delicious)
• Freshness (F)
- C8: date of last action (Facebook)
4.4 Results: Linear Combination
[Bar charts reporting P@10, P@20, nDCG@10 and nDCG@20 in three panels]
• Baselines (topical models): BM25, Lucene model.
• Individual integration of social signals: Like, Share, Comment, Tweet, Mention+1, Share (LIn), Bookmark; the chart also carries the label "Facebook signals".
• Different combinations of social signals (social properties): Freshness F, Reputation R, Popularity P, R+F, P+F, P+R, All Properties.
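The measures reported on this slide, P@k and nDCG@k, can be computed as sketched below. The slides do not state which nDCG variant was used; this sketch assumes the common form with linear gain and a log2 discount, and it derives the ideal ranking from the retrieved list only. Function names and relevance labels are illustrative.

```python
import math
from typing import List


def precision_at_k(relevances: List[int], k: int) -> float:
    """Fraction of the top-k results that are relevant (relevance > 0)."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg_at_k(relevances: List[int], k: int) -> float:
    """Discounted cumulative gain with linear gain and log2 discount (assumed variant)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: List[int], k: int) -> float:
    """DCG divided by the DCG of the same labels sorted in the ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels of a ranked result list (1 = relevant).
ranked = [1, 0, 1, 1, 0]
print(precision_at_k(ranked, 5))  # 0.6
print(round(ndcg_at_k(ranked, 5), 3))
```

P@k only counts relevant results in the top k, whereas nDCG@k also rewards placing them higher in the list.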
4.5 Results: Machine Learning
Table 1. Selected Social Signals With Attribute Selection Algorithms
++ : Highly selected
+ : Moderately selected
Machine learning without using attribute selection algorithms (P@20):
• Naïve Bayes: 0.5105
• SVM: 0.5131
• J48: 0.689
Machine learning results using attribute selection algorithms (P@20):
• Naïve Bayes (CFS): 0.5315
• Naïve Bayes (WRP): 0.5105
• SVM (SVM): 0.5131
• J48 (RLF): 0.689
4.6 Results: Ranking Correlation Analysis
Fig 3. Spearman correlation between social signals and relevance
Fig 4. Spearman correlation between social properties and relevance
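For reference, a Spearman rank correlation between a social signal and relevance can be computed as the Pearson correlation of the two rank vectors. The sketch below is a plain-Python illustration with average ranks for ties; the sample values are invented and the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based positions i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: like counts and relevance grades for four resources.
likes = [10, 200, 35, 7]
relevance = [1, 3, 2, 0]
print(round(spearman(likes, relevance), 3))  # 1.0
```

A value close to 1 means that resources with more of the signal tend to be ranked as more relevant, while a value near 0 indicates no monotonic relationship.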
5. Conclusion
• Social Information Retrieval Model
- Topical relevance (retrieval model based on content only).
- Social relevance (retrieval model based on content and social features).
- Attribute selection algorithms and machine learning.
• Experimental Evaluation
- Superiority of the proposed approach over textual models (baselines).
- Positive ranking correlation between social signals and relevance.
• Perspectives
- Integration of other social features.
- Further study of the impact of the temporal property.
- Comparison of the proposed models with other social models.
- Experimental evaluation on a larger dataset.
http://www.irit.fr/~Ismail.Badache/