Query Reformulation as a Predictor of Search Satisfaction...Ahmed Hassan, Xiaolin Shi, Nick Craswell...
Query Reformulation as a Predictor of Search Satisfaction
Ahmed Hassan, Xiaolin Shi, Nick Craswell and Bill Ramsey
Online Satisfaction Measurement
• Satisfying users is the main objective of any search system
• Measuring user satisfaction is essential for improving the system
Satisfaction and Implicit Behavior
• How can we model user satisfaction?
– Implicit behavior
• Clicks are the best-known implicit signal
– Clickthrough (e.g., Joachims, 2002; Agichtein et al., SIGIR’06; Carterette and Jones, NIPS’07)
– Dwell Time (e.g., Fox et al., TOIS’05)
– Interleaving (e.g., Joachims, KDD’02; Radlinski et al., CIKM’08)
Why not Just Use Clicks?
greenfield, mn accident
Time spent on page: 38 seconds
Why not Just Use Clicks?
Session Ends
greenfield, mn accident
Woman dies in a fatal accident in greenfield, minnesota
Why not Just Use Clicks?
• User performed this search on July 1st
• User was probably looking for
Why not Just Use Clicks?
Query → Click → Query
• User clicked on a result
• The dwell time is long
• But the user was not satisfied
Clicks do not always mean satisfaction
Why not Just Use Clicks?
Lack of clicks does not always mean dissatisfaction
Weather in san francisco
Query Reformulation
• What do users do when they do not like the results?
– Give up (e.g., query "reformulation satisfaction", then give up)
– OR reformulate (e.g., "reformulation satisfaction" → "reformulation search satisfaction")
• Another implicit feedback signal that has not received as much attention is query reformulation
Query Reformulation
• Query reformulation is the act of submitting a query to modify a previous query in the hope of retrieving better results
• Reformulations vs. Related Queries
– reformulation satisfaction → reformulation search satisfaction: a reformulation
– food in san francisco → weather in san francisco: not a reformulation
Clicks and Reformulation
• Clickthrough Rate (CTR) of different sets of pairs relative to CTR of all pairs
                     Query Similarity
Time Diff.    Overall   Not Similar   Similar
Overall          0%        +11%         -21%
Short          -29%        -17%         -39%
Long           +25%        +24%         +29%
• Queries are similar if they share a non-stop-word term
• Queries have a short time difference if the difference between their timestamps is less than 5 minutes
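The bucketing behind this table can be sketched in a few lines of Python. This is a minimal sketch, not the authors' pipeline: the `QueryPair` record, the tiny `STOP_WORDS` set, and counting a click on Q1 as the click for the pair are all assumptions; only the similarity rule (a shared non-stop-word term) and the 5-minute cutoff come from the slide.

```python
from dataclasses import dataclass

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}
SHORT_GAP_SECONDS = 5 * 60  # "short" means less than 5 minutes apart


@dataclass
class QueryPair:
    q1: str
    q2: str
    t1: float      # timestamp of Q1, in seconds
    t2: float      # timestamp of Q2, in seconds
    clicked: bool  # assumption: whether Q1 received at least one click


def content_terms(query):
    """Lower-cased query terms with stop words removed."""
    return {w for w in query.lower().split() if w not in STOP_WORDS}


def is_similar(pair):
    """Similar = the two queries share at least one non-stop-word term."""
    return bool(content_terms(pair.q1) & content_terms(pair.q2))


def is_short(pair):
    return (pair.t2 - pair.t1) < SHORT_GAP_SECONDS


def ctr(pairs):
    return sum(p.clicked for p in pairs) / len(pairs) if pairs else float("nan")


def relative_ctr_table(pairs):
    """Map (time bucket, similarity bucket) -> CTR relative to the CTR of all pairs."""
    base = ctr(pairs)
    if not base > 0:
        raise ValueError("need at least one clicked pair to compute a baseline")
    time_buckets = {
        "Overall": lambda p: True,
        "Short": is_short,
        "Long": lambda p: not is_short(p),
    }
    sim_buckets = {
        "Overall": lambda p: True,
        "Not Similar": lambda p: not is_similar(p),
        "Similar": is_similar,
    }
    table = {}
    for t_name, in_time in time_buckets.items():
        for s_name, in_sim in sim_buckets.items():
            subset = [p for p in pairs if in_time(p) and in_sim(p)]
            # Empty buckets yield NaN rather than raising.
            table[(t_name, s_name)] = ctr(subset) / base - 1.0
    return table
```

Each cell is a ratio minus one, so 0.0 in the ("Overall", "Overall") cell corresponds to the 0% in the top-left of the table.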
Clicks and Reformulation
• Clickthrough Rate (CTR) of different sets of pairs relative to CTR of all pairs
- Similar pairs had 21% below average CTR
- Pairs where Q1 and Q2 are not similar had 11% above average CTR
Clicks and Reformulation
• Clickthrough Rate (CTR) of different sets of pairs relative to CTR of all pairs
- Pairs with short time diff. had 29% below average CTR
- Pairs with long time diff. had 25% above average CTR
Clicks and Reformulation
• Clickthrough Rate (CTR) of different sets of pairs relative to CTR of all pairs
- Similar pairs with a short time diff. had 39% below average CTR
- Pairs that are not similar and had a long time diff. had 24% above average CTR
Clicks and Reformulation
• Clickthrough Rate (CTR) of different sets of pairs relative to CTR of all pairs
- For pairs with a long time diff., CTR is very similar across similarity levels (+25%, +24%, +29%), indicating that query similarity has little effect when the time between queries is large
Approach
• Query Representation
• Query Reformulation Prediction
• Query Success Prediction
– Using clicks only
– Using reformulation only
– Using both clicks and reformulation
Query Representation
• Query Normalization
– Lower-casing
– Replacing runs of whitespaces with a single space
– Word breaking (using a character-level n-gram model)
southjeseycraigslist → south jesey craigslist
VerizonWireless → verizon wireless
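A minimal sketch of the normalization steps follows. Lower-casing and whitespace collapsing are direct; for word breaking it swaps in a simpler technique than the slide's character-level n-gram model, namely a dynamic-programming segmentation over a word unigram model. `WORD_PROBS` and `UNKNOWN_CHAR_PENALTY` are hypothetical values chosen only so the example queries above break as shown.

```python
import math
import re

# Hypothetical unigram probabilities; a stand-in for the slide's
# character-level n-gram model, which this sketch does not reproduce.
WORD_PROBS = {"south": 1e-3, "jesey": 1e-6, "craigslist": 1e-4,
              "verizon": 1e-4, "wireless": 1e-4}
UNKNOWN_CHAR_PENALTY = math.log(1e-10)  # log-prob per char of an unknown span (assumption)


def normalize(query):
    """Lower-case and collapse runs of whitespace (also trims the ends, an assumption)."""
    return re.sub(r"\s+", " ", query.lower()).strip()


def word_log_prob(word):
    if word in WORD_PROBS:
        return math.log(WORD_PROBS[word])
    return UNKNOWN_CHAR_PENALTY * len(word)


def break_token(token, max_len=20):
    """Most probable split of a token into words (Viterbi-style DP)."""
    n = len(token)
    best = [0.0] + [-math.inf] * n  # best[i] = best score for token[:i]
    back = [0] * (n + 1)            # back[i] = start index of the last word
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + word_log_prob(token[start:end])
            if score > best[end]:
                best[end], back[end] = score, start
    words, end = [], n
    while end > 0:
        words.append(token[back[end]:end])
        end = back[end]
    return words[::-1]


def normalize_and_break(query):
    return " ".join(w for tok in normalize(query).split() for w in break_token(tok))
```

For example, `normalize_and_break("VerizonWireless")` lower-cases the query first and then splits the single token into two known words.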
Query Representation
• Queries to Keywords
– For a query $x = (x_1, x_2, \ldots, x_n)$, find a mapping $x \to y \in Y_n$, where $y$ is a segmentation from the set $Y_n$
– A segment break is introduced whenever the pointwise mutual information (PMI) between two consecutive words drops below a certain threshold $\tau$:
$$\mathrm{PMI}(x_i, x_{i+1}) = \log \frac{p(x_i, x_{i+1})}{p(x_i)\, p(x_{i+1})}$$
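The PMI rule can be illustrated with a small Python sketch. The unigram and bigram counts are hypothetical stand-ins for statistics gathered from a query log, and `tau = 2.0` is an arbitrary threshold; consecutive words whose PMI is at least τ are joined with `_`, matching the keyword notation used in the Query Keywords examples.

```python
import math
from collections import Counter

# Hypothetical corpus statistics (e.g., counted over a query log).
UNIGRAMS = Counter({"hotels": 20, "in": 100, "san": 10, "francisco": 10})
BIGRAMS = Counter({("hotels", "in"): 4, ("in", "san"): 3, ("san", "francisco"): 9})


def pmi(w1, w2, unigrams, bigrams):
    """log p(w1, w2) / (p(w1) p(w2)); -inf when any count is zero."""
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    if total_uni == 0 or total_bi == 0:
        return -math.inf
    joint = bigrams[(w1, w2)] / total_bi  # Counter returns 0 for unseen pairs
    p1 = unigrams[w1] / total_uni
    p2 = unigrams[w2] / total_uni
    if joint == 0 or p1 == 0 or p2 == 0:
        return -math.inf
    return math.log(joint / (p1 * p2))


def segment(query, unigrams, bigrams, tau):
    """Break between consecutive words whose PMI falls below tau."""
    words = query.split()
    if not words:
        return ""
    segments = [[words[0]]]
    for prev, cur in zip(words, words[1:]):
        if pmi(prev, cur, unigrams, bigrams) < tau:
            segments.append([cur])    # weak association: start a new keyword
        else:
            segments[-1].append(cur)  # strong association: extend the keyword
    return " ".join("_".join(seg) for seg in segments)
```

With these counts, PMI is about 0.90 for "hotels in", 1.30 for "in san", and 4.70 for "san francisco", so `segment("hotels in san francisco", UNIGRAMS, BIGRAMS, tau=2.0)` keeps only "san_francisco" together. Lowering τ joins more words, which is why the threshold is a tuning knob.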
Query Keywords
hotels in san francisco → hotels in san_francisco
Hyundai roadside assistance phone number → hyundai roadside_assistance phone_number
kodak easyshare recharger chord → kodak_easyshare recharger_chord
user reviews for apple ipad → user_reviews for apple_ipad
Matching Keywords
• Exact Match
– The two phrases match exactly.
• Approximate Match
– To capture spelling variants and misspellings, we allow two keywords to match if the Levenshtein edit distance between them is less than 2.
• Semantic Match
– Using the depth of the Least Common Subsumer (LCS) in
the WordNet hierarchy.
$$\mathrm{wup}(t_i, t_j) = \frac{2 \cdot \mathrm{depth}(\mathrm{LCS})}{\mathrm{depth}(t_i) + \mathrm{depth}(t_j)}$$
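The three match types can be sketched as a single function. The Levenshtein rule (distance less than 2) comes from the slide. The semantic rule uses the Wu-Palmer formula above, but over a hypothetical toy `PARENT` hierarchy instead of WordNet, and the `wup_threshold` of 0.7 is an assumption because no cutoff is given.

```python
# Hypothetical toy is-a hierarchy (child -> parent) standing in for WordNet.
PARENT = {
    "entity": None,
    "artifact": "entity",
    "vehicle": "artifact",
    "car": "vehicle",
    "truck": "vehicle",
    "food": "entity",
    "fruit": "food",
    "apple": "fruit",
}


def levenshtein(a, b):
    """Classic two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    return len(path_to_root(node))  # the root has depth 1


def wup(t_i, t_j):
    """Wu-Palmer similarity via the least common subsumer (LCS)."""
    ancestors_j = set(path_to_root(t_j))
    lcs = next(n for n in path_to_root(t_i) if n in ancestors_j)
    return 2 * depth(lcs) / (depth(t_i) + depth(t_j))


def match_type(k1, k2, wup_threshold=0.7):
    if k1 == k2:
        return "exact"
    if levenshtein(k1, k2) < 2:
        return "approximate"
    if k1 in PARENT and k2 in PARENT and wup(k1, k2) >= wup_threshold:
        return "semantic"
    return None
```

Here "car" and "truck" share the subsumer "vehicle" (depth 3, each term at depth 4), giving 2·3/(4+4) = 0.75, while "car" and "apple" only share the root and score 0.25.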
Query Reformulation Prediction
Textual Features
normalized Levenshtein edit distance
1 if lev > 2, 0 otherwise
num. characters in common starting from the left
num. characters in common starting from the right
num. words in common starting from the left
num. words in common starting from the right
num. words in common
Jaccard distance between sets of words
Adopted from (Jones and Klinkner, CIKM’08)
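A sketch of the textual features, computed on raw query strings. Two details are assumptions not stated on the slide: the edit distance is normalized by the length of the longer query, and the `lev > 2` indicator uses the raw (unnormalized) distance. The feature names in the returned dictionary are hypothetical.

```python
def levenshtein(a, b):
    """Two-row dynamic-programming edit distance (works on strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def common_prefix_len(a, b):
    """Number of leading items (characters or words) the two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def textual_features(q1, q2):
    w1, w2 = q1.split(), q2.split()
    s1, s2 = set(w1), set(w2)
    union = s1 | s2
    lev = levenshtein(q1, q2)
    return {
        "norm_lev": lev / max(len(q1), len(q2), 1),
        "lev_gt_2": int(lev > 2),
        "common_chars_left": common_prefix_len(q1, q2),
        "common_chars_right": common_prefix_len(q1[::-1], q2[::-1]),
        "common_words_left": common_prefix_len(w1, w2),
        "common_words_right": common_prefix_len(w1[::-1], w2[::-1]),
        "common_words": len(s1 & s2),
        "jaccard_distance": 1 - len(s1 & s2) / len(union) if union else 0.0,
    }
```

For the pair "reformulation satisfaction" → "reformulation search satisfaction", the raw edit distance is 7 (inserting "search "), the queries share 2 words, and the Jaccard distance is 1 - 2/3.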
Query Reformulation Prediction
Keyword Features
num. of “exact match” keywords in common
num. of “approximate match” keywords in common
num. of “semantic match” keywords in common
num. of keywords in Q1
num. of keywords in Q2
num. of keywords in Q1 but not in Q2
num. of keywords in Q2 but not in Q1
1 if Q1’s keywords contain all of Q2’s keywords
1 if Q2’s keywords contain all of Q1’s keywords
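A sketch of the keyword features over keyword lists produced by segmentation. It is an assumption that "approximate match" counts exclude exact matches and that "in Q1 but not in Q2" means no exact or approximate match; semantic matches are omitted here because they need a WordNet-style hierarchy. The feature names are hypothetical.

```python
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def matches(keyword, others):
    """True if keyword exactly or approximately (edit distance < 2) matches any of others."""
    return any(keyword == o or levenshtein(keyword, o) < 2 for o in others)


def keyword_features(kw1, kw2):
    exact = sum(1 for k in kw1 if k in kw2)
    approximate = sum(1 for k in kw1 if k not in kw2 and matches(k, kw2))
    only_q1 = sum(1 for k in kw1 if not matches(k, kw2))
    only_q2 = sum(1 for k in kw2 if not matches(k, kw1))
    return {
        "exact_match_common": exact,
        "approx_match_common": approximate,
        "num_keywords_q1": len(kw1),
        "num_keywords_q2": len(kw2),
        "in_q1_not_q2": only_q1,
        "in_q2_not_q1": only_q2,
        "q1_contains_all_q2": int(only_q2 == 0),
        "q2_contains_all_q1": int(only_q1 == 0),
    }
```

Usage: `keyword_features("hotels in san_francisco".split(), "cheap hotel in san_francisco".split())` counts "hotels"/"hotel" as an approximate match and leaves "cheap" as the only Q2 keyword missing from Q1.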
Query Reformulation Prediction
Other Features
time between queries in seconds
time between queries as binary features (thresholds: 5, 30, 60, and 120 mins)
cosine distance between vectors derived from the first 10 search results for the query terms
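The remaining features can be sketched as follows. Reading each binary time feature as "the gap is less than N minutes" is an assumption, as is representing the first 10 results by bag-of-words counts over their text (for example, titles or snippets); the slide does not say how the result vectors are built.

```python
import math
from collections import Counter

TIME_THRESHOLDS_MIN = (5, 30, 60, 120)


def time_features(t1, t2):
    """Raw gap in seconds plus one indicator per threshold (assumed: gap < N minutes)."""
    delta = t2 - t1
    feats = {"time_diff_sec": delta}
    for minutes in TIME_THRESHOLDS_MIN:
        feats[f"within_{minutes}_min"] = int(delta < minutes * 60)
    return feats


def result_vector(result_texts, k=10):
    """Hypothetical representation: term counts over the text of the top-k results."""
    vec = Counter()
    for text in result_texts[:k]:
        vec.update(text.lower().split())
    return vec


def cosine_distance(v1, v2):
    """1 - cosine similarity of two sparse count vectors; 1.0 if either is empty."""
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    if n1 == 0 or n2 == 0:
        return 1.0
    return 1.0 - dot / (n1 * n2)
```

Queries whose top results overlap heavily get a distance near 0, which is the intuition for using it as evidence that Q2 is a reformulation of Q1 rather than a new topic.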
Query Reformulation Performance
[Figure: reformulation prediction accuracy for the Heuristic, Textual, Keywords, and All feature sets]
- Keyword features outperform textual features
- Best performance when all features are combined
Query Satisfaction Prediction
1. Clicks Only: A query Q is successful if it receives at least one click.
2. SAT Clicks Only: A query Q is successful if it receives at least one long-dwell-time click (thresholds: 10, 30, and 50 seconds).
3. Reformulation Only: Predict success using reformulation features only (i.e., assume users always reformulate their queries when they are not successful).
4. Reformulation + Clicks (classifier): Train a classifier using both reformulation and click features.
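The three rule-based baselines are simple enough to state directly in code. In this sketch the `reformulated` flag is assumed to come from the reformulation predictor described earlier, `>=` is assumed for the dwell-time comparison, and the `QueryRecord` type is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryRecord:
    query: str
    dwell_times: List[float] = field(default_factory=list)  # seconds, one per click
    reformulated: bool = False  # assumed output of the reformulation predictor


def clicks_only(r):
    """Successful if the query received at least one click."""
    return len(r.dwell_times) > 0


def sat_clicks_only(r, threshold=30.0):
    """Successful if at least one click has a long dwell time (10, 30, or 50 s on the slide)."""
    return any(d >= threshold for d in r.dwell_times)


def reformulation_only(r):
    """Successful unless the user reformulated the query."""
    return not r.reformulated


def accuracy(predict, records, labels):
    correct = sum(predict(r) == y for r, y in zip(records, labels))
    return correct / len(records)
```

A query like "weather in san francisco" that gets no click and no reformulation is a failure under Clicks Only but a success under Reformulation Only, which is the case these baselines disagree on.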
Results
• Clicks Only method performs poorly
• Many queries that receive a click still end up being unsuccessful
[Figure: query success prediction accuracy for Clicks Only, SAT Clicks Only, Reformulation Only, and Reformulation + Clicks]
Results
• Accuracy improves when only SAT clicks are considered
Results
• Better performance if we use reformulation only
Results
• Best performance when we learn a classifier using both the reformulation and the click features
Reformulation Only vs. Reformulation + Clicks
• Reformulation Only achieves high DSAT but low SAT precision
• Reformulation + Clicks achieves good performance for both SAT and DSAT cases
[Figure: Accuracy, SAT Precision, and DSAT Precision for Reformulation Only and Reformulation + Clicks]
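SAT and DSAT precision are per-class precisions: of the queries a method labels SAT (or DSAT), the fraction whose true label matches. A minimal sketch, assuming labels are the strings "SAT" and "DSAT" and that the function name `class_precision` is hypothetical:

```python
def class_precision(predicted, actual, label):
    """Fraction of items predicted as `label` whose true label is `label` (NaN if none)."""
    chosen = [a for p, a in zip(predicted, actual) if p == label]
    if not chosen:
        return float("nan")
    return sum(a == label for a in chosen) / len(chosen)
```

A method can score well on one class and poorly on the other, which is why both precisions are reported alongside accuracy.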
Reformulation Behavior and Search Tasks
• Queries in successful tasks
Reformulation Behavior and Search Tasks
• Queries in unsuccessful tasks
Reformulation Behavior and Search Tasks
Queries in unsuccessful tasks have higher similarity than
queries in successful tasks
Data from (Hassan et al., CIKM’11)
Conclusions
• We can reliably identify query reformulations
• Query reformulation is a strong predictor of search success
• Best results when using both query reformulation and clicks
• Reformulation behavior differs in successful and unsuccessful tasks