Query Reformulation as a Predictor of Search Satisfaction...Ahmed Hassan, Xiaolin Shi, Nick Craswell...

Query Reformulation as a Predictor of Search Satisfaction

Ahmed Hassan, Xiaolin Shi, Nick Craswell and Bill Ramsey

Online Satisfaction Measurement

• Satisfying users is the main objective of any search system

• Measuring user satisfaction is essential for improving the system

Satisfaction and Implicit Behavior

• How can we model user satisfaction?

– Implicit behavior

• Clicks are the best-known implicit signal

– Clickthrough (e.g., Joachims, 2002; Agichtein et al., SIGIR’06; Carterette and Jones, NIPS’07; etc.)

– Dwell Time (e.g., Fox et al., TOIS’05)

– Interleaving (e.g., Joachims, KDD’02; Radlinski et al., CIKM’08)

Why not Just Use Clicks?

greenfield, mn accident

Time spent on page: 38 seconds


Session Ends

greenfield, mn accident

Woman dies in a fatal accident in greenfield, minnesota


• User performed this search on July 1st

• User was probably looking for


Query Click Query

• User clicked on a result

• The dwell time is long

• But the user was not satisfied

Clicks do not always mean satisfaction


Lack of clicks does not always mean dissatisfaction

Weather in san francisco

• What do users do when they do not like the results?

– Query Reformulation: reformulation satisfaction → reformulation search satisfaction

– OR: Give Up

• Another implicit feedback signal that has not received as much attention is query reformulation

Query Reformulation

• Query reformulation is the act of submitting a query that modifies a previous query in the hope of retrieving better results

• Reformulations vs. Related Queries

– reformulation satisfaction → reformulation search satisfaction: a reformulation

– food in san francisco → weather in san francisco: not a reformulation

Clicks and Reformulation

• Clickthrough Rate (CTR) of different sets of query pairs, relative to the CTR of all pairs

Time Diff. \ Query Similarity    Overall    Not Similar    Similar
Overall                          0%         +11%           -21%
Short                            -29%       -17%           -39%
Long                             +25%       +24%           +29%

• Queries are similar if they share a non-stop-word term

• Queries have a short time difference if their timestamps differ by less than 5 minutes



- Similar pairs had 21% below-average CTR

- Pairs where Q1 and Q2 are not similar had 11% above-average CTR

- Pairs with a short time difference had 29% below-average CTR

- Pairs with a long time difference had 25% above-average CTR

- Similar pairs with a short time difference had 39% below-average CTR

- Pairs that are not similar and have a long time difference had 24% above-average CTR

- Pairs with a long time difference have very similar CTRs (24% to 29% above average) regardless of similarity, indicating that query similarity has little effect when the time between queries is large
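The bucketing behind the table can be sketched as below. This is a minimal illustration, not the authors' pipeline: the stop-word list and the `QueryPair` record are hypothetical, and it assumes the CTR of a pair is whether its first query (Q1) received a click. A value of -0.21 corresponds to "21% below average".

```python
import math
from dataclasses import dataclass

# Hypothetical stop-word list; the slides do not say which list was used.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}
SHORT_GAP_SECONDS = 5 * 60  # "short time difference" = under 5 minutes


@dataclass
class QueryPair:
    q1: str
    q2: str
    t1: float          # timestamp of Q1, in seconds
    t2: float          # timestamp of Q2, in seconds
    q1_clicked: bool   # assumption: CTR is measured on the first query


def is_similar(p: QueryPair) -> bool:
    """Similar if the two queries share at least one non-stop-word term."""
    terms1 = set(p.q1.lower().split()) - STOP_WORDS
    terms2 = set(p.q2.lower().split()) - STOP_WORDS
    return bool(terms1 & terms2)


def is_short_gap(p: QueryPair) -> bool:
    return abs(p.t2 - p.t1) < SHORT_GAP_SECONDS


def ctr(pairs: list[QueryPair]) -> float:
    return sum(p.q1_clicked for p in pairs) / len(pairs)


def relative_ctr(pairs: list[QueryPair], predicate) -> float:
    """CTR of the pairs matching `predicate`, relative to the CTR of all pairs.

    Returns e.g. -0.21 for "21% below average"; NaN when undefined.
    """
    subset = [p for p in pairs if predicate(p)]
    if not subset or ctr(pairs) == 0:
        return math.nan
    return ctr(subset) / ctr(pairs) - 1.0
```

Each table cell corresponds to one predicate, for example `lambda p: is_similar(p) and is_short_gap(p)` for the Short/Similar cell.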

Approach

• Query Representation

• Query Reformulation Prediction

• Query Success Prediction

– Using clicks only

– Using reformulation only

– Using both clicks and reformulation

Query Representation

• Query Normalization

– Lower-casing

– Replacing runs of whitespace characters with a single space

– Word breaking (using a character-level n-gram model)

southjeseycraigslist → south jesey craigslist

VerizonWireless → verizon wireless
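The three normalization steps can be sketched as follows. The slides use a character-level n-gram model for word breaking; this sketch swaps in a simpler word-unigram model with a hypothetical toy lexicon (`WORD_PROBS`) and assumed penalties for unknown tokens, and finds the most probable split with dynamic programming.

```python
import math
import re

# Hypothetical toy word probabilities; a stand-in for the character-level
# n-gram model mentioned on the slide.
WORD_PROBS = {
    "south": 0.02, "craigslist": 0.01,
    "verizon": 0.01, "wireless": 0.01, "weather": 0.02,
}
UNKNOWN_TOKEN_LOGP = -10.0  # assumed fixed penalty per unknown token
UNKNOWN_CHAR_LOGP = -2.0    # assumed extra penalty per character of an unknown token


def token_logp(token: str) -> float:
    if token in WORD_PROBS:
        return math.log(WORD_PROBS[token])
    # The fixed per-token penalty makes one unknown span cheaper than
    # the same span split into several unknown pieces.
    return UNKNOWN_TOKEN_LOGP + UNKNOWN_CHAR_LOGP * len(token)


def word_break(text: str) -> list[str]:
    """Most probable segmentation of a space-free string (dynamic programming)."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-prob of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word in text[:i]
    for end in range(1, n + 1):
        for start in range(end):
            score = best[start] + token_logp(text[start:end])
            if score > best[end]:
                best[end], back[end] = score, start
    words, end = [], n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


def normalize(query: str) -> str:
    query = query.lower()                       # lower-casing
    query = re.sub(r"\s+", " ", query).strip()  # collapse whitespace runs
    # Word-break each whitespace-delimited token independently.
    return " ".join(w for tok in query.split(" ") for w in word_break(tok))
```

With such a small lexicon, unknown words that contain a known word (e.g. "southern") will be over-split; a real model trained on query logs would not have this problem to the same degree.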

Query Representation

• Queries to Keywords

– For a query $x = (x_1, x_2, \ldots, x_n)$, find a mapping $x \to y \in Y_n$, where $y$ is a segmentation from the set $Y_n$

– A segment break is introduced whenever the pointwise mutual information (PMI) between two consecutive words drops below a certain threshold $\tau$:

$$\mathrm{PMI}(x_i, x_{i+1}) = \log \frac{p(x_i, x_{i+1})}{p(x_i)\, p(x_{i+1})}$$

Query Keywords

hotels in san francisco → hotels in san_francisco

Hyundai roadside assistance phone number → hyundai roadside_assistance phone_number

kodak easyshare recharger chord → kodak_easyshare recharger_chord

user reviews for apple ipad → user_reviews for apple_ipad
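The PMI-based segmentation can be sketched as below. The unigram and bigram counts and the threshold `tau = 3.0` are hypothetical toy values chosen so that the first example comes out as shown; real counts would come from a large query log, and the natural log is assumed.

```python
import math

# Hypothetical toy counts; real counts would come from a large query log.
UNIGRAM_COUNTS = {"hotels": 500, "in": 5000, "san": 300, "francisco": 250}
BIGRAM_COUNTS = {("hotels", "in"): 60, ("in", "san"): 40, ("san", "francisco"): 240}
TOTAL_UNIGRAMS = 100_000
TOTAL_BIGRAMS = 90_000


def pmi(w1: str, w2: str) -> float:
    """PMI(w1, w2) = log[p(w1, w2) / (p(w1) p(w2))]; -inf if the bigram is unseen."""
    joint = BIGRAM_COUNTS.get((w1, w2), 0) / TOTAL_BIGRAMS
    if joint == 0:
        return -math.inf
    p1 = UNIGRAM_COUNTS.get(w1, 1) / TOTAL_UNIGRAMS
    p2 = UNIGRAM_COUNTS.get(w2, 1) / TOTAL_UNIGRAMS
    return math.log(joint / (p1 * p2))


def segment(words: list[str], tau: float = 3.0) -> list[str]:
    """Break between consecutive words whose PMI is below tau; join the rest with '_'."""
    if not words:
        return []
    segments = [[words[0]]]
    for prev, cur in zip(words, words[1:]):
        if pmi(prev, cur) < tau:
            segments.append([cur])    # weak association: start a new keyword
        else:
            segments[-1].append(cur)  # strong association: extend the current keyword
    return ["_".join(seg) for seg in segments]
```

For example, `segment("hotels in san francisco".split())` yields `["hotels", "in", "san_francisco"]` with these counts, since only "san francisco" has a PMI above the threshold.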

Matching Keywords

• Exact Match

– The two phrases match exactly.

• Approximate Match

– To capture spelling variants and misspellings, we allow two keywords to match if the Levenshtein edit distance between them is less than 2.

• Semantic Match

– Using the depth of the Least Common Subsumer (LCS) in the WordNet hierarchy:

$$\mathrm{wup}(t_i, t_j) = \frac{2 \cdot \mathrm{depth}(\mathrm{LCS})}{\mathrm{depth}(t_i) + \mathrm{depth}(t_j)}$$
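The three match types can be sketched together as below. WordNet is replaced here by a hypothetical toy hypernym tree (`PARENT`), and the semantic threshold `sem_threshold = 0.6` is an assumed value; the slides do not give one. For real WordNet synsets, NLTK's `wup_similarity` method computes a Wu-Palmer score.

```python
# Hypothetical toy hypernym tree (child -> parent) standing in for WordNet.
PARENT = {
    "entity": None,
    "vehicle": "entity", "car": "vehicle", "truck": "vehicle",
    "food": "entity", "pizza": "food",
}


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def ancestors(node) -> list[str]:
    """The node itself followed by its hypernyms up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node: str) -> int:
    return len(ancestors(node))  # the root has depth 1


def wup(t_i: str, t_j: str) -> float:
    """2 * depth(LCS) / (depth(t_i) + depth(t_j)); the toy tree has a single root."""
    anc_j = set(ancestors(t_j))
    lcs = next(n for n in ancestors(t_i) if n in anc_j)
    return 2 * depth(lcs) / (depth(t_i) + depth(t_j))


def match_type(k1: str, k2: str, sem_threshold: float = 0.6):
    """Return 'exact', 'approximate', 'semantic', or None."""
    if k1 == k2:
        return "exact"
    if levenshtein(k1, k2) < 2:
        return "approximate"
    if k1 in PARENT and k2 in PARENT and wup(k1, k2) >= sem_threshold:
        return "semantic"
    return None
```

The checks run from strictest to loosest, so a pair is labeled with the first match type it satisfies.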

Query Reformulation Prediction

Textual Features

normalized Levenshtein edit distance

1 if lev > 2, 0 otherwise

num. characters in common starting from the left

num. characters in common starting from the right

num. words in common starting from the left

num. words in common starting from the right

num. words in common

Jaccard distance between sets of words

Adapted from (Jones and Klinkner, CIKM’08)
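The textual features listed above can be computed for a query pair roughly as follows. Normalizing the edit distance by the longer query's length is an assumption; the slide does not say how it is normalized.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def common_prefix_len(xs, ys) -> int:
    """Number of leading elements (characters or words) the two sequences share."""
    n = 0
    for x, y in zip(xs, ys):
        if x != y:
            break
        n += 1
    return n


def textual_features(q1: str, q2: str) -> dict:
    lev = levenshtein(q1, q2)
    w1, w2 = q1.split(), q2.split()
    s1, s2 = set(w1), set(w2)
    union = s1 | s2
    return {
        "norm_lev": lev / max(len(q1), len(q2), 1),  # assumed normalization
        "lev_gt_2": int(lev > 2),
        "prefix_chars": common_prefix_len(q1, q2),
        "suffix_chars": common_prefix_len(q1[::-1], q2[::-1]),
        "prefix_words": common_prefix_len(w1, w2),
        "suffix_words": common_prefix_len(w1[::-1], w2[::-1]),
        "common_words": len(s1 & s2),
        "jaccard_distance": 1 - len(s1 & s2) / len(union) if union else 0.0,
    }
```

For "reformulation satisfaction" → "reformulation search satisfaction", the edit distance is 7 (inserting "search "), the queries share 2 words, and the Jaccard distance is 1/3.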


Keyword Features

num. of “exact match” keywords in common

num. of “approximate match” keywords in common

num. of “semantic match” keywords in common

num. of keywords in Q1

num. of keywords in Q2

num. of keywords in Q1 but not in Q2

num. of keywords in Q2 but not in Q1

1 if Q1’s keywords contain all of Q2’s keywords

1 if Q2’s keywords contain all of Q1’s keywords
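A sketch of the keyword features, assuming each query has already been segmented into keywords. Counting distinct keywords, and counting a keyword toward only its strictest match type, are assumptions; the semantic matcher is a hypothetical pluggable function (omitted by default).

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def keyword_features(k1: list[str], k2: list[str], semantic_match=None) -> dict:
    """Keyword-overlap features for Q1 keywords k1 and Q2 keywords k2."""
    s1, s2 = set(k1), set(k2)
    exact = s1 & s2
    rest1, rest2 = s1 - exact, s2 - exact
    # Approximate: not an exact match, but within edit distance 1 of a Q2 keyword.
    approx = {a for a in rest1 if any(levenshtein(a, b) < 2 for b in rest2)}
    semantic = set()
    if semantic_match is not None:  # hypothetical hook, e.g. a Wu-Palmer test
        semantic = {a for a in rest1 - approx
                    if any(semantic_match(a, b) for b in rest2)}
    return {
        "exact_common": len(exact),
        "approx_common": len(approx),
        "semantic_common": len(semantic),
        "num_q1": len(s1),
        "num_q2": len(s2),
        "q1_not_q2": len(s1 - s2),
        "q2_not_q1": len(s2 - s1),
        "q1_contains_q2": int(s1 >= s2),
        "q2_contains_q1": int(s2 >= s1),
    }
```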


Other Features

time between queries in seconds

time between queries as binary features (thresholds: 5, 30, 60, and 120 minutes)

cosine distance between vectors derived from the first 10 search results for the query terms
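The time-gap and result-overlap features can be sketched as below. Encoding the thresholds as one "within N minutes" indicator each, and representing the top 10 results as a bag of words over their snippets, are both assumptions; the slide does not specify either representation.

```python
import math
from collections import Counter

# Thresholds from the slide; one binary indicator per threshold (assumed encoding).
TIME_THRESHOLDS_MIN = (5, 30, 60, 120)


def time_features(t1: float, t2: float) -> dict:
    """Raw gap in seconds plus one binary 'within N minutes' flag per threshold."""
    gap = abs(t2 - t1)
    feats = {"gap_seconds": gap}
    for minutes in TIME_THRESHOLDS_MIN:
        feats[f"within_{minutes}_min"] = int(gap < minutes * 60)
    return feats


def results_vector(snippets: list[str]) -> Counter:
    """Bag of words over the top-10 result snippets (hypothetical representation)."""
    return Counter(w for s in snippets[:10] for w in s.lower().split())


def cosine_distance(v1: Counter, v2: Counter) -> float:
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    if n1 == 0 or n2 == 0:
        return 1.0  # treat an empty result list as maximally distant
    return 1.0 - dot / (n1 * n2)
```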

Query Reformulation Performance

[Figure: reformulation prediction accuracy (y-axis 72% to 88%) for the Heuristic, Textual, Keywords, and All feature sets]

- Keyword features outperform textual features

- Best performance when all features are combined

Query Satisfaction Prediction

1. Clicks Only: a query Q is successful if it receives at least one click

2. SAT Clicks Only: a query Q is successful if it receives at least one long-dwell-time click (thresholds: 10, 30, and 50 seconds)

3. Reformulation Only: predict success using reformulation features only (i.e., assume users always reformulate their queries when they are not successful)

4. Reformulation + Clicks (classifier): train a classifier using both reformulation and click features
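Methods 1 to 3 are simple decision rules and can be sketched as below; method 4 would feed click and reformulation features into a trained classifier and is not sketched here. The `QueryRecord` type is hypothetical, `reformulated` stands for the output of a reformulation predictor on (Q, next query), and treating a dwell time equal to the threshold as a SAT click is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class QueryRecord:
    query: str
    dwell_times: list[float] = field(default_factory=list)  # seconds, one per click
    reformulated: bool = False  # hypothetical output of a reformulation predictor


def clicks_only(q: QueryRecord) -> bool:
    """Method 1: successful if the query received at least one click."""
    return len(q.dwell_times) > 0


def sat_clicks_only(q: QueryRecord, threshold: float = 30.0) -> bool:
    """Method 2: successful if at least one click has a long dwell time."""
    return any(d >= threshold for d in q.dwell_times)


def reformulation_only(q: QueryRecord) -> bool:
    """Method 3: successful unless the user reformulated the query."""
    return not q.reformulated
```

A record with a single 38-second click that is then reformulated would be counted as successful by methods 1 and 2 (at the 10- and 30-second thresholds) but not by method 3, which is the kind of disagreement the results below measure.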

Results

• Clicks Only method performs poorly

• Many queries that receive a click still end up being unsuccessful

[Figure: accuracy of Clicks Only, SAT Clicks Only, Reformulation Only, and Reformulation + Clicks]

Results

• Accuracy improves when only SAT clicks are considered


Results

• Better performance if we use reformulation only


Results

• Best performance when we learn a classifier using both the reformulation and the click features


Reformulation Only vs. Reformulation + Clicks

• Reformulation Only achieves high DSAT precision but low SAT precision

• Reformulation + Clicks achieves good performance for both the SAT and DSAT cases

[Figure: accuracy, SAT precision, and DSAT precision of Reformulation Only vs. Reformulation + Clicks]
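Per-class precision, as used in the SAT/DSAT comparison, can be sketched as follows. The labels below are made-up toy data, not results from the study; they only illustrate how a method can have perfect DSAT precision while its SAT precision stays low (e.g. when users who give up without reformulating are predicted as SAT).

```python
def class_precision(predicted: list[str], actual: list[str], label: str) -> float:
    """Of the queries predicted as `label`, the fraction whose true label is `label`."""
    hits = [a == label for p, a in zip(predicted, actual) if p == label]
    return sum(hits) / len(hits) if hits else 0.0


def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of queries whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy labels: two reformulated queries (predicted DSAT) and four that were not
# reformulated (predicted SAT), two of which were actually abandoned (DSAT).
predicted = ["DSAT", "DSAT", "SAT", "SAT", "SAT", "SAT"]
actual = ["DSAT", "DSAT", "SAT", "SAT", "DSAT", "DSAT"]
print(class_precision(predicted, actual, "DSAT"))  # 1.0
print(class_precision(predicted, actual, "SAT"))   # 0.5
```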

Reformulation Behavior and Search Tasks

• Queries in successful tasks


• Queries in unsuccessful tasks


Queries in unsuccessful tasks have higher similarity than queries in successful tasks

Data from (Hassan et al., CIKM’11)

Conclusions

• We can reliably identify query reformulations

• Query reformulation is a strong predictor of search success

• Best results when using both query reformulation and clicks

• Reformulation behavior differs in successful and unsuccessful tasks

Thanks!

Ahmed Hassan

