Milad Shokouhi Microsoft Research Cambridge · Query Trends in Bing Logs Search Solutions 2012...


Query Trends in Bing Logs

Search Solutions 2012

Milad Shokouhi

Microsoft Research Cambridge

Why Does Modeling the Past Matter?

Because it is the best predictor of the future!

Also because it explains the present…

…and people

Temporal Patterns in Queries

Typical Head Query (e.g. yahoo mail)

Spike & Go (e.g. Whitney Houston Funeral)

• How to detect the spiking intent quickly?

• How to rank news documents?

Spike and remain (e.g. Kindle Fire)

• How to detect the official page quickly?

• How to index and rank such pages correctly?

Seasonal Queries (e.g. Halloween)

• How to classify seasonal queries?

• How to switch between years?

Classifying Temporal Queries

Spiking Queries

Burst detection

Clustering Queries

Seasonal Queries

Shokouhi, SIGIR’11

Seasonal Switches

Seasonal Switches in User Sessions

Modeling Query Trends by Time-Series

Application: Time-Sensitive Autosuggest (Auto-Completion)

Auto-Completion

prefix

Auto-Completion Trie

Candidate Scores

Prefix Tree

Keys on edges

Nodes store past queries

Scores are Past Frequencies
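The prefix-tree layout described on this slide can be sketched roughly as follows. This is a minimal illustration, not the production Bing structure: the class names `TrieNode` and `CompletionTrie`, the choice of one character per edge, and the example frequencies are all assumptions made for the example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # edge key (one character) -> child node
        self.queries = {}   # past query -> past frequency (the candidate score)


class CompletionTrie:
    """Hypothetical prefix tree: keys on edges, past queries stored in nodes."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, query, frequency):
        # Store the query in every node along its path, so each node holds
        # all past queries that extend the prefix spelled by that path.
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.queries[query] = frequency

    def complete(self, prefix, k=3):
        # Walk the edges spelled by the prefix, then rank the stored
        # candidates by past frequency (ties broken alphabetically).
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        ranked = sorted(node.queries.items(), key=lambda kv: (-kv[1], kv[0]))
        return [query for query, _ in ranked[:k]]


# Made-up frequencies, for illustration only.
trie = CompletionTrie()
for query, freq in [("verizon wireless", 90), ("valentines day", 60),
                    ("verizon", 40), ("michael kors", 30)]:
    trie.add(query, freq)

suggestions = trie.complete("v")
```

Storing every candidate in every node on its path trades memory for lookup speed; a real system would more likely keep only the top few candidates per node.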

V

Snapshot taken: Sunday, Feb 13th 2011

Query frequencies according to Google Insights for Search

Verizon wireless vs. Valentines day

Michael Kors vs. Michael Phelps

We Suggest Ranking Auto-Completion Candidates by Predicted Popularity

Time-Sensitive Auto-Completion Ranking

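MPC vs. Time-Sensitive

MPC: $\mathrm{MPC}(P) = \arg\max_{q \in C(P)} w(q)$, where $w(q) = \frac{f(q)}{\sum_{i \in Q} f(i)}$

Time-Sensitive: $\mathrm{TS}(P, t) = \arg\max_{q \in C(P)} w(q \mid t)$, where $w(q \mid t) = \frac{\hat{f}_t(q)}{\sum_{i \in Q} \hat{f}_t(i)}$

P: Prefix; q: Suggestion; t: Time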
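The difference between the two rankers can be illustrated with a small sketch. It assumes that candidates are simply the known queries starting with the prefix, and that the time-sensitive scores come from some forecasting model; the function names and all frequencies below are hypothetical. The slide defines the single best candidate (argmax); the sketch returns a top-k list, of which the first element is that argmax.

```python
def normalized_weights(freq):
    # w(q) = f(q) / sum of f(i) over all queries i.
    total = sum(freq.values())
    if total == 0:
        return {q: 0.0 for q in freq}
    return {q: f / total for q, f in freq.items()}


def rank(prefix, freq, k=2):
    # Rank the candidates for a prefix by normalized weight, highest first.
    weights = normalized_weights(freq)
    candidates = [q for q in freq if q.startswith(prefix)]
    return sorted(candidates, key=lambda q: (-weights[q], q))[:k]


# Made-up numbers: aggregated past frequencies vs. frequencies predicted
# for a day shortly before Valentine's Day.
past_freq = {"verizon wireless": 900, "valentines day": 300, "michael kors": 200}
predicted_freq = {"verizon wireless": 850, "valentines day": 2000, "michael kors": 210}

mpc_ranking = rank("v", past_freq)       # uses past popularity only
ts_ranking = rank("v", predicted_freq)   # uses predicted popularity at time t
```

The ranking code is identical in both cases; only the source of the frequencies changes, which is the point the slide makes.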

Time-Series Forecasting

A time-series is a set of discrete or continuous observations over time.

Applications

Data modeling

Forecast

Examples

Sales figures

Student enrolment

CO2 rate

Query popularity

Single Exponential Smoothing

The data points are modeled with a weighted average.

$y_t$, $\bar{y}_t$, $\hat{y}_t$: respectively, the actual, smoothed, and predicted values at time t.

λ: Smoothing constant

Forecast:
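The formulas on this slide did not survive extraction. As a stand-in, here is a minimal sketch of single exponential smoothing in its standard textbook form, which may differ in detail from the slide: the smoothed value is a weighted average of the latest observation and the previous smoothed value, and the one-step forecast is the latest smoothed value. The function name and the initialization with the first observation are assumptions.

```python
def single_exponential_smoothing(series, lam):
    """Return the one-step-ahead forecast for a non-empty series.

    Textbook update: smoothed_t = lam * y_t + (1 - lam) * smoothed_{t-1}
    Forecast:        predicted_{t+1} = smoothed_t
    """
    if not series:
        raise ValueError("series must not be empty")
    smoothed = series[0]  # assumed initialization: first observation
    for y in series[1:]:
        smoothed = lam * y + (1 - lam) * smoothed
    return smoothed


# Made-up daily query counts.
forecast = single_exponential_smoothing([10, 20, 30], lam=0.5)
```

A larger smoothing constant makes the forecast follow recent observations more closely; with `lam=1` it reduces to "tomorrow equals today".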

Double Exponential Smoothing

$y_t$, $\bar{y}_t$, $\hat{y}_t$: respectively, the actual, smoothed, and predicted values at time t

𝜆1, 𝜆2: Smoothing constants

𝐹𝑡: Trend factor at time t

Forecast:
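The formulas on this slide were also lost in extraction. The sketch below uses the standard Holt formulation of double exponential smoothing, which adds a trend factor to the smoothed level; it is an assumption that the slide used this form. The function name and the initialization (level from the first point, trend from the first difference) are likewise assumptions.

```python
def double_exponential_smoothing(series, lam1, lam2):
    """Return the one-step-ahead forecast using level plus trend.

    Textbook updates (Holt's method):
      level_t = lam1 * y_t + (1 - lam1) * (level_{t-1} + trend_{t-1})
      trend_t = lam2 * (level_t - level_{t-1}) + (1 - lam2) * trend_{t-1}
    Forecast: predicted_{t+1} = level_t + trend_t
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]  # assumed initialization: first difference
    for y in series[1:]:
        prev_level = level
        level = lam1 * y + (1 - lam1) * (prev_level + trend)
        trend = lam2 * (level - prev_level) + (1 - lam2) * trend
    return level + trend


# A perfectly linear made-up series: the forecast continues the line.
forecast = double_exponential_smoothing([2, 4, 6, 8], lam1=0.5, lam2=0.5)
```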

Trend + Seasonality?

Triple Exponential Smoothing

$y_t$, $\bar{y}_t$, $\hat{y}_t$: respectively, the actual, smoothed, and predicted values at time t

𝜆1, 𝜆2, 𝜆3: Smoothing constants

𝐹𝑡: Trend factor at time t

𝑆𝑡: Seasonality factor at time t

τ: Length of seasonal cycle

Forecast:
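The slide's formulas are missing here as well. As an illustration, the sketch below implements the additive Holt-Winters variant of triple exponential smoothing; the slide may have used a different variant (for example, a multiplicative seasonal factor). The function name and the initialization from the first two seasonal cycles are assumptions.

```python
def triple_exponential_smoothing(series, lam1, lam2, lam3, tau):
    """Return the one-step-ahead forecast (additive Holt-Winters).

    Textbook updates, with seasonal cycle length tau:
      level_t  = lam1 * (y_t - season_{t-tau}) + (1 - lam1) * (level_{t-1} + trend_{t-1})
      trend_t  = lam2 * (level_t - level_{t-1}) + (1 - lam2) * trend_{t-1}
      season_t = lam3 * (y_t - level_t) + (1 - lam3) * season_{t-tau}
    Forecast: predicted_{t+1} = level_t + trend_t + season_{t+1-tau}
    """
    if tau < 1 or len(series) < 2 * tau:
        raise ValueError("need at least two full seasonal cycles")
    first, second = series[:tau], series[tau:2 * tau]
    level = sum(first) / tau                    # assumed: mean of first cycle
    trend = (sum(second) / tau - level) / tau   # assumed: per-step change between cycles
    seasonal = [y - level for y in first]       # one factor per position in the cycle
    for t in range(tau, len(series)):
        y, s_prev, prev_level = series[t], seasonal[t % tau], level
        level = lam1 * (y - s_prev) + (1 - lam1) * (prev_level + trend)
        trend = lam2 * (level - prev_level) + (1 - lam2) * trend
        seasonal[t % tau] = lam3 * (y - level) + (1 - lam3) * s_prev
    return level + trend + seasonal[len(series) % tau]


# Made-up series with a two-step cycle and no trend: low, high, low, high, ...
forecast = triple_exponential_smoothing([10, 20, 10, 20, 10, 20],
                                        lam1=0.5, lam2=0.5, lam3=0.5, tau=2)
```

For a seasonal query such as "halloween", tau would correspond to one year of observations at whatever granularity the log is aggregated.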

[Figure: "Spring Flowers"; y-axis: query frequency]

Big Wins

…More examples

Big Losses: American Idol, Ginger Lee

[Figures: "American idol" and "Ginger lee"; y-axis: query frequency]

when to plant tulips vs. when to plant tomatoes

In our SIGIR’12 paper we showed that

Short history is better for prediction

Prediction error and auto-completion ranking quality are correlated

Conclusions

Freshness matters in search, a lot

There are different types of time-sensitive queries

With enough data, temporal trends can be modeled accurately

Thanks