
Detecting Temporal Patterns of User Queries

Pengjie Ren, Zhumin Chen, and Jun Ma

School of Computer Science and Technology, Shandong University, Jinan, 250101, China.

E-mail: [email protected]; [email protected]; [email protected]

Zhiwei Zhang and Luo Si

Department of Computer Science, Purdue University, West Lafayette, IN 47907.

E-mail: [email protected], [email protected]

Shuaiqiang Wang

Department of Computer Science and Information Systems, Jyväskylä University, Agora, 40014, Finland.

E-mail: [email protected]

Query classification is an important part of exploring the characteristics of web queries. Existing studies are mainly based on Broder’s classification scheme and classify user queries into navigational, informational, and transactional categories according to users’ information needs. In this article, we present a novel classification scheme from the perspective of queries’ temporal patterns. Queries’ temporal patterns are inherent time series patterns of the search volumes of queries that reflect the evolution of the popularity of a query over time. By analyzing the temporal patterns of queries, search engines can more deeply understand the users’ search intents and thus improve performance. Furthermore, we extract three groups of features based on the queries’ search volume time series and use a support vector machine (SVM) to automatically detect the temporal patterns of user queries. Extensive experiments on the Million Query Track data sets of the Text REtrieval Conference (TREC) demonstrate the effectiveness of our approach.

Introduction

There are temporal aspects to web search queries that search engines need to account for in order to understand changes in user intent and to respond to users with temporally relevant results (Kulkarni, Teevan, Svore, & Dumais, 2011; Shokouhi, 2011). For example, for the query “Earthquake,” users usually search for basic knowledge about earthquakes when no earthquake is occurring or has recently occurred. In that situation, search engines should rank relevant pages from sites such as Wikipedia at the top of the result list. However, in the middle of March 2011, the query “Earthquake” suddenly became one of the most popular searches.¹ The increase in search volume was caused by the magnitude 9.0 earthquake in Japan. During this period, most users issuing this query were interested in the event rather than in basic knowledge about earthquakes, so search engines should have ranked news articles about the event at the top of the result list. Other examples are queries such as “World Cup” or “Presidents Cup in Golf,” whose search volumes rise and fall with a fixed cycle; correspondingly, the temporal intents of users change periodically. Therefore, it is important for search engines to correctly identify such queries and ensure that their results are temporally relevant (Shokouhi, 2011) or diversified (Berberich & Bedathur, 2013).

By analyzing user queries in depth, we found that the search volume time series of user queries follow certain patterns, which we refer to as queries’ temporal patterns in this study. Queries’ temporal patterns are the inherent time series patterns of the search volume of queries, which reflect a query’s popularity over time. Figure 1 presents some classic temporal pattern instances. Different patterns can reflect different search intents. For example, queries with flat curves, such as “Java JDK” in Figure 1a, usually have fixed intents (Kulkarni et al., 2011). In contrast, queries whose curves have multiple spikes, such as those in Figures 1c and 1d, may be temporally ambiguous (Shokouhi, 2011). That is, when users issue them, search engines cannot tell which time intervals (generally corresponding to the spikes on the curves) the users are targeting. For example, for the query “the Olympics” (Figure 1d), there are five spikes corresponding to the Olympic Games (including the Summer and Winter Games) that were held in 2004, 2006, 2008, 2010, and 2012, and a search engine cannot tell which of these periods a given user is targeting. In summary, by analyzing the temporal patterns of user queries, we can better understand user intents, which can be used to improve the performance of search engines (Jones & Diaz, 2007; Kulkarni et al., 2011; Metzler, Jones, Peng, & Zhang, 2009; Shokouhi, 2011).

¹ http://www.google.cn/trends/explore#q=earthquake

Received March 17, 2015; revised May 18, 2015; accepted May 18, 2015

© 2015 ASIS&T • Published online 18 August 2015 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.23578

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 68(1):113–128, 2017

In this article, we first introduce a new query-classification scheme that groups queries into four categories according to their temporal patterns: stable queries, one-time burst queries, periodic multitime burst queries, and aperiodic multitime burst queries. Then, we propose an approach to identify a query’s pattern class from its time series curve. In particular, we first extract a collection of features by exploring the shapes, trends, periods, and bursts in the time series curves. Then, we use a support vector machine (SVM) to detect user queries’ temporal patterns. Because some queries have records in query logs that are too sparse to generate valid time series curves, we present two methods to approximately construct their time series curves from document data sets. We collected approximately 15,000 queries from the Text REtrieval Conference (TREC) and manually annotated their temporal pattern categories. Extensive experiments indicate that our approach significantly outperforms the baselines.

The remainder of this article is organized as follows. First, we present our new query-classification scheme. Second, we describe our automatic query pattern detection approach and present the corresponding experiments. Then, we discuss the potential applications of our study. Next, we present related work. Finally, we draw some conclusions and discuss our future work.

Temporal Pattern-Based Query-Classification Scheme

A query’s temporal pattern is reflected by the changes in its search volume over time, which are represented as a time series. Therefore, we group queries into corresponding temporal pattern classes according to the temporal characteristics of their time series, as shown in Figure 2. Each leaf node in Figure 2 represents a specific temporal pattern type. Next, we define these query classes and describe their typical time series curves, which are illustrated in Figure 1.

Query with time (QwT). A QwT represents the queries that contain at least one explicit time expression or time interval, such as “Earthquake 2008 . . . 2010” (earthquakes from 2008 to 2010). These queries are easy to detect by identifying the explicit time expressions. When a user submits a QwT query, he or she usually has a specific time constraint in mind.

Query without time (QoT). A QoT denotes the queries that do not contain any explicit time expression, such as “Web page rank.” For QoTs, we need to detect their temporal patterns to help us analyze user intents.

Stable query (SQ). An SQ denotes the queries in which the user’s intent does not focus on any specific time interval. That is, there are no temporal constraints on their results. SQs represent users’ common, frequent, and constant search intents. Consequently, their time series curves share a stable trend (e.g., “Java JDK” as shown in Figure 1a).

FIG. 1. Query temporal pattern examples from Google Trends. The horizontal and vertical axes represent the time and query search volume, respectively.

FIG. 2. Temporal pattern-based classification scheme.

Nonstable query (NSQ). An NSQ describes the queries that represent users’ uncommon, occasional search intents.

One-time burst query (OBQ). An OBQ is a type of NSQ. OBQs are often triggered by one-time, unexpected events. As a result, each of their curves contains a single spike, that is, a sudden increase in search volume followed by a corresponding decrease (e.g., “Japan Earthquake” is an OBQ, as depicted in Figure 1b).

Multitime burst query (MBQ). An MBQ is another type of NSQ. MBQs are often triggered by events that repeat multiple times.

Aperiodic multitime burst query (AMBQ). An AMBQ represents MBQs triggered by unexpected events or user requests that are issued aperiodically. Time series curves of AMBQs share a common shape with multiple aperiodic spikes (e.g., “Earthquake” is an AMBQ, as shown in Figure 1c).

Periodic multitime burst query (PMBQ). A PMBQ denotes MBQs that are issued periodically. These queries are usually triggered by expected events that follow identical or almost identical cycles during corresponding months of successive years. Time series curves of PMBQs share a common shape with multiple periodic spikes. For example, “the Olympics” becomes popular in a 2-year cycle because the Olympic Games are held every 2 years (including the Summer and Winter Games), as shown in Figure 1d.

We analyzed in depth the contents of query instances for different temporal pattern types; some examples are summarized in Table 1. We found that queries of different temporal pattern types indeed have different characteristics. The majority of SQs can be characterized as broad, general queries, such as queries about academic topics or everyday matters (e.g., “artificial intelligence” and “pet therapy”). In the absence of an emergent, focused intent (e.g., one triggered by an unexpected event), the curves of these queries remain relatively flat. In comparison, NSQs are more focused, and many of them refer to events. AMBQs in particular are usually related to celebrities (e.g., Tom Hanks), disasters (e.g., blizzard warning), and so on.

Automatic Temporal Pattern Detection Approach

Given a new query, we use a machine-learning approach to detect its temporal pattern (i.e., SQ, OBQ, PMBQ, or AMBQ, as shown in Figure 2). The framework is shown in Figure 3. To achieve this goal, the primary task is to define effective features that can capture the characteristics of different temporal pattern types.

When a user submits a query, we first build its time series curve by mining query logs. If we cannot build a valid curve because of sparse or missing data, we approximate the curve from document data sets. Then, we preprocess the curve and extract the features. Finally, a classifier (an SVM in this article) is used to detect the query’s temporal pattern.

We now discuss the main components of the framework, which are marked in bold in Figure 3.

Time Series Curve Generation

Given a query, we first need to generate its time series (search volume over time). Let $f_t(q)$ denote the search frequency of query $q$ within the $t$th time interval, that is, the search volume within time interval $t$ divided by the total search volume over all time. Here, $t = 1, \ldots, N$, where each time interval is 1 month and $N$ is the number of time intervals (months). The time series $F(q)$ of a query $q$ over $N$ time intervals is the sequence of $f_t(q)$, denoted as

$$F(q) = \{f_1(q), f_2(q), \ldots, f_t(q), \ldots, f_N(q)\} \qquad (1)$$

TABLE 1. Analysis of query instances for different temporal pattern types.

| Temporal pattern type | Category | Query instances |
|---|---|---|
| SQ | Daily Stuff | Seattle bus schedule, quality customer service, pet therapy, job salaries, business commerce daily, . . . |
| | Academic Stuff | artificial intelligence, associate degree, atomic physics, java jdk, biodiversity, graphics, NASA science, phx AZ, protozoa, technology assessment, data flow diagram, . . . |
| | . . . | . . . |
| OBQ | Political or Social Events | capitol hill massacre, box fan recall, one million dollar bill, Olympics in Greece, free credit report Ohio, . . . |
| | Disease Events | bird flu in the United States, swine influenza, SARS, bird flu avian, . . . |
| | . . . | . . . |
| AMBQ | Celebrities | Buffalo Levine, Daniel Halpern, Michael Jordan, James Stewart, Tom Hanks, . . . |
| | Disasters | earthquake, tornado, tsunami, flooding, drought, blizzard, . . . |
| | . . . | . . . |
| PMBQ | Season-Related Activities, Places, etc. | spring mill state park, summer job opportunities, summer youth program NY, snow in Italy, rice lake fishing, Renton water park, ice machine, Erie PA beach, allergic reactions to bites, weather on February, . . . |
| | Sport Competitions | world cup, tennis courts, presidents cup in golf, . . . |
| | Tax-Related Stuff | taxes prep, taxes forms 1040, tax return direct deposit, social security wage tax, scheduled capital gains, sales taxes deduction, Ohio state tax return, . . . |
| | . . . | . . . |

Note. SQ = stable query; OBQ = one-time burst query; AMBQ = aperiodic multitime burst query; PMBQ = periodic multitime burst query.

For most queries, we can determine $F(q)$ by mining the query logs. However, for some queries, especially long queries, we cannot determine $F(q)$ because the query logs contain too few searches for them. For example, if we submit “a Canadian publishing house book” to Google Trends,² it returns “Not enough search volume to show graphs.” Thus, for these queries, we investigate how to estimate $F(q)$ without the use of query logs.
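As an illustration of Equation (1), the following minimal sketch turns monthly search counts into normalized frequencies $f_t(q)$ and signals when the log is too sparse. The input format and the `min_total` sparsity threshold are our own hypothetical choices; the article does not state a numeric threshold (Google Trends does not publish its own).

```python
from typing import List, Optional, Sequence


def time_series(monthly_counts: Sequence[int], min_total: int = 50) -> Optional[List[float]]:
    """Build F(q) = [f_1(q), ..., f_N(q)] from monthly search counts.

    f_t(q) is the count in month t divided by the total count over all months.
    Returns None when the total is zero or below `min_total` (a hypothetical
    sparsity threshold); such queries would instead be approximated from
    documents with DLA or WLA.
    """
    total = sum(monthly_counts)
    if total <= 0 or total < min_total:
        return None
    return [count / total for count in monthly_counts]
```

Because each $f_t(q)$ is divided by the all-time total, the values of a valid curve sum to 1, which makes curves of popular and rare queries directly comparable.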

We present two estimation approaches based on the following observation. There are basically two roles on the Internet: information publishers and information hunters. Intuitively, their activities are synchronized over time. In other words, when an event occurs, information publishers publish related news articles, and information hunters simultaneously submit queries to search for them. The number of relevant articles therefore reflects the popularity of the corresponding queries, and vice versa (Adar, Teevan, Dumais, & Elsas, 2009; Dakka, Gravano, & Ipeirotis, 2012). Hence, we construct the time series $F(q)$ from a document corpus when a query’s search records in the query logs are insufficient to build its curve. Two approaches, document-level approximation (DLA) and word-level approximation (WLA), are proposed to solve this problem at different levels.

DLA

We assume that changes in a query’s search frequency are reflected by changes in the number of relevant documents. Therefore, for a given query $q$, we use its relevant documents over time to approximate its time series curve, as follows:

$$f_t(q) \approx \frac{\sum_{d \in D_k} P(t \mid d)\, P(d \mid q)}{\sum_{\hat{d} \in D_k} P(\hat{d} \mid q)} \qquad (2)$$

where $t$ represents the time interval within which a document is published. $P(t \mid d)$ equals 1 if the publication time (Chen, Ma, Cui, Rui, & Huang, 2010) of document $d$ is within $t$ and 0 otherwise. In this article, $D_k$ denotes the top $k = 100$ Google search results of query $q$. The publication time of a document is obtained by setting time constraints in “Google Search Tools” when crawling Google search results. $P(d \mid q)$ is the relevance of document $d$ to query $q$, which is estimated with the Query Likelihood Model (Ponte & Croft, 1998). If more relevant documents for query $q$ are published within time interval $t$, Equation (2) has a larger value at $t$.
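A minimal sketch of Equation (2), assuming each retrieved document has already been reduced to a pair of (0-based publication month index, relevance score). The relevance scores stand in for $P(d \mid q)$ and are taken as given here; computing them with a query-likelihood model is outside this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def dla_curve(docs: Iterable[Tuple[int, float]], n_months: int) -> List[float]:
    """Document-level approximation of f_t(q) (Equation 2), a sketch.

    docs: (month_index, relevance) pairs for the top-k results D_k of a query.
          month_index in [0, n_months) is the publication month, so
          P(t|d) = 1 for that month and 0 otherwise; relevance plays P(d|q).
    """
    docs = list(docs)
    norm = sum(rel for _, rel in docs)   # denominator: sum of P(d-hat|q) over D_k
    mass: Dict[int, float] = defaultdict(float)
    for month, rel in docs:
        mass[month] += rel               # numerator: sum of P(t|d) P(d|q)
    if norm == 0:
        return [0.0] * n_months
    return [mass[t] / norm for t in range(n_months)]
```

For instance, three documents with relevance 0.5, 0.25, and 0.25 published in months 0, 2, and 2 yield the curve `[0.5, 0.0, 0.5]`: the month with more relevant publications receives the larger value, as the text describes.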

WLA

We also assume that a query’s search frequencies over time are reflected by the occurrences of its words. Thus, for a given query $q$, we use its word-occurrence distribution over time to approximate its time series curve:

$$f_t(q) \approx \prod_{w \in q} \frac{P(w \mid t)\, P(w \mid q)}{DF(w)} \qquad (3)$$

$P(w \mid t)$ is the probability of generating the query word $w$ within time interval $t$; it is estimated, with Laplace smoothing, as the term frequency (TF) of $w$ within $t$ divided by the TF of all words within $t$ in the document corpus. $DF(w)$ is the document frequency (DF) of word $w$. From the perspective of language models for information retrieval, $P(w \mid q)$ is the query language model. However, because most user queries are short, $P(w \mid q)$ is usually estimated with interpolation approaches (Manning, Raghavan, & Schütze, 2008). In this article, we estimate $P(w \mid q)$ as the TF of word $w$ divided by the TF of all words of query $q$ in the top $k$ Google search results, again with $k = 100$.
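The following is a minimal sketch of Equation (3). Several details are our own assumptions rather than the article's: the corpus is given as one token list per month, Laplace smoothing adds 1 to each count with the vocabulary size as the denominator correction, $P(w \mid q)$ and $DF(w)$ are supplied precomputed, and a missing DF is treated as 1 to avoid division by zero.

```python
from collections import Counter
from typing import Dict, List


def wla_curve(query_words: List[str],
              month_tokens: List[List[str]],
              doc_freq: Dict[str, int],
              p_w_q: Dict[str, float]) -> List[float]:
    """Word-level approximation of f_t(q) (Equation 3), a sketch.

    month_tokens[t]: all tokens of the documents published in month t
    doc_freq[w]:     document frequency DF(w) in the corpus
    p_w_q[w]:        precomputed query language model P(w|q)
    """
    vocab = set(query_words)
    for tokens in month_tokens:
        vocab.update(tokens)
    v = len(vocab)
    curve = []
    for tokens in month_tokens:
        tf = Counter(tokens)
        total = len(tokens)
        value = 1.0
        for w in query_words:
            p_w_t = (tf[w] + 1) / (total + v)          # Laplace-smoothed P(w|t)
            df = max(doc_freq.get(w, 0), 1)            # assumption: guard DF = 0
            value *= p_w_t * p_w_q[w] / df             # one factor of the product
        curve.append(value)
    return curve
```

Dividing by $DF(w)$ down-weights words that are common across the whole corpus, so months in which a rare query word is unusually frequent stand out in the approximated curve.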

Two instances, “Earthquake” and “Boston Marathon,” are shown in Figure 4, in which the dash-dot line is the real time series curve built from query logs and the other two lines are the approximated curves. The three curves fluctuate almost consistently over time.

Preprocessing With Time Series Analysis

Most original time series curves look different from the typical instances in Figure 1. This is because a curve can be decomposed into multiple components that together determine its shape (Brockwell & Davis, 2002; Kulkarni et al., 2011). In this article, each point of $F(q)$ is decomposed into three components:

$$f_t(q) = m_t + s_t + Y_t \qquad (4)$$

where $m_t$ is a slowly changing component known as the trend component; $s_t$ is a component with a fixed cycle, referred to as the seasonal component; and $Y_t$ is the random, burst, and irregular component.

² http://www.google.com/trends/explore

FIG. 3. Query temporal pattern detection framework.

The popularity of a query can exhibit an overall increasing or decreasing trend over time, which is encoded by the trend component $m_t$. However, $m_t$ is unrelated to the temporal pattern type; $s_t$ and $Y_t$ are the components we care about here. Therefore, we first remove $m_t$ from $F(q)$, using polynomial fitting to model $m_t$.

Specifically, let $F(q) = \{f_1(q), f_2(q), \ldots, f_t(q), \ldots, f_N(q)\}$ be the curve we want to preprocess, where $f_t(q) = m_t + s_t + Y_t$. To remove the trend component $m_t$ from $f_t(q)$, we model $m_t$ as

$$m_t = W^{T} X_t + \xi \qquad (5)$$

where $W$ is the vector of polynomial coefficients and $X_t = [1, t, t^2, \ldots, t^M]$ is the polynomial vector. Both $W$ and $X_t$ have dimension $M + 1$, where $M$ is the degree of the polynomial function. $\xi$ is random noise, which we assume follows Student’s t distribution $\xi \sim \mathrm{St}(\xi \mid 0, \lambda, \nu)$; that is,

$$P(\xi \mid \lambda, \nu) = \mathrm{St}(\xi \mid 0, \lambda, \nu) \qquad (6)$$

Because $m_t = W^{T} X_t + \xi$, we have $m_t \sim \mathrm{St}(m_t \mid W^{T} X_t, \lambda, \nu)$; that is,

$$P(m_t \mid W, X_t, \lambda, \nu) = \mathrm{St}(m_t \mid W^{T} X_t, \lambda, \nu) \qquad (7)$$

Ideally, to fit $m_t$, we would use the values $\{m_1, m_2, \ldots, m_t, \ldots, m_N\}$ of each curve as the training data; however, it is difficult to determine these values precisely. We therefore use $F(q) = \{f_1(q), f_2(q), \ldots, f_t(q), \ldots, f_N(q)\}$ instead. We assume $\xi \sim \mathrm{St}(\xi \mid 0, \lambda, \nu)$ rather than a Gaussian distribution because, when we use $(X_t, f_t(q))$ to approximate $(X_t, m_t)$, the $s_t$ and $Y_t$ components act as noise, and the Student’s t distribution is more robust than the Gaussian distribution in this case (Bishop, 2006). For a given curve $F(q)$, based on maximum likelihood estimation, we can write the regularized negative log likelihood loss function as

$$\begin{aligned}
L(W, \lambda, \nu \mid q) &= -\log P(m_1, m_2, \ldots, m_t, \ldots, m_N \mid W, X, \lambda, \nu) + \frac{\lambda}{2}\lVert W \rVert^2 \\
&= -\sum_{t=1}^{N} \log \mathrm{St}(m_t \mid W^{T} X_t, \lambda, \nu) + \frac{\lambda}{2}\lVert W \rVert^2 \\
&\approx -\sum_{t=1}^{N} \log \mathrm{St}(f_t(q) \mid W^{T} X_t, \lambda, \nu) + \frac{\lambda}{2}\lVert W \rVert^2
\end{aligned} \qquad (8)$$

where $\sum_{t=1}^{N} \log \mathrm{St}(f_t(q) \mid W^{T} X_t, \lambda, \nu)$ is the log likelihood and $\frac{\lambda}{2}\lVert W \rVert^2$ is the regularizer (penalty term), which is used to avoid overfitting. The objective $L(W, \lambda, \nu \mid q)$ can be optimized with gradient descent to determine the parameters $W$.

FIG. 4. Time series curve approximation instances.

After we determine $W$, $m_t$ is computed as $m_t = W^{T} X_t$. Then, for each point of a given curve, we remove $m_t$ from $f_t(q)$ and finally obtain a processed version of the original curve:

$$F^{q} = \{f^{q}_t = s_t + Y_t \mid t = 1, \ldots, N\}$$

Some instances are shown in Figure 5. The solid line denotes the original curve $F(q)$, the dotted line denotes the trend component $m_t$, and the dashed line denotes the remaining two components $s_t + Y_t$.
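The trend removal of Equations (5) to (8) can be sketched with NumPy as follows. This is an illustration under several assumptions of ours: the time index is rescaled to [-1, 1] and the curve is divided by its standard deviation before fitting (both only for numerical stability), the Student's t precision (`precision`) and degrees of freedom (`nu`) are fixed rather than learned, and the regularization weight is a separate hypothetical parameter `reg` instead of the shared $\lambda$ in Equation (8). Terms of $-\log \mathrm{St}$ that do not depend on $W$ are dropped.

```python
import numpy as np


def remove_trend(f, degree=3, nu=3.0, precision=1.0, reg=1e-3, lr=0.1, iters=5000):
    """Fit m_t = W^T X_t under Student's t noise by gradient descent.

    Returns (detrended curve s_t + Y_t, fitted trend m_t).
    """
    f = np.asarray(f, dtype=float)
    n = len(f)
    scale = f.std() if f.std() > 0 else 1.0
    z = f / scale                                   # fit on a standardized copy
    t = np.linspace(-1.0, 1.0, n)                   # rescaled time index
    X = np.vander(t, degree + 1, increasing=True)   # rows X_t = [1, t, ..., t^M]
    W = np.zeros(degree + 1)
    for _ in range(iters):
        r = z - X @ W                               # residuals f_t - W^T X_t
        # Gradient of sum_t (nu + 1)/2 * log(1 + precision * r_t^2 / nu) w.r.t. W
        grad = -X.T @ ((nu + 1.0) * precision * r / (nu + precision * r ** 2))
        grad += reg * W                             # gradient of (reg / 2) * ||W||^2
        W -= lr * grad / n
    trend = scale * (X @ W)
    return f - trend, trend
```

The Student's t weighting $(\nu + 1)\lambda r / (\nu + \lambda r^2)$ shrinks the influence of large residuals, so a single burst pulls the fitted trend far less than it would under least squares; this is the robustness argument given above.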

Feature Extraction

Three groups of features are proposed for the machine-learning model in this article: basic features, curve distance features, and regression features. As shown in Table 2, Features 1 to 4 are basic features, Features 5 to 8 are curve distance features, and Features 9 to 11 are regression features. We describe them in detail next.

Features 1–2. SQ curves are stable, whereas NSQ curves are bursty; therefore, the mean (M) and standard deviation (SD) of $F^{q}$ are two obvious features for distinguishing them.

Feature 3. It is difficult to separate OBQ and MBQ because both usually have large SDs. Therefore, we define a new feature, Maximum Spike (MS), as follows:

$$MS = \frac{\max\{f^{q}_t \mid t = 1, \ldots, N\}}{\sum_{t=1}^{N} f^{q}_t} \qquad (9)$$

MS represents the proportion of the total search volume accounted for by the maximum point of the curve.

Feature 4. Another feature, Gap between the First and Second Maximum Spike (GFSMS), is defined as

$$GFSMS = \frac{f^{q}_{FM} - f^{q}_{SM}}{\sum_{t=1}^{N} f^{q}_t} \qquad (10)$$

where $f^{q}_{FM} = \max\{f^{q}_t \mid t = 1, \ldots, N\}$ is the maximum frequency of the highest spike, that is, the maximum point of the curve, and

$$f^{q}_{SM} = \max\left(\{f^{q}_t \mid t = 1, \ldots, N\} \setminus \left(\{f^{q}_{FM}\} \cup \{f^{q}_{FM \pm c} \mid c = 1, \ldots, m\}\right)\right)$$

is the maximum frequency of the second-highest spike. Here, the points $\{f^{q}_{FM}\} \cup \{f^{q}_{FM \pm c} \mid c = 1, \ldots, m\}$, that is, the maximum point and the $2m$ points around it, make up the highest spike. Therefore, the numerator of Equation (10) estimates the gap between the maximum points of the first- and second-highest spikes. Here, $m$ is a predefined parameter, and $2m$ represents the duration of the highest spike. We determine $m$ by analyzing the highest spikes of all NSQ queries in the data set.³ First, we define a threshold $r$. Given a query $q$, we check, for increasing values of $m$ and for different values of $r$, whether the following inequality is satisfied:

$$\min\left(\{f^{q}_{FM}\} \cup \{f^{q}_{FM \pm c} \mid c = 1, \ldots, m\}\right) > r \cdot f^{q}_{FM} \qquad (11)$$

If the inequality is satisfied, we increment the count of satisfied queries for the corresponding value of $m$. Values of $m$ from 1 month to 6 months are tested. The result is shown in Figure 6, where the x-axis represents different values of $2m$ and the y-axis represents the proportion of queries satisfying Inequality (11). When $2m = 4$, at least 74.6% of queries are covered regardless of the value of $r$. Thus, we set $m = 2$ months in this article.
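Features 1 to 4 can be sketched in a few lines of standard-library Python. Two details are our own assumptions: the spike window around the maximum is clipped at the curve boundaries, and $f^{q}_{SM}$ is taken as 0 when no point lies outside the highest spike. We also use the population SD (`statistics.pstdev`); the article does not say which SD estimator it uses.

```python
import statistics
from typing import Dict, Sequence


def basic_features(curve: Sequence[float], m: int = 2) -> Dict[str, float]:
    """Features 1-4 (M, SD, MS, GFSMS) of a preprocessed curve F^q, a sketch.

    Assumes a non-empty curve with a nonzero sum.
    """
    n = len(curve)
    total = sum(curve)
    fm_idx = max(range(n), key=lambda t: curve[t])       # index FM of the maximum
    f_fm = curve[fm_idx]
    # Highest spike: the maximum point plus m points on each side (clipped).
    spike = set(range(max(0, fm_idx - m), min(n, fm_idx + m + 1)))
    rest = [curve[t] for t in range(n) if t not in spike]
    f_sm = max(rest) if rest else 0.0                    # assumption if nothing remains
    return {
        "M": statistics.mean(curve),
        "SD": statistics.pstdev(curve),
        "MS": f_fm / total,                              # Equation (9)
        "GFSMS": (f_fm - f_sm) / total,                  # Equation (10)
    }
```

For the curve `[1, 1, 6, 1, 1, 3, 1, 1]` with $m = 2$, the highest spike covers indices 0 to 4, so $f^{q}_{SM} = 3$, MS = 6/15 = 0.4, and GFSMS = 3/15 = 0.2. A one-time burst gives a large GFSMS, whereas a curve with a second spike of similar height gives a GFSMS near 0, which is what separates OBQ from MBQ.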

Features 5–8. The next four features are based on curve distance. Given two preprocessed query curves $F^{q_1}$ and $F^{q_2}$, their distance $Distance(F^{q_1}, F^{q_2})$ is defined as follows:

$$Distance(F^{q_1}, F^{q_2}) = \min_{\alpha, n} \frac{\lVert F^{q_1} - \alpha F^{q_2}_{(n)} \rVert}{\lVert F^{q_1} \rVert} \qquad (12)$$

where $F^{q_2}_{(n)}$ is the result of shifting time series $F^{q_2}$ by $n$ time units, and $\lVert \cdot \rVert$ is the $l_2$ norm. The distance measure is invariant to scaling and translation of the time series: it finds the optimal alignment (translation $n$) and scaling coefficient $\alpha$ for matching the shapes of the two time series. With $n$ fixed, $\lVert F^{q_1} - \alpha F^{q_2}_{(n)} \rVert / \lVert F^{q_1} \rVert$ is a convex function of $\alpha$; therefore, we can find the optimal value by setting the gradient to zero: $\alpha = (F^{q_1})^{T} F^{q_2}_{(n)} / \lVert F^{q_2}_{(n)} \rVert^2$. There is no such closed form for the optimal $n$, so in practice we traverse all possible values of $n$ to find the minimum distance (Yang & Leskovec, 2011).
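A NumPy sketch of Equation (12): for every shift $n$, the closed-form $\alpha$ is computed and the smallest normalized residual is kept. How vacated positions are filled when a series is shifted is not specified in the text; here we assume zero padding, and `max_shift` is a hypothetical parameter that defaults to all possible shifts.

```python
import numpy as np


def shift(x: np.ndarray, n: int) -> np.ndarray:
    """Shift x by n steps (positive = later), filling vacated slots with zeros."""
    out = np.zeros_like(x)
    if n > 0:
        out[n:] = x[:-n]
    elif n < 0:
        out[:n] = x[-n:]
    else:
        out[:] = x
    return out


def curve_distance(x, y, max_shift=None) -> float:
    """min over (alpha, n) of ||x - alpha * y_(n)|| / ||x||  (Equation 12)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    norm_x = np.linalg.norm(x)
    if norm_x == 0:
        raise ValueError("distance is undefined for an all-zero reference curve")
    max_shift = len(x) - 1 if max_shift is None else max_shift
    best = np.inf
    for n in range(-max_shift, max_shift + 1):
        y_n = shift(y, n)
        denom = y_n @ y_n
        alpha = (x @ y_n) / denom if denom > 0 else 0.0   # optimal scaling for this n
        best = min(best, np.linalg.norm(x - alpha * y_n) / norm_x)
    return float(best)
```

Because $\alpha = 0$ is always feasible, the distance lies in [0, 1]; a curve and any scaled, shifted copy of it have distance 0. Note that the measure is not symmetric, since it is normalized by $\lVert F^{q_1} \rVert$.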

For a given curve $F^{q}$, we use its average distance to the annotated curves of each temporal pattern class as one feature. We thus obtain four curve distance features, $D_{SQ}$, $D_{OBQ}$, $D_{AMBQ}$, and $D_{PMBQ}$, which are computed from the curve distance $Distance(F^{q_1}, F^{q_2})$ as follows:

$$\begin{aligned}
D_{SQ}(F^{q}) &= \frac{1}{M_1}\sum_{i=1}^{M_1} Distance(F^{q}, F^{SQ}_{i}) \\
D_{OBQ}(F^{q}) &= \frac{1}{M_2}\sum_{i=1}^{M_2} Distance(F^{q}, F^{OBQ}_{i}) \\
D_{AMBQ}(F^{q}) &= \frac{1}{M_3}\sum_{i=1}^{M_3} Distance(F^{q}, F^{AMBQ}_{i}) \\
D_{PMBQ}(F^{q}) &= \frac{1}{M_4}\sum_{i=1}^{M_4} Distance(F^{q}, F^{PMBQ}_{i})
\end{aligned} \qquad (13)$$

where $F^{SQ}_{i}$, $F^{OBQ}_{i}$, $F^{AMBQ}_{i}$, and $F^{PMBQ}_{i}$ are curves of the four pattern types annotated in advance, and $M_1$ to $M_4$ are the numbers of such curves. In this article, we use all curves in the training set.

³ http://trec.nist.gov/data/million.query.html

FIG. 5. Removing the trend component.
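Equation (13) is simply a per-class average, so a sketch only needs a distance function and the annotated curves grouped by class. The function below takes the distance as a parameter (for example, an implementation of Equation 12); the toy distance in the test is purely illustrative, and every class is assumed to have at least one annotated curve.

```python
from typing import Callable, Dict, List, Sequence

Curve = Sequence[float]


def class_distance_features(curve: Curve,
                            labeled: Dict[str, List[Curve]],
                            distance: Callable[[Curve, Curve], float]) -> Dict[str, float]:
    """Average distance from `curve` to the annotated curves of each class.

    labeled maps a class name ("SQ", "OBQ", "AMBQ", "PMBQ") to its
    non-empty list of annotated training curves.
    """
    return {
        "D_" + label: sum(distance(curve, ref) for ref in refs) / len(refs)
        for label, refs in labeled.items()
    }
```

A query whose curve resembles the annotated OBQ curves then gets a small $D_{OBQ}$ and larger values for the other three classes, giving the classifier a direct similarity signal per class.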

Features 9–11. The ninth feature, cutoff, is a function mapping

$$cutoff(F^{q}): \mathbb{R}^{n} \rightarrow \mathbb{R} \qquad (14)$$

where $\mathbb{R}^{n}$ is the feature space of a given query curve $F^{q}$. The cutoff is used to roughly identify spikes; here, a spike is defined as a run of consecutive points whose values are larger than the cutoff. We need to learn a cutoff function, and the first step is building the training data $(F^{q}, cutoff)$. Given an $F^{q}$ (e.g., Figure 7), we can easily annotate its temporal pattern, but it is hard to annotate its cutoff value. Fortunately, we find that an approximate pseudo-cutoff can be estimated as follows:

$$pseudocutoff(F^{q}) = \begin{cases}
\text{median value of “area 1” of Fig. 7}, & \text{if } q = SQ \\
\text{median value of “area 2” of Fig. 7}, & \text{if } q = OBQ \\
\text{median value of “area 3” of Fig. 7}, & \text{if } q = MBQ
\end{cases} \qquad (15)$$

We use the first eight features (Features 1–8) as the input features of support vector regression (SVR) (C.C. Chang & Lin, 2011) to learn a nonlinear model as the cutoff function. For SVR, we use a Gaussian kernel function with the default parameter settings in LIBSVM (C.C. Chang & Lin, 2011).

As stated earlier, the cutoff can be used to roughly detect the spikes of $F^{q}$, and we define the number of detected spikes as the 10th feature, Number of Spikes. If multiple spikes exist, we use $y_i$ to represent the time interval between the middle points of two neighboring spikes, which gives a sequence $y_1, y_2, \ldots, y_w$. The 11th feature, Period of Spikes (PS), is computed as the SD of this sequence. If there are zero spikes or only one spike, we set PS to an extreme value.
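Given a cutoff (for example, the SVR prediction), Features 10 and 11 can be sketched as below. The sentinel `EXTREME_PS` is a hypothetical placeholder for the unspecified "extreme value", and we again use the population SD of the gaps between spike midpoints.

```python
import statistics
from typing import List, Sequence, Tuple

EXTREME_PS = 1e6  # hypothetical stand-in for the "extreme value" used with < 2 spikes


def detect_spikes(curve: Sequence[float], cutoff: float) -> List[Tuple[int, int]]:
    """Return (start, end) indices of maximal runs of points above `cutoff`."""
    spikes, start = [], None
    for t, value in enumerate(curve):
        if value > cutoff and start is None:
            start = t
        elif value <= cutoff and start is not None:
            spikes.append((start, t - 1))
            start = None
    if start is not None:
        spikes.append((start, len(curve) - 1))
    return spikes


def spike_features(curve: Sequence[float], cutoff: float) -> Tuple[int, float]:
    """Features 10 and 11: (Number of Spikes, Period of Spikes)."""
    spikes = detect_spikes(curve, cutoff)
    if len(spikes) < 2:
        return len(spikes), EXTREME_PS
    mids = [(s + e) / 2 for s, e in spikes]
    gaps = [b - a for a, b in zip(mids, mids[1:])]   # y_1, ..., y_w
    return len(spikes), statistics.pstdev(gaps)
```

Evenly spaced spikes give a PS of 0, which is the signature of a PMBQ, while irregular gaps give a positive PS, as expected for an AMBQ.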

Experiments

In this section, we carry out experiments to demonstrate the performance of our time series curve approximation approaches and our temporal pattern detection approach.

Experimental Setup

Data Sets. We first randomly extract approximately 15,000 queries from the Web Track of TREC³ and submit each query to Google Trends² to download its search volume file. The numbers in the file reflect the search volume for the particular query relative to the total search volume on Google over time. We use this file as the corresponding query’s frequency data for our temporal pattern detection algorithm because it is very difficult to obtain real, large-scale, long-term query logs from commercial search engines. Fortunately, these data are suitable for both our approaches and the baselines.

For each query, we also collect web pages from January 2004 to July 2013. Specifically, for each month, we issue each query to Google Search with a time-range condition and collect the top 100 results in the ranking list. For example, “Earthquake” is submitted with the time condition “1 Jan, 2004–31 Jan, 2004,” which can be specified in “Google Search Tools.” We use the collected data set to estimate the time series curves of these queries with the DLA and WLA approaches, respectively. The data set is available online.⁴

TABLE 2. Summary of features.

| Groups | Symbols | Descriptions |
|---|---|---|
| Basic Features | M, SD, MS, GFSMS | Basic Features capture the intuitive characteristics of different time series. |
| Curve Distance Features | $D_{SQ}$, $D_{OBQ}$, $D_{AMBQ}$, $D_{PMBQ}$ | Curve Distance Features capture the differences of different pattern types based on the Curve Distance (i.e., Equation [12]). |
| Regression Features | Cutoff, No. of Spikes, Period of Spikes | Regression Features roughly identify spikes of the time series with a regression approach. |

FIG. 6. Analysis of average spike duration.

FIG. 7. Approximate cutoff of training data.

Annotations. We employed four assessors to manually annotate the patterns of these queries according to the taxonomy defined in Figure 2. Three assessors were undergraduate students, and the fourth was a graduate student. All assessors were trained before they began labeling. For each query and its corresponding time series curve, three assessors first annotated its pattern type. If two or three of their annotations agreed, we labeled the query with the majority type; otherwise, the fourth assessor made the final decision. All assessors used the same user interface, shown in Figure 8, to annotate the queries. For each query, the assessors either annotated it with one pattern type or marked it as “hard to identify” if they could not make a decision. The average kappa statistic (Viera, Garrett, & Joanne, 2005) is 0.81, with κ(User1, User2) = 0.84, κ(User1, User3) = 0.76, and κ(User2, User3) = 0.84. On the one hand, the κ value is larger than 0.8, which means that the annotations of the three assessors have good consistency. On the other hand, it is only slightly larger than 0.8, which indicates that the task is difficult. The percentages of the different pattern types according to our manual annotations are summarized in Table 3. AMBQ queries account for the largest percentage, and PMBQ queries account for a small percentage of all queries.
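The article does not name the kappa variant; assuming the pairwise values are Cohen's kappa, the sketch below shows how each pairwise value is computed and checks that averaging the three reported values reproduces 0.81. Returning 1.0 when chance agreement is already 1 (both assessors always use the same single label) is our own convention for an otherwise undefined case.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two assessors' labels for the same queries."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)        # agreement expected by chance
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


# Average of the three pairwise values reported above
pairwise = [0.84, 0.76, 0.84]
print(round(sum(pairwise) / len(pairwise), 2))  # 0.81
```

Kappa corrects raw agreement for agreement expected by chance, which matters here because AMBQ alone covers nearly half of the queries: two assessors who always answered AMBQ would agree often while having a kappa of 0.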

Baselines. No existing studies have proposed approaches to comprehensively detect temporal patterns. The only known work closely related to our study was conducted by Shokouhi (2011), who used time-series decomposition techniques to identify seasonal queries (i.e., PMBQs). For a given query, he first converted its historical frequency into a time series with monthly splits. He then decomposed the time series by applying Holt–Winters additive smoothing (Cleveland, Cleveland, McRae, & Terpenning, 1990). If the decomposed seasonal component and the raw data have similar distributions, the query is classified as seasonal. We refer to this baseline as detecting seasonal queries with time series analysis (DSQTSA). The other two baselines use 1-nearest-neighbor classification with the curve distance (i.e., Equation 12) and the Euclidean distance as distance metrics; they are denoted as 1NNCD and 1NNED, respectively. All queries used in our experiments have time series obtained from Google Trends. OurApproachREAL uses the time series from Google Trends. For OurApproachDLA and OurApproachWLA, we assume that these time series are not available and approximate them with DLA and WLA, respectively.
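For reference, the 1NN baselines reduce to a single minimization over the labeled training curves. This sketch uses the Euclidean distance (1NNED) by default; passing an implementation of Equation (12) as `distance` would give 1NNCD. The function names are our own, and equal-length curves are assumed.

```python
import math
from typing import Callable, List, Sequence, Tuple

Curve = Sequence[float]


def euclidean(a: Curve, b: Curve) -> float:
    """Euclidean distance between two equal-length curves (Python 3.8+)."""
    return math.dist(a, b)


def one_nn(query: Curve, train: List[Tuple[Curve, str]],
           distance: Callable[[Curve, Curve], float] = euclidean) -> str:
    """Return the pattern label of the training curve closest to `query`."""
    _, label = min(train, key=lambda item: distance(query, item[0]))
    return label
```

Both variants base each decision on a single distance to a single neighbor, which is the limitation the results discussion later contrasts with the multi-feature SVM.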

Evaluation measures. We use Precision (P), Recall (R), and F1 to evaluate the temporal pattern detection results. If the category that the algorithm assigns to a query agrees with the manually annotated category, the classification is correct. For each pattern type, Precision is the fraction of queries classified into that type that are classified correctly, and Recall is the fraction of queries annotated with that type that are classified correctly. The F1 score is calculated using the following function: F1 = 2 * (P * R)/(P + R).
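The per-type measures can be computed from parallel lists of annotated and predicted labels. In this sketch, returning 0.0 when a denominator is empty (no predictions or no annotations of that type) is our own convention.

```python
from typing import Sequence, Tuple


def per_class_prf(gold: Sequence[str], pred: Sequence[str], label: str) -> Tuple[float, float, float]:
    """Precision, Recall, and F1 for one pattern type."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    n_pred = sum(p == label for p in pred)     # queries classified as `label`
    n_gold = sum(g == label for g in gold)     # queries annotated as `label`
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with annotations `["SQ", "SQ", "OBQ", "AMBQ"]` and predictions `["SQ", "OBQ", "OBQ", "SQ"]`, OBQ has P = 0.5 and R = 1.0, so F1 = 2 × 0.5 × 1.0 / 1.5 ≈ 0.667.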

FIG. 8. User interface for assessors.

TABLE 3. Query percentages for different pattern types.

             QwT   SQ    OBQ   AMBQ  PMBQ
Percentages  12%   18%   19%   45%   6%

4 http://ir.sdu.edu.cn/exp/dtp.htm

Parameter settings. We use C-Support Vector Classification in LIBSVM (C.C. Chang & Lin, 2011) with the Gaussian kernel function. There are two hyperparameters, γ and C, and we use the standard grid search approach to find their best values; the tested values of both C and γ range from 2^-6 to 2^6. Table 4 reports the parameter tuning results for the SQ type; similar results are achieved for the other three types. Each value in the table is an average over fivefold cross-validation. The worst results and the best results are marked with wavy underlines and straight underlines, respectively. The F1 ranges are 0.877 to 0.906 for SQ, 0.862 to 0.910 for OBQ, 0.917 to 0.918 for AMBQ, and 0.812 to 0.905 for PMBQ. Because the best parameter settings differ across the four pattern types, we compare with the baseline approaches using the default parameter settings (i.e., γ = 1/11, C = 1). LIBSVM sets γ to 1/k by default, where k is the number of features; Table 6 lists 11 features.
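The grid search described above can be sketched as follows, assuming a caller-supplied `evaluate` hook (hypothetical) that trains a Gaussian-kernel SVM with the given C and γ and returns its F1 on a held-out fold. In practice, scikit-learn's `SVC` (which is built on LIBSVM) combined with `GridSearchCV` performs this kind of search; the pure-Python version below only shows the structure: a 13 × 13 grid of powers of two, each cell averaged over fivefold cross-validation, as in Table 4.

```python
import itertools
from typing import Callable, Dict, List, Tuple

# Values 2^-6, 2^-5, ..., 2^6, tried for both C and gamma.
GRID = [2.0 ** k for k in range(-6, 7)]

Fold = Tuple[List[int], List[int]]
Evaluate = Callable[[float, float, List[int], List[int]], float]


def kfold_indices(n: int, k: int = 5) -> List[Fold]:
    """Split range(n) into k contiguous folds; return (train, test) index lists."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def grid_search(n: int, evaluate: Evaluate, k: int = 5):
    """Average each (C, gamma) pair's score over k folds and return the best.

    `evaluate(C, gamma, train_idx, test_idx)` is a hypothetical caller-supplied
    hook: train an RBF-kernel SVM on train_idx and return its F1 on test_idx.
    Returns (best_C, best_gamma, best_mean_score, table of all mean scores).
    """
    folds = kfold_indices(n, k)
    table: Dict[Tuple[float, float], float] = {}
    for c, gamma in itertools.product(GRID, GRID):
        scores = [evaluate(c, gamma, train, test) for train, test in folds]
        table[(c, gamma)] = sum(scores) / len(scores)
    best = max(table, key=table.get)
    return best[0], best[1], table[best], table


# Toy scorer that peaks at C = 2^3, gamma = 2^-1 (illustration only).
def toy(c, gamma, train, test):
    return -((GRID.index(c) - 9) ** 2) - ((GRID.index(gamma) - 5) ** 2)


best_c, best_gamma, score, table = grid_search(100, toy)
print(best_c, best_gamma, len(table))  # 8.0 0.5 169
```

The folds here are contiguous for simplicity; shuffled or stratified folds are the usual choice for real data.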

Performance Comparison

Results and discussion. The evaluation results of our approaches and the baselines are summarized in Table 5 and Figure 9. OurApproachREAL achieves the highest F1 score for all four classes, and its F1 improvements over the baselines are statistically significant. We observe that the Precision of SQ is the lowest among the four patterns. A possible reason is that some NSQ curves have only small spikes, so these queries might be mistakenly classified as SQ. In addition, the Recall of PMBQ is the lowest among the four patterns: if the spike fluctuation of a PMBQ query is not large enough, it is difficult to detect, and the query might be mistakenly classified as SQ, AMBQ, or OBQ. In contrast, the Precision of OBQ is the highest among the four classes because OBQ queries usually have larger SDs and MSs and smaller Ms, a characteristic that our approach captures to identify OBQ effectively. Compared with the baselines, especially DSQTSA, our approach achieves higher performance for PMBQ because both 1NN and DSQTSA rely on a single, simple measure of the differences between curves, whereas our approach integrates multiple features derived from an analysis of the characteristics of different time series curves. In summary, the results indicate that our approach can effectively identify the temporal patterns of queries.

For the approximation of the time series curves, both DLA and WLA are helpful to some extent. This result is reasonable because the dynamic behaviors of web information and user queries are consistent over time: changes in documents can reflect the popularity of the corresponding queries, and vice versa. OurApproachWLA achieves slightly higher Precision than OurApproachDLA for all four classes. A possible reason is that DLA may introduce more noise because it considers whole documents, instead of specific query terms, to generate the curves. However, the temporal pattern detection approach with estimated curves is still not effective enough, and further work is necessary to design more effective approaches to approximate time series curves.

TABLE 4. Grid search of parameters γ (rows) and C (columns) of SVM for SQ type with F1 measure.

γ \ C  2^-6   2^-5   2^-4   2^-3   2^-2   2^-1   2^0    2^1    2^2    2^3    2^4    2^5    2^6
2^-6   0.885  0.886  0.880  0.883  0.881  0.878  0.879  0.880  0.884  0.887  0.891  0.894  0.896
2^-5   0.886  0.879  0.882  0.880  0.878  0.879  0.881  0.883  0.889  0.891  0.894  0.897  0.897
2^-4   0.880  0.883  0.881  0.879  0.879  0.881  0.884  0.888  0.892  0.896  0.897  0.899  0.900
2^-3   0.882  0.881  0.879  0.880  0.881  0.884  0.888  0.894  0.897  0.898  0.899  0.900  0.902
2^-2   0.882  0.879  0.880  0.881  0.885  0.891  0.895  0.896  0.899  0.899  0.902  0.904  0.905
2^-1   0.880  0.880  0.883  0.887  0.892  0.896  0.898  0.899  0.900  0.901  0.904  0.904  0.905
2^0    0.881  0.883  0.886  0.891  0.895  0.897  0.898  0.901  0.902  0.903  0.904  0.906  0.905
2^1    0.883  0.884  0.891  0.894  0.897  0.898  0.900  0.902  0.904  0.904  0.905  0.904  0.904
2^2    0.886  0.885  0.886  0.886  0.880  0.883  0.880  0.878  0.878  0.880  0.882  0.886  0.888
2^3    0.886  0.885  0.885  0.880  0.883  0.881  0.879  0.878  0.881  0.883  0.886  0.889  0.893
2^4    0.886  0.885  0.880  0.884  0.881  0.877  0.880  0.881  0.883  0.887  0.891  0.893  0.896
2^5    0.885  0.880  0.883  0.881  0.879  0.879  0.881  0.883  0.889  0.891  0.894  0.898  0.899
2^6    0.879  0.883  0.882  0.879  0.879  0.881  0.883  0.888  0.892  0.896  0.898  0.899  0.900

TABLE 5. Performance comparison.

                  SQ                     OBQ                    AMBQ                   PMBQ
Models            P       R      F1      P       R      F1      P       R      F1      P       R       F1
OurApproachREAL   0.850a  0.921  0.884a  0.926a  0.857  0.890a  0.918a  0.916a 0.917a  0.909ab 0.815b  0.859ab
OurApproachDLA    0.515   0.584  0.547   0.511   0.502  0.506   0.442   0.581  0.502   0.467   0.498   0.482
OurApproachWLA    0.526   0.567  0.546   0.522   0.531  0.526   0.451   0.430  0.440   0.481   0.515   0.497
DSQTSA            x       x      x       x       x      x       x       x      x       0.607   0.693   0.647
1NNCD             0.793   0.547  0.647   0.563   0.882  0.687   0.849   0.728  0.784   0.507   0.812   0.624
1NNED             0.612   0.943  0.742   0.446   0.764  0.563   0.879   0.552  0.678   0.623   0.707   0.662

Note. a and b indicate statistically significant differences (p < .05, two-tailed t-test) between our approaches and the baseline methods 1NN and DSQTSA, respectively. x indicates that DSQTSA is not applicable, because it detects only seasonal queries (PMBQ). The results of OurApproach are achieved with the Gaussian kernel function and default parameter settings in LIBSVM (C.C. Chang & Lin, 2011). Fivefold cross-validation is adopted for all models.

Feature effectiveness analysis. We analyze the effect of each feature on the overall performance of OurApproachREAL; the results are summarized in Table 6. Generally, discarding any feature leads to a decrease in performance, and discarding some features, such as DOBQ and DAMBQ, leads to a larger decline than discarding the others. These results reflect the effects of individual features only to some extent, because the effect of a single feature might overlap with that of combinations of other features.
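The leave-one-feature-out analysis behind Table 6 can be sketched as follows. The `evaluate` hook, which would retrain the SVM on a feature subset and return a score such as average F1, is a hypothetical caller-supplied function; the toy scorer in the usage example simply sums made-up per-feature weights.

```python
from typing import Callable, Dict, Sequence

# Feature names as listed in Table 6.
FEATURES = ["M", "SD", "MS", "GFSMS", "DSQ", "DOBQ", "DAMBQ", "DPMBQ",
            "Cutoff", "Number of Spikes", "Period of Spikes"]


def ablation_drops(features: Sequence[str],
                   evaluate: Callable[[Sequence[str]], float]) -> Dict[str, float]:
    """Leave-one-feature-out ablation: absolute score drop per removed feature.

    `evaluate(subset)` is a hypothetical hook that retrains the classifier on
    the given feature subset and returns a score (e.g., average F1).
    """
    full_score = evaluate(list(features))
    return {
        f: full_score - evaluate([g for g in features if g != f])
        for f in features
    }


# Toy scorer: made-up weights, with DAMBQ weighted most heavily.
weights = {f: 0.01 for f in FEATURES}
weights["DAMBQ"] = 0.05
drops = ablation_drops(FEATURES, lambda subset: sum(weights[f] for f in subset))
print(max(drops, key=drops.get))  # DAMBQ
```

Each removal requires a full retraining, so the analysis costs one training run per feature plus one for the full model.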

We further analyze the effects of some typical features by plotting the distributions of the query instances in the feature space, as shown in Figure 10. The features Mean (M) and Standard Deviation (SD) can effectively distinguish SQ from NSQ, as illustrated in Figure 10a. For SQs, the Ms and especially the SDs are generally small. This behavior is reasonable because the curves of SQ queries are flatter, which means small SDs. Moreover, after preprocessing (removing trend components), the Ms are also generally small. From Figure 10b–g, we observe that combinations of the features MS, GFSMS, Standard Deviation, DOBQ, DAMBQ, DPMBQ, and DSQ classify OBQ and MBQ well. As the figures show, the MSs and GFSMSs of OBQ tend to be larger than those of the other classes. The explanation is that the single spike in an OBQ curve accounts for a large proportion of the search volume, which leads to large MS and GFSMS values. Moreover, the Curve Distance between OBQ queries is smaller, because single-spike OBQ curves match each other more easily than the multi-spike curves of MBQ queries do. As for AMBQ and PMBQ queries, DOBQ, DAMBQ, and DPMBQ alone can already classify them effectively, as illustrated in Figures 10f and 10h. Other features, such as Cutoff and PS, also help enhance the performance. In summary, all features are useful for distinguishing the queries along different dimensions.

Classification scheme validation. To further verify the correctness and discreteness of our query classification scheme, we used an unsupervised method (i.e., clustering). Specifically, we used Lloyd's (1982) k-means method to cluster the queries, with the number of clusters ranging from 1 to 6. We used two families of metrics, pair-counting measures and set-matching-based measures, to evaluate the clustering results; these metrics are widely used to evaluate clustering algorithms (Amigó, Gonzalo, Artiles, & Verdejo, 2009). The results are shown in Table 7. The best number of clusters is four, for which high performance is achieved (F1a = 0.886 and F1b = 0.871), indicating that the clustering results are highly consistent with the manual annotations. This result is strong evidence for the correctness and discreteness of our classification scheme.
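The two families of clustering metrics in Table 7 can be computed as sketched below. This follows the common definitions surveyed by Amigó et al. (2009): pair-counting precision and recall over all item pairs, purity computed over clusters, and inverse purity computed over annotated classes. The paper does not spell out its exact formulas, so treat this as an assumption rather than a reproduction of Table 7.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence, Tuple


def _f1(a: float, b: float) -> float:
    """Harmonic mean, defined as 0 when both inputs are 0."""
    return 2 * a * b / (a + b) if a + b else 0.0


def pair_counting(classes: Sequence[Hashable],
                  clusters: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Pair-counting precision, recall, and F1a over all pairs of items."""
    if len(classes) != len(clusters):
        raise ValueError("classes and clusters must align")
    both = same_cluster = same_class = 0
    for i, j in combinations(range(len(classes)), 2):
        in_cluster = clusters[i] == clusters[j]
        in_class = classes[i] == classes[j]
        same_cluster += in_cluster
        same_class += in_class
        both += in_cluster and in_class
    precision = both / same_cluster if same_cluster else 0.0
    recall = both / same_class if same_class else 0.0
    return precision, recall, _f1(precision, recall)


def purity(classes: Sequence[Hashable],
           clusters: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Purity (over clusters), inverse purity (over classes), and F1b."""
    n = len(classes)
    joint = Counter(zip(clusters, classes))  # (cluster, class) -> count
    best_in_cluster: Counter = Counter()
    best_in_class: Counter = Counter()
    for (k, c), count in joint.items():
        best_in_cluster[k] = max(best_in_cluster[k], count)
        best_in_class[c] = max(best_in_class[c], count)
    p = sum(best_in_cluster.values()) / n
    ip = sum(best_in_class.values()) / n
    return p, ip, _f1(p, ip)


gold = ["SQ", "SQ", "OBQ", "OBQ"]  # hypothetical annotations
print(pair_counting(gold, [0, 0, 1, 1]))  # (1.0, 1.0, 1.0)
```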

Application Discussion

In this section, we discuss potential applications of our study. Note that our method cannot detect OBQ-type queries until their search volumes increase. However, it can still help improve the search results during the middle and later stages of the corresponding events. Figure 11 presents two examples. The two queries correspond to two events that were happening at the time. Our approach can accurately identify their pattern type as OBQ, and the results of subsequent searches can then be improved.

FIG. 9. Overall model performance comparison (panels a and b).

TABLE 6. Feature effectiveness analysis.

Removed feature     Average precision   Average F1
M                   0.108↓              0.141↓
SD                  0.106↓              0.135↓
MS                  0.102↓              0.130↓
GFSMS               0.107↓              0.135↓
DSQ                 0.107↓              0.131↓
DOBQ                0.111↓              0.158↓
DAMBQ               0.129↓              0.178↓
DPMBQ               0.114↓              0.125↓
Cutoff              0.107↓              0.092↓
Number of Spikes    0.106↓              0.090↓
Period of Spikes    0.107↓              0.112↓

Note. The values in this table indicate an absolute drop compared with the performance with all features.

FIG. 10. Feature effectiveness analysis. Note that a single scatter plot cannot reflect the real situation in a high-dimensional feature space.

We suggest that a search engine should address queries of different temporal pattern types in different ways:

• SQ-type queries denote users' common, frequent information needs; that is, users' search intents usually do not change over time (Kulkarni et al., 2011). Therefore, relevance is the most important criterion for ranking the results. The most relevant pages, whether they were published long ago or recently, should be ranked at the top of the results list (Adar et al., 2009; Elsas & Dumais, 2010). Moreover, web pages from authoritative sites such as Wikipedia are especially valuable and should be given particular consideration.

• For OBQ-type queries, whether the corresponding events are happening or have already happened, users' intents mostly focus on documents about these events (Joho, Jatowt, & Blanco, 2013; McCreadie, Macdonald, & Ounis, 2013). In this case, search engines should rank relevant documents published during the period of the events at the top of the results list. Moreover, freshness is very important, so search engines should crawl new web pages and update the contents of old web pages more actively (Dong et al., 2010; Olston & Pandey, 2008).

• AMBQ-type queries have multiple spikes at different time points that may correspond to different events. Therefore, search engines should pay attention to changes in query intent by analyzing changes in the search results and user-behavior data (Kulkarni et al., 2011). Moreover, the user search intents behind AMBQ queries may be ambiguous; that is, different users may issue the same query to find information about different events. Search engines should diversify their search results to ensure that, whatever the intent, there is at least one satisfactory result (Berberich & Bedathur, 2013).

• PMBQ-type queries represent events that recur in regular cycles. Therefore, search engines can predict future events and respond to users with temporally relevant search results (Shokouhi, 2011). Although the spikes are usually associated with regularly occurring events, the user search intents behind PMBQ queries may also be temporally ambiguous. In this case, search engines can temporally diversify search results (Berberich & Bedathur, 2013). Moreover, Alfonseca, Ciaramita, and Hall (2009) showed that query periodicities can also be used to improve query suggestions.

TABLE 7. Experimental results of Lloyd's k-means clustering with cosine similarity. F1a = 2 × Precision × Recall/(Precision + Recall); F1b = 2 × Purity × Inverse Purity/(Purity + Inverse Purity).

              Pair counting measures       Set-matching based measures
Cluster no.   Precision  Recall  F1a      Purity  Inverse Purity  F1b
K = 1         0.400      1.000   0.572    1.000   0.577           0.732
K = 2         0.736      0.945   0.828    0.967   0.736           0.836
K = 3         0.639      0.549   0.590    0.696   0.736           0.716
K = 4         0.875      0.897   0.886    0.873   0.869           0.871
K = 5         0.792      0.475   0.594    0.560   0.821           0.666
K = 6         0.765      0.386   0.513    0.535   0.821           0.648

FIG. 11. Query temporal pattern examples from Google Trends for two ad hoc queries.

Related Work

A large body of previous work explores the characteristics of web queries, and query classification is an important part of it.

Broder's classification scheme. In 2002, Broder presented a trichotomy of web queries: navigational, informational, and transactional queries. Navigational queries are intended to find a specific website that the user has in mind. Informational queries are intended to find information about a topic. Transactional queries are intended to perform web-mediated activities. The taxonomy has been adopted by many studies (Herrera, de Moura, Cristo, Silva, & da Silva, 2010; Jansen & Booth, 2010; Jansen, Booth, & Spink, 2007, 2008; U. Lee, Liu, & Cho, 2005; Rose & Levinson, 2004). Lewandowski, Drechsler, and Mach (2012) used Broder's classification scheme to measure the reliability of query intent assessments, in order to find out whether manual intent annotations were sufficiently reliable to be used as test data for automatic approaches. Rose and Levinson (2004) described a framework for understanding the underlying goals of user queries and reported their experience in using the framework to manually classify queries from a web search engine. Their analysis suggested that navigational queries were less prevalent than generally believed. They also considered that the entries within the transactional class could be classified into more subclasses. Therefore, they replaced the transactional class with a resource class and subdivided the informational and resource classes into more detailed classes (Rose & Levinson, 2004). Jansen et al. presented a methodology to classify user queries in terms of a set of characteristics for each category in Broder's taxonomy. They implemented a classification algorithm and automatically classified 1 million queries from a web search engine log submitted by several hundred thousand users (Jansen et al., 2007, 2008; Jansen & Booth, 2010). U. Lee et al. (2005) proposed an approach for query classification that considers the click distribution, the average click frequency of each query, and the anchor text distribution. They considered only two classes: navigational and informational queries (U. Lee et al., 2005). Herrera et al. (2010) studied the impact of using several features extracted from document collections and query logs for the query classification task. Kang and Kim (2003) explored the occurrence patterns of query terms in web pages to detect whether the goal of a query is navigational or informational.

Time series pattern-based classification schemes. Instead of classifying queries as informational, navigational, or transactional, the following studies developed classification schemes based on time series patterns. Chien and Immorlica (2005) used temporal correlation to identify sets of similar queries, suggesting that queries with similar frequency patterns are likely to be related. Radinsky et al. (2012) explored how to use time series techniques to model and predict user behavior based on queries over time, including trends, periodicities, and surprises. Shokouhi (2011) investigated seasonal queries, which represent seasonal events that repeat every year (i.e., PMBQ queries in this article), and focused on detecting them with time series analysis techniques. Hong, Lin, and Wang (2002) used data-mining techniques to mine browsing patterns from query logs and derive rules for web page retrieval. Tseng, Lin, and Chang (2008) proposed a data-mining algorithm named Temporal N-Gram (TN-Gram) to discover and predict user navigation patterns by mining query logs. Kulkarni et al. (2011) manually analyzed query logs and studied the characteristics of user queries from four aspects: the number of spikes, the shapes of the spikes, the periodicity of the queries, and the overall trends in popularity.
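The idea behind temporal correlation of queries can be illustrated with the Pearson correlation coefficient between two frequency series aligned on the same time bins. This is a generic sketch under that assumption, not a reproduction of Chien and Immorlica's (2005) exact formulation; the function name and example series are illustrative.

```python
import math
from typing import Sequence


def temporal_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length query-frequency series."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(va * vb)


# Two hypothetical series that rise together are perfectly correlated.
print(temporal_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Because the coefficient is scale-invariant, two queries with very different absolute volumes but the same shape over time still score close to 1.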

Other classification schemes. The following studies explore their own classification schemes. Verberne et al. (2013) carried out studies on measuring the reliability of query intent assessments similar to those of Lewandowski, Drechsler, and Mach (2012). The difference is that they used a newly defined query classification scheme consisting of the following dimensions: Topic, Action type, Modus, Source authority sensitivity, Spatial sensitivity, Time sensitivity, and Specificity (Verberne et al., 2013). Roy, Katare, Ganguly, Laxman, and Choudhury (2015) took a deeper look at queries, focusing on individual words as possible indicators of user intent; their query term taxonomy was derived through rigorous manual analysis of large query logs (Roy et al., 2015). Jones and Diaz (2007) noted that the temporal properties of queries can be used to diagnose retrieval quality. They presented three temporal classes of queries: atemporal queries, temporally unambiguous queries, and temporally ambiguous queries (Jones & Diaz, 2007). Metzler et al. (2009) investigated implicitly year-qualified queries, which do not actually contain a year although the user might have formulated the query with a specific year in mind. Asur and Buehrer (2009) studied the temporal signatures of three types of queries (Navigational, Adult, and News) and proposed a method that classifies a query into one of these three types by considering the trends in its time series. Dakka, Gravano, and Ipeirotis (2008) proposed a framework for handling time-sensitive queries and automatically identifying the time intervals that are likely to be of interest for a query. Bhatia, Brunk, and Mitra (2012) analyzed the query logs of a commercial web search engine and studied web search queries in terms of their diversification requirements. They analyzed queries based on click entropy and popularity, proposed a query taxonomy based on diversification requirements, and then automatically classified web search queries into the classes of this taxonomy (Bhatia et al., 2012). Y. Chang, He, Yu, and Lu (2006) analyzed user goals from the viewpoint of natural language processing. They assumed that the subject of the hidden sentence in the user's mind is the user, and they called the pair formed by its verb and object a "VOpair." Given a query, they attempted to identify the user's goal (VOpair) from the web search results. Y.J. Lee, Lee, Chai, Hwang, and Ryu (2009) proposed a temporal data-mining technique that extracts temporal interval relation rules from temporal interval data using Allen's theory (Y.J. Lee et al., 2009). W.M. Lee and Sanderson (2010) investigated a relatively unexamined query type, queries composed of URLs, and examined their extent, variations, and user click-through behaviors to determine the intents behind URL queries. Nunes, Ribeiro, and David (2008) investigated the use of temporal expressions in web queries and found that temporal expressions were scarcely used, except for some specific topics such as Autos, Sports, News, and Holidays (Nunes et al., 2008).

Although many studies have investigated the characteristics of user queries, to our knowledge none of them has studied the problem from the perspective employed in this article. The differences between our study and existing studies are threefold. First, we use a different classification scheme, whereas most existing studies have focused on Broder's classification scheme or its variations. Second, the most closely related study, by Kulkarni et al. (2011), did not propose an automatic detection approach. Third, although Shokouhi (2011) proposed an automatic detection approach, he focused only on seasonal queries (i.e., PMBQ) using time series analysis techniques.

Conclusion and Future Work

In this article, we study the problem of detecting the temporal patterns of user queries and propose a query classification method to solve it. Moreover, for queries whose frequency curves cannot be determined because of a lack of data in query logs, we propose two approaches to approximate their time series curves from document data sets. Our work can help search engines better understand query intents and further improve their performance.

In future work, we will explore more features for temporal-pattern-based query classification. In particular, we will explore features contained in web pages so that we can detect pattern types as early as possible. For example, we hope to predict OBQ-type queries before the search volume increases sharply (i.e., in the early stages of events). We also plan to study how different temporal patterns can be used to construct a retrieval model that improves the performance of information retrieval.

Acknowledgments

This work is supported by the Natural Science Foundation of China (61272240, 61103151), the Doctoral Fund of the Ministry of Education of China (20110131110028), the Natural Science Foundation of Shandong Province (ZR2012FM037), the Excellent Middle-Aged and Youth Scientists of Shandong Province (BS2012DX017), and the Fundamental Research Funds of Shandong University.

References

Adar, E., Teevan, J., Dumais, S.T., & Elsas, J.L. (2009). The web changes everything: Understanding the dynamics of web content. In R.A. Baeza-Yates, P. Boldi, B.A. Ribeiro-Neto, & B.B. Cambazoglu (Eds.), Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM′09) (pp. 282–291). New York, NY: ACM.

Alfonseca, E., Ciaramita, M., & Hall, K. (2009). Gazpacho and summer rash: Lexical relationships from temporal patterns of Web search queries. In P. Koehn & R. Mihalcea (Eds.), Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP′09) (pp. 1046–1055). Stroudsburg, PA: ACL.

Amigó, E., Gonzalo, J., Artiles, J., & Verdejo, F. (2009). A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval, 12, 461–486.

Asur, S., & Buehrer, G. (2009). Temporal analysis of web search query-click data. In C.L. Giles (Ed.), Proceedings of the Third Workshop on Social Network Analysis Workshop (SNA-KDD′09) (pp. 1–8). New York, NY: ACM.

Berberich, K., & Bedathur, S. (2013). Temporal diversification of search results. SIGIR 2013 Workshop on Time-aware Information Access, Dublin, Ireland.

Bhatia, S., Brunk, C., & Mitra, P. (2012). Analysis and automatic classification of web search queries for diversification requirements. Journal of the American Society for Information Science and Technology, 49, 1–10.

Bishop, C.M. (2006). Pattern recognition and machine learning. New York, NY: Springer.

Brockwell, P.J., & Davis, R.A. (2002). Introduction to time series and forecasting. New York, NY: Springer.

Broder, A. (2002). A taxonomy of web search. SIGIR Forum, 36, 3–10.

Chang, C.C., & Lin, C.J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 1–27.

Chang, Y., He, K., Yu, S., & Lu, W. (2006). Identifying user goals from web search results. In T. Nishida, Z. Shi, U. Visser, X. Wu, J. Liu, B. Wah, . . . Y. Cheung (Eds.), Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI′06) (pp. 1038–1041). Washington, DC: IEEE Computer Society.

Chen, Z., Ma, J., Cui, C., Rui, H., & Huang, S. (2010). Web page publication time detection and its application for page rank. In F. Crestani & S. Marchand-Maillet (Eds.), Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR′10) (pp. 859–860). New York, NY: ACM.

Chien, S., & Immorlica, N. (2005). Semantic similarity between search engine queries using temporal correlation. In A. Ellis & T. Hagino (Eds.), Proceedings of the 14th International Conference on World Wide Web (WWW′05) (pp. 2–11). New York, NY: ACM.




Cleveland, R.B., Cleveland, W.S., McRae, J.E., & Terpenning, I. (1990). STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6, 3–73.

Dakka, W., Gravano, L., & Ipeirotis, P.G. (2008). Answering general time sensitive queries. In J.G. Shanahan (Ed.), Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM′08) (pp. 1437–1438). New York, NY: ACM.

Dakka, W., Gravano, L., & Ipeirotis, P.G. (2012). Answering general time-sensitive queries. IEEE Transactions on Knowledge and Data Engineering, 24, 220–235.

Dong, A., Chang, Y., Zheng, Z., Mishne, G., Bai, J., Zhang, R., et al. (2010). Towards recency ranking in web search. In B.D. Davison & T. Suel (Eds.), Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM′10) (pp. 11–20). New York, NY: ACM.

Elsas, J.L., & Dumais, S.T. (2010). Leveraging temporal dynamics of document content in relevance ranking. In B.D. Davison & T. Suel (Eds.), Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM′10) (pp. 1–10). New York, NY: ACM.

Herrera, M.R., de Moura, E.S., Cristo, M., Silva, T.P., & da Silva, A.S. (2010). Exploring features for the automatic identification of user goals in web search. Information Processing & Management, 46, 131–142.

Hong, T.P., Lin, K.Y., & Wang, S.L. (2002). Mining linguistic browsing patterns in the world wide web. Soft Computing, 6, 329–336.

Jansen, B.J., & Booth, D. (2010). Classifying web queries by topic and user intent. In E. Mynatt & D. Schoner (Eds.), CHI ′10 Extended Abstracts on Human Factors in Computing Systems (CHI EA′10) (pp. 4285–4290). New York, NY: ACM.

Jansen, B.J., Booth, D.L., & Spink, A. (2007). Determining the user intent of web search engine queries. In C. Williamson & M.E. Zurko (Eds.), Proceedings of the 16th International Conference on World Wide Web (WWW′07) (pp. 1149–1150). New York, NY: ACM.

Jansen, B.J., Booth, D.L., & Spink, A. (2008). Determining the informational, navigational, and transactional intent of web queries. Information Processing & Management, 44, 1251–1266.

Joho, H., Jatowt, A., & Blanco, R. (2013). A survey of temporal web search experience. In D. Schwabe, V. Almeida, & H. Glaser (Eds.), Proceedings of the 22nd International Conference on World Wide Web Companion (WWW′13 Companion) (pp. 1101–1108). Canton of Geneva, Switzerland: WWW Conferences Steering Committee.

Jones, R., & Diaz, F. (2007). Temporal profiles of queries. ACM Transactions on Information Systems, 25, 1–13.

Kang, I.H., & Kim, G. (2003). Query type classification for web document retrieval. In C. Clarke & G. Cormack (Eds.), Proceedings of the 26th annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR′03) (pp. 64–71). New York, NY: ACM.

Kulkarni, A., Teevan, J., Svore, K.M., & Dumais, S.T. (2011). Understanding temporal query dynamics. In I. King (Ed.), Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM′11) (pp. 167–176). New York, NY: ACM.

Lee, U., Liu, Z., & Cho, J. (2005). Automatic identification of user goals in web search. In A. Ellis & T. Hagino (Eds.), Proceedings of the 14th International Conference on World Wide Web (WWW′05) (pp. 391–400). New York, NY: ACM.

Lee, W.M., & Sanderson, M. (2010). Analyzing URL queries. Journal of the American Society for Information Science and Technology, 61, 2300–2310.

Lee, Y.J., Lee, J.W., Chai, D.J., Hwang, B.H., & Ryu, K.H. (2009). Mining temporal interval relational rules from temporal data. Journal of Systems and Software, 82, 155–167.

Lewandowski, D., Drechsler, J., & Mach, S. (2012). Deriving query intents from web search engine queries. Journal of the American Society for Information Science and Technology, 63, 1773–1788.

Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28, 129–137.

Manning, C.D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. New York, NY: Cambridge University Press.

McCreadie, R., Macdonald, C., & Ounis, I. (2013). News vertical search: When and what to display to users. In G.J.F. Jones & P. Sheridan (Eds.), Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR′13).

Metzler, D., Jones, R., Peng, F., & Zhang, R. (2009). Improving search relevance for implicitly temporal queries. In J. Allan & J. Aslam (Eds.), Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR′09).

Nunes, S., Ribeiro, C., & David, G. (2008). Use of temporal expressions in web search. In C. Macdonald, I. Ounis, V. Plachouras, I. Ruthven, & R.W. White (Eds.), Proceedings of the IR Research, 30th European Conference on Advances in Information Retrieval (ECIR′08) (pp. 580–584). New York, NY: Springer.

Olston, C., & Pandey, S. (2008). Recrawl scheduling based on information longevity. In J. Huai, R. Chen, H. Hon, & Y. Liu (Eds.), Proceedings of the 17th International Conference on World Wide Web (WWW′08).

Ponte, J.M., & Croft, W.B. (1998). A language modeling approach to information retrieval. In W.B. Croft, A. Moffat, C.J. van Rijsbergen, R. Wilkinson, & J. Zobel (Eds.), Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR′98) (pp. 275–281). New York, NY: ACM.

Radinsky, K., Svore, K., Dumais, S., Teevan, J., Bocharov, A., & Horvitz, E. (2012). Modeling and predicting behavioral dynamics on the web. In A. Mille, F. Gandon, & J. Misselis (Eds.), Proceedings of the 21st International Conference on World Wide Web (WWW′12) (pp. 599–608). New York, NY: ACM.

Rose, D.E., & Levinson, D. (2004). Understanding user goals in web search. In S. Feldman & M. Uretsky (Eds.), Proceedings of the 13th International Conference on World Wide Web (WWW′04) (pp. 13–19). New York, NY: ACM.

Roy, R.S., Katare, R., Ganguly, N., Laxman, S., & Choudhury, M. (2015). Discovering and understanding word level user intent in web search queries. Web Semantics: Science, Services and Agents on the World Wide Web, 30, 22–38.

Shokouhi, M. (2011). Detecting seasonal queries by time-series analysis. In W. Ma & J. Nie (Eds.), Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR′11) (pp. 1171–1172). New York, NY: ACM.

Tseng, V.S., Lin, K.W., & Chang, J.C. (2008). Prediction of user navigation patterns by mining the temporal web usage evolution. Soft Computing, 12, 157–163.

Verberne, S., van der Heijden, M., Hinne, M., Sappelli, M., Koldijk, S., Hoenkamp, E., Kraaij, W., et al. (2013). Reliability and validity of query intent assessments. Journal of the American Society for Information Science and Technology, 64, 2224–2237.

Viera, A.J., & Garrett, J.M. (2005). Understanding interobserver agreement: The kappa statistic. Family Medicine, 37, 360–363.

Yang, J., & Leskovec, J. (2011). Patterns of temporal variation in online media. In I. King (Ed.), Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM′11) (pp. 177–186). New York, NY: ACM.


