Detecting Temporal Patterns of User Queries
Pengjie Ren, Zhumin Chen, and Jun Ma
School of Computer Science and Technology, Shandong University, Jinan, 250101, China.
E-mail: [email protected]; [email protected]; [email protected]
Zhiwei Zhang and Luo Si
Department of Computer Science, Purdue University, West Lafayette, IN 47907.
E-mail: [email protected], [email protected]
Shuaiqiang Wang
Department of Computer Science and Information Systems, Jyväskylä University, Agora, 40014, Finland.
E-mail: [email protected]
Query classification is an important part of exploring the characteristics of web queries. Existing studies are mainly based on Broder’s classification scheme and classify user queries into navigational, informational, and transactional categories according to users’ information needs. In this article, we present a novel classification scheme from the perspective of queries’ temporal patterns. Queries’ temporal patterns are inherent time series patterns of the search volumes of queries that reflect the evolution of the popularity of a query over time. By analyzing the temporal patterns of queries, search engines can more deeply understand the users’ search intents and thus improve performance. Furthermore, we extract three groups of features based on the queries’ search volume time series and use a support vector machine (SVM) to automatically detect the temporal patterns of user queries. Extensive experiments on the Million Query Track data sets of the Text REtrieval Conference (TREC) demonstrate the effectiveness of our approach.
Introduction
There are temporal aspects to web search queries that
search engines need to account for to understand the
changes in user intent and respond to users with temporally
relevant results (Kulkarni, Teevan, Svore, & Dumais, 2011;
Shokouhi, 2011). For example, for the query “Earthquake,”
users usually search for basic knowledge about earthquakes
when no earthquake events are occurring or have just
occurred. Search engines should rank relevant pages from
sites such as Wikipedia at the top of the result list. However,
in the middle of March 2011, the query “Earthquake” sud-
denly became one of the most popular searches.1 The
increase in search volume was caused by the 9.0 magnitude
earthquake in Japan. During this period, most users issuing
this query focused on the event instead of the basic knowl-
edge about earthquakes. Search engines should accordingly
rank the relevant news articles about the event at the top of
the result list. Other examples are queries such as “World
Cup” or “Presidents Cup in Golf,” where the search volumes
of these queries increase and decrease with a fixed cycle.
Correspondingly, the temporal intents of users change peri-
odically. Therefore, it is important for search engines to
correctly identify those queries and ensure that their results
are temporally relevant (Shokouhi, 2011) or diversified
(Berberich & Bedathur, 2013).
By analyzing user queries in depth, we found that the
search volume time series of user queries have certain pat-
terns, which we refer to as queries’ temporal patterns in this
study. Queries’ temporal patterns are the inherent time series
patterns of the volume of queries, which reflect a query’s
popularity over time. Figure 1 presents some classic tempo-
ral pattern instances. Different patterns can reflect different
search intents. For example, queries with flat curve patterns
usually have fixed intents; for example, “Java JDK” in
Figure 1a (Kulkarni et al., 2011). In contrast, queries with
multiple spikes in their curve patterns, such as in Figure 1c
and 1d, may be temporally ambiguous (Shokouhi, 2011).
That is, when users retrieve them, search engines do not
understand which time intervals (generally corresponding
to the spikes on the curves) the users are targeting. For
example, for the query “the Olympics” (Figure 1d), there are
five spikes corresponding to the Olympic Games (including
the Summer and Winter Games) that were held in 2004,
2006, 2008, 2010, and 2012. However, search engines do not
understand the periods that users are targeting. In summary,
by analyzing the temporal patterns of user queries, we can
better understand user intents, which can be used to improve
the performance of search engines (Jones & Diaz, 2007;
Kulkarni et al., 2011; Metzler, Jones, Peng, & Zhang, 2009;
Shokouhi, 2011).
1 http://www.google.cn/trends/explore#q=earthquake
Received March 17, 2015; revised May 18, 2015; accepted May 18, 2015
© 2015 ASIS&T. Published online 18 August 2015 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.23578
JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 68(1):113–128, 2017
In this article, we first introduce a new query-
classification scheme that groups queries into four
categories—stable queries, one-time burst queries, periodic
multitime burst queries, and aperiodic multitime burst
queries—according to their temporal patterns. Then, we
propose an approach to identify a query’s pattern class in
terms of its time series curve. In particular, we first extract a
collection of features by exploring the shapes, trends,
periods, and bursts in the time series curves. Then, we utilize
a support vector machine (SVM) to detect user queries’
temporal patterns. Because there are some queries whose
records in query logs are too sparse to generate valid time
series curves, we present two methods to approximately
construct their time series curves from the document data
sets. We collected a large number of queries from the Text
REtrieval Conference (TREC) and manually annotated their
temporal pattern categories. Extensive experiments indicate
that our approach can significantly outperform the baselines.
The remainder of this article is organized as follows.
First, we present our new query-classification scheme.
Second, we describe our automatic query pattern detection
approach and present the corresponding experiments. Then,
we discuss the potential applications of our study. Next, we
present related work. Finally, we draw some conclusions and
discuss our future work.
Temporal Pattern-Based Query-Classification Scheme
A query’s temporal pattern can be reflected by the
changes in its search volume over time, which is represented
as a time series. Therefore, we group queries into corre-
sponding temporal pattern classes according to the temporal
characteristics of their time series, as shown in Figure 2.
Each leaf node in Figure 2 represents a specific temporal
pattern type. Next, we describe the definitions of these query
classes and typical time series curves, as illustrated in
Figure 1.
Query with time (QwT). A QwT represents the queries that
contain at least one explicit time expression or time interval,
such as “Earthquake 2008 . . . 2010” (Earthquakes from
2008–2010). These queries are easy to detect by identifying
the explicit time expressions. When a user submits a QwT
query, he or she usually has a specific time constraint in
mind.
Query without time (QoT). A QoT denotes the queries that
do not contain any explicit time expression, such as “Web
page rank.” For QoT, we need to detect their temporal pat-
terns to help us analyze user intents.
Stable query (SQ). A SQ denotes the queries in which the
user’s intent does not focus on any specific time interval.
That is, there are no temporal constraints for their results. SQ
denotes users’ common, frequent, and constant search
intents. Consequently, their time series curves share a stable
trend (e.g., “Java JDK” as shown in Figure 1a).
FIG. 1. Query temporal pattern examples from Google Trends. The horizontal and vertical axes represent the time and query search volume, respectively.
FIG. 2. Temporal pattern-based classification scheme.
Nonstable query (NSQ). An NSQ describes the queries
that represent users’ uncommon, occasional search intents.
One-time burst query (OBQ). An OBQ is a type of NSQ.
OBQs are often triggered by one-time, unexpected events.
As a result, their curves all contain a single spike that occurs
when there is a sudden increase followed by a corresponding
decrease in search volume (e.g., “Japan Earthquake” is an
OBQ, as depicted in Figure 1b).
Multitime burst query (MBQ). An MBQ is another type of
NSQ. MBQs are often triggered by events that repeat mul-
tiple times.
Aperiodic multitime burst query (AMBQ). An AMBQ rep-
resents MBQs triggered by unexpected events or user
requests that are issued aperiodically. Time series curves of
AMBQs share a common shape with multiple aperiodic
spikes (e.g., “Earthquake” is an AMBQ, as shown in
Figure 1c).
Periodic multitime burst query (PMBQ). A PMBQ denotes
MBQs that are issued periodically. These queries are usually
triggered by expected events that follow identical or almost
identical cycles during corresponding months of successive
years. Time series curves of PMBQs share a common shape
with multiple periodic spikes. For example, “the Olympics”
becomes popular in a 2-year cycle because the Olympic
Games are held every 2 years (including the Summer and
Winter Games), as shown in Figure 1d.
We analyzed in depth the contents of query instances for
different temporal pattern types. Some examples are sum-
marized in Table 1. We found that the queries of different
temporal pattern types indeed have different characteristics.
The majority of SQs could be characterized as broad,
general queries such as academic or daily stuff-related
queries (e.g., “artificial intelligence” and “pet therapy”). For
these queries, in the absence of an emergent focused intent
(e.g., an unexpected event), their curves remain relatively
flat. In comparison, NSQs are more focused. Many of these
queries refer to events. AMBQs in particular are usually
related to celebrities (e.g., Tom Hanks), disasters (e.g., bliz-
zard warning), and so on.
Automatic Temporal Pattern Detection Approach
Given a new query, we use a machine-learning approach
to detect its temporal pattern (i.e., SQ, OBQ, PMBQ, or
AMBQ, as shown in Figure 2). The framework is shown in
Figure 3. To achieve this goal, the primary task is to define
effective features that can capture the characteristics of dif-
ferent temporal pattern types.
When a user submits a query, we first build its time series
curve by mining query logs. If we cannot build a valid curve
due to sparse or missing data, we approximate the curve
from document data sets. Then, we preprocess the curve and
extract the features. Finally, a classifier is used to detect the
query’s temporal pattern (an SVM is used in this article).
We will discuss the main aspects of the framework
marked in bold in Figure 3.
TABLE 1. Analysis of query instances for different temporal pattern types.
| Temporal pattern type | Category | Query instances |
|---|---|---|
| SQ | Daily Stuff | Seattle bus schedule, quality customer service, pet therapy, job salaries, business commerce daily, . . . |
| SQ | Academic Stuff | artificial intelligence, associate degree, atomic physics, java jdk, biodiversity, graphics, NASA science, phx AZ, protozoa, technology assessment, data flow diagram, . . . |
| SQ | . . . | . . . |
| OBQ | Political or Social Events | capitol hill massacre, box fan recall, one million dollar bill, Olympics in Greece, free credit report Ohio, . . . |
| OBQ | Disease Events | bird flu in the United States, swine influenza, SARS, bird flu avian, . . . |
| OBQ | . . . | . . . |
| AMBQ | Celebrities | Buffalo Levine, Daniel Halpern, Michael Jordan, James Stewart, Tom Hanks, . . . |
| AMBQ | Disasters | earthquake, tornado, tsunami, flooding, drought, blizzard, . . . |
| AMBQ | . . . | . . . |
| PMBQ | Season-Related Activities, Places, etc. | spring mill state park, summer job opportunities, summer youth program NY, snow in Italy, rice lake fishing, Renton water park, ice machine, Erie PA beach, allergic reactions to bites, weather on February, . . . |
| PMBQ | Sport Competitions | world cup, tennis courts, presidents cup in golf, . . . |
| PMBQ | Tax-Related Stuff | taxes prep, taxes forms 1040, tax return direct deposit, social security wage tax, scheduled capital gains, sales taxes deduction, Ohio state tax return, . . . |
| PMBQ | . . . | . . . |
Note. SQ = stable query; OBQ = one-time burst query; AMBQ = aperiodic multitime burst query; PMBQ = periodic multitime burst query.
Time Series Curve Generation
Given a query, we first need to generate its time series
(search volume over time). Let ft(q) denote the search
frequency of query q within the tth time interval (the search
volume within time interval t divided by total search
volumes of all time). t = 1 . . . N, where “month” is used
here. N is the number of time intervals (months). The time
series F(q) of a query q over N time intervals is a sequence
of ft(q), denoted as
$$F(q) = \{ f_1(q), f_2(q), \ldots, f_t(q), \ldots, f_N(q) \} \tag{1}$$
For most queries, we can determine F(q) by mining the
query logs. However, there are some queries, especially long
queries, for which we cannot determine F(q) due to the
insufficient search volume of the query logs. For example, if
we submit “a Canadian publishing house book” to Google
Trends,2 it returns “Not enough search volume to show
graphs.” Thus, for these queries, we investigate how to esti-
mate F(q) without the use of query logs.
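As a minimal sketch of the definition of F(q), assuming the monthly search counts of a query are already available as a plain list (the function name and the error handling are illustrative and do not come from the article):

```python
def normalized_time_series(monthly_counts):
    """Return F(q): each month's count divided by the total over all months."""
    total = sum(monthly_counts)
    if total == 0:
        # No usable search volume: this is the case in which the curve
        # has to be approximated from documents instead.
        raise ValueError("query has no recorded search volume")
    return [count / total for count in monthly_counts]


# Hypothetical counts for N = 4 months.
series = normalized_time_series([10, 30, 40, 20])
print(series)
```

The resulting frequencies sum to 1, so curves of popular and rare queries become comparable in scale.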
We present two estimation approaches based on the fol-
lowing phenomenon. There are basically two roles on the
Internet: information publishers and information hunters.
Intuitively, they act consistently over time. In other words,
when an event occurs, information publishers publish the
related news articles, and information hunters simultane-
ously submit queries to search them. The number of relevant
articles reflects the popularity of corresponding queries, and
vice versa (Adar, Teevan, Dumais, & Elsas, 2009; Dakka,
Gravano, & Ipeirotis, 2012). We therefore construct the time
series F(q) from a document corpus when a query’s search
records in the query logs are not sufficient to build its curve.
Two approaches, document-level approximation (DLA) and
word-level approximation (WLA), are proposed to solve this
problem from different levels.
DLA
Suppose that the change in a query’s search frequency
can be reflected by the corresponding number of relevant
documents. Therefore, for a given query q, we use its rel-
evant documents over time to approximate its time series
curve, as follows:
$$f_t(q) \approx \sum_{d \in D_k} P(t \mid d)\, \frac{P(d \mid q)}{\sum_{d' \in D_k} P(d' \mid q)} \tag{2}$$
where t represents the time interval within which a docu-
ment is published. P(t|d) equals 1 if the publication time
(Chen, Ma, Cui, Rui, & Huang, 2010) of document d is
within t or 0 otherwise. In this article, Dk denotes the top
k = 100 Google search results of query q. The publication
time of a document is obtained by setting time constraints in
“Google Search Tools” when crawling Google search
results. P(d|q) is the relevance of document d to query q,
which is estimated with the Query Likelihood Model (Ponte
& Croft, 1998). If more relevant documents for query q are
published within the time interval t, Equation (2) will have a
larger value at t.
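The document-level approximation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each retrieved document is given as a hypothetical pair of a 0-based month index and a non-negative relevance score standing in for P(d|q), and the scores are normalized over the retrieved set.

```python
def dla_time_series(docs, num_intervals):
    """Approximate f_t(q) from (interval_index, relevance) pairs of the top-k documents."""
    total_relevance = sum(relevance for _, relevance in docs)
    if total_relevance <= 0:
        raise ValueError("relevance scores must sum to a positive value")
    series = [0.0] * num_intervals
    for interval_index, relevance in docs:
        # P(t|d) is 1 only for the interval in which the document was published,
        # so each document contributes its normalized relevance to one interval.
        series[interval_index] += relevance / total_relevance
    return series


# Three hypothetical documents: one published in month 0, two in month 2.
series = dla_time_series([(0, 0.2), (2, 0.5), (2, 0.3)], num_intervals=3)
print(series)
```

Months in which more relevant documents were published receive a larger value, which is the behavior described for Equation (2).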
WLA
We also assume that a query’s search frequencies over
time can be reflected by the occurrence of its words. Thus,
for a given query q, we use its word-occurrence distribution
over time to approximate its time series curve:
$$f_t(q) \approx \prod_{w \in q} \frac{P(w \mid t)\, P(w \mid q)}{DF(w)} \tag{3}$$
P(w|t) is the probability of generating the query word w
within the time interval t and is estimated as the term
frequency (TF) of w within t divided by the TF of all
words within t in the document’s corpus with Laplace
smoothing. DF(w) is the document frequency (DF) of
word w. From the perspective of language models for infor-
mation retrieval, P(w|q) is the query language model.
However, most user queries are short, so P(w|q) is usually
estimated with some interpolation approaches (Manning,
Raghavan, & Schütze, 2008). In the article, we use the TF of
word w divided by the TF of all words of query q in the top
k Google search results to estimate P(w|q). k is still set to
100.
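A rough sketch of the word-level approximation follows. Several details are assumptions made for illustration only: documents are pre-tokenized lists of words grouped by month, the same collection serves as both the corpus and the top-k results, and Laplace smoothing is taken to be add-one smoothing over a given vocabulary size.

```python
from collections import Counter


def wla_time_series(query_words, docs_by_interval, vocabulary_size):
    """docs_by_interval: one list of tokenized documents per time interval."""
    all_docs = [doc for docs in docs_by_interval for doc in docs]
    corpus_tf = Counter(token for doc in all_docs for token in doc)
    query_tf_total = sum(corpus_tf[w] for w in query_words)
    doc_freq = {w: sum(1 for doc in all_docs if w in doc) for w in query_words}

    series = []
    for docs in docs_by_interval:
        interval_tf = Counter(token for doc in docs for token in doc)
        interval_total = sum(interval_tf.values())
        score = 1.0
        for w in query_words:
            # P(w|t): smoothed share of w among all words published in interval t.
            p_w_t = (interval_tf[w] + 1) / (interval_total + vocabulary_size)
            # P(w|q): share of w among all query-word occurrences in the results.
            p_w_q = corpus_tf[w] / query_tf_total if query_tf_total else 0.0
            score *= p_w_t * p_w_q / max(doc_freq[w], 1)
        series.append(score)
    return series


docs_by_interval = [
    [["boston", "news"]],
    [["boston", "marathon", "results"], ["marathon", "route"]],
]
series = wla_time_series(["boston", "marathon"], docs_by_interval, vocabulary_size=5)
print(series)
```

In this toy collection the second interval mentions both query words more often, so it receives the larger value.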
Two instances, “Earthquake” and “Boston Marathon,”
are shown in Figure 4, in which the dash-dot line is the real
time series curve built from query logs. We can see that the
three curves fluctuate almost consistently over time.
Preprocessing With Time Series Analysis
Most original time series curves look different from the
typical instances in Figure 1. This is because a curve can be
decomposed into multiple components, together determin-
ing its shape (Brockwell & Davis, 2002; Kulkarni et al.,
2011). In this article, each point of F(q) is decomposed into
three components:
$$f_t(q) = m_t + s_t + Y_t \tag{4}$$
where mt is a slowly changing component known as the
trend component; st is a component with a fixed cycle,
referred to as the seasonal component; and Yt is the random,
burst, and irregular component.
2 http://www.google.com/trends/explore
FIG. 3. Query temporal pattern detection framework.
The query popularity can exhibit an overall increasing or
decreasing trend over time. The trend component mt encodes
this property. However, mt has nothing to do with temporal
pattern types. st and Yt are what we care about here. There-
fore, we should first remove mt from F(q). Here, we use
polynomial fitting to model mt.
Specifically, let F(q) = {f1(q), f2(q), . . . ft(q), . . . , fN(q)}
be the curve we want to preprocess. Remember that
ft(q) = mt + st + Yt. We want to remove the trend component mt
from ft(q), so we model mt as
$$m_t = W^{T} X_t + \xi \tag{5}$$
where W is the polynomial coefficients vector, and X_t = [1, t,
t^2, . . . , t^M] is the polynomial vector. The dimension of W and
X_t is M + 1. M also is the degree of the polynomial function.
ξ is random noise. We assume ξ obeys Student’s t distribution
ξ ∼ St(ξ | 0, λ, ν); that is,
$$P(\xi \mid \lambda, \nu) = \mathrm{St}(\xi \mid 0, \lambda, \nu) \tag{6}$$
Because m_t = W^T X_t + ξ, we have m_t ∼ St(m_t | W^T X_t, λ, ν);
that is,
$$P(m_t \mid W, X_t, \lambda, \nu) = \mathrm{St}(m_t \mid W^{T} X_t, \lambda, \nu) \tag{7}$$
Ideally, to fit mt, we should use the {m1, m2, . . . , mt, . . . , mN}
of each curve as the training data; however, it is difficult to
determine the precise {m1, m2, . . . , mt, . . . , mN} of a curve.
We use F(q) = {f1(q), f2(q), . . . ft(q), . . . , fN(q)} instead. The
reason that we assume ξ∼St(ξ|0, λ, ν) instead of a Gaussian
distribution is that if we use (Xt, ft(q)) to approximate (Xt,
mt), the st and Yt components are noise. The Student’s t
distribution is more robust than is the Gaussian distribution
in this case (Bishop, 2006). For a given curve F(q) = {f1(q),
f2(q), . . . ft(q), . . . , fN(q)}, based on the maximum likelihood
estimation theory, we can write the regularized log likeli-
hood loss function as
$$\begin{aligned}
L(W, \lambda, \nu \mid q) &= -\log P(m_1, m_2, \ldots, m_t, \ldots, m_N \mid W, X, \lambda, \nu) + \frac{\lambda}{2} \lVert W \rVert^2 \\
&= -\sum_{t=1}^{N} \log P(m_t \mid W, X_t, \lambda, \nu) + \frac{\lambda}{2} \lVert W \rVert^2 \\
&\approx -\sum_{t=1}^{N} \log \mathrm{St}(f_t(q) \mid W^{T} X_t, \lambda, \nu) + \frac{\lambda}{2} \lVert W \rVert^2
\end{aligned} \tag{8}$$
where $\sum_{t=1}^{N} \log \mathrm{St}(f_t(q) \mid W^{T} X_t, \lambda, \nu)$ is the log likelihood and
where $\frac{\lambda}{2} \lVert W \rVert^2$ is the regularizer or penalty term, which is used
to avoid overfitting. The objective, L(W, λ, ν|q), can be
optimized with gradient descent to determine the parameters W.
FIG. 4. Time series curve approximation instances.
After we determine W, mt is computed as m_t = W^T X_t. Then,
for each point of a given curve, we remove mt from ft(q) and
finally obtain a processed version of the original curve
$F^{q} = \{ f_t^{q} = s_t + Y_t \mid t = 1, \ldots, N \}$. Some instances are shown
in Figure 5. The solid line denotes the original curve F(q).
The dotted line denotes the trend component mt. The dashed
line denotes the remaining two components st + Yt.
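The detrending step can be illustrated with a simplified stand-in. The sketch below fits the polynomial trend by ordinary least squares with NumPy, which corresponds to Gaussian rather than Student's t noise and omits the regularizer and the gradient descent described above. The degree is a free parameter here because the article does not state the value of M.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def remove_trend(series, degree=2):
    """Fit a polynomial trend m_t and return (trend, series - trend)."""
    values = np.asarray(series, dtype=float)
    # Rescale t to [0, 1] so that high powers of t stay numerically well behaved.
    t = np.linspace(0.0, 1.0, len(values))
    coefficients = P.polyfit(t, values, degree)  # plays the role of W
    trend = P.polyval(t, coefficients)           # m_t = W^T X_t
    return trend, values - trend


months = np.arange(24)
series = 0.02 * months   # slow upward drift
series[12] += 1.0        # one burst on top of the drift
trend, detrended = remove_trend(series, degree=1)
print(int(np.argmax(detrended)))
```

A least-squares fit is pulled toward bursts more strongly than a Student's t fit would be, which is the robustness argument the article gives for its choice of noise model.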
Feature Extraction
Three groups of features are proposed for the machine-
learning model in this article: basic features, curve distance
features, and regression features. Features 1 to 4 are basic
features, Features 5 to 8 are curve distance features, and
Features 9 to 11 are regression features, as shown in Table 2.
We describe them in detail next.
Features 1–2. SQ is stable while NSQ is bursty; therefore,
to distinguish them, the mean (M) and standard deviation (SD) of
F^q are two obvious features.
Feature 3. It is difficult to separate OBQ and MBQ
because the SDs of both are usually large. Therefore, we
define a new feature, Maximum Spike (MS), as follows:
$$\mathrm{MS} = \frac{\max\{ f_t^{q} \mid t = 1, \ldots, N \}}{\sum_{t=1}^{N} f_t^{q}} \tag{9}$$
MS represents the proportion of the maximum search volume.
Feature 4. Another feature, Gap between the First and
Second Maximum Spike (GFSMS), is defined as
$$\mathrm{GFSMS} = \frac{f_{FM}^{q} - f_{SM}^{q}}{\sum_{t=1}^{N} f_t^{q}} \tag{10}$$
where $f_{FM}^{q} = \max\{ f_t^{q} \mid t = 1, \ldots, N \}$ is the maximum frequency
of the highest spike; that is, the maximum point of
the curve. $f_{SM}^{q} = \max\left( \{ f_t^{q} \mid t = 1, \ldots, N \} \setminus \{ f_{FM \pm c}^{q}, \ldots, f_{FM \pm 1}^{q}, f_{FM}^{q} \mid c = 1, \ldots, m \} \right)$
is the maximum frequency of the
second-highest spike, where the points $\{ f_{FM \pm c}^{q}, \ldots, f_{FM \pm 1}^{q}, f_{FM}^{q} \mid c = 1, \ldots, m \}$
denote the highest spike, that is, the 2m points
around the maximum point. Therefore, the numerator of
Equation (10) estimates the gap between the maximum points
of the first- and second-highest spikes. Here, m is a
predefined parameter, and 2m represents the duration of the
highest spike. m is determined by analyzing the highest
spikes of all NSQ queries in the data set.3 First, we define a
threshold r. Given a query q, we check whether the following
inequality is satisfied as m increases, for different
values of r.
$$\min\{ f_{FM \pm c}^{q}, \ldots, f_{FM \pm 1}^{q}, f_{FM}^{q} \mid c = 1, \ldots, m \} > r \cdot f_{FM}^{q} \tag{11}$$
If the inequality is satisfied, we increase the count of
satisfied queries for the corresponding m value. m values are
tested from 1 month to 6 months. The final result is shown in
Figure 6. The x-axis represents different values of 2m. The
y-axis represents the proportion of queries satisfying
Inequality 11. We can see that if 2m = 4, it can cover at least
74.6% of queries regardless of the different values of r.
Thus, without loss of generality, we set m = 2 months in
this article.
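The two spike features can be sketched directly from their definitions. The function names are illustrative, and the sketch assumes a curve with a positive sum. The highest spike is taken to be the peak plus up to m points on each side, clipped at the ends of the curve.

```python
import numpy as np


def maximum_spike(curve):
    """MS: the largest value as a share of the total volume."""
    values = np.asarray(curve, dtype=float)
    return values.max() / values.sum()


def gap_first_second_spike(curve, m=2):
    """GFSMS: gap between the highest spike and the highest point outside it."""
    values = np.asarray(curve, dtype=float)
    peak = int(values.argmax())
    # Mask the highest spike: the peak and up to m points on each side of it.
    outside = np.ones(len(values), dtype=bool)
    outside[max(0, peak - m): peak + m + 1] = False
    second = values[outside].max() if outside.any() else 0.0
    return (values[peak] - second) / values.sum()


curve = [1, 1, 2, 10, 3, 1, 1, 4, 1, 1]
print(maximum_spike(curve), gap_first_second_spike(curve, m=2))
```

For this curve the highest spike peaks at 10 and the highest point outside it is 4, so a large GFSMS points toward a single dominant burst rather than several comparable ones.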
Features 5–8. The next four features are based on the
curve distance. Given two preprocessed query curves $F^{q_1}$
and $F^{q_2}$, the distance, $\mathrm{Distance}(F^{q_1}, F^{q_2})$, is defined as
follows:
$$\mathrm{Distance}(F^{q_1}, F^{q_2}) = \min_{n, \alpha} \frac{\lVert F^{q_1} - \alpha F^{q_2}_{(n)} \rVert}{\lVert F^{q_1} \rVert} \tag{12}$$
where $F^{q_2}_{(n)}$ is the result of shifting time series $F^{q_2}$ by n time
units, and ‖·‖ is the l2 norm. The distance measure is invariant
to scaling and translation of the time series. It finds the
optimal alignment (translation n) and the scaling coefficient
for matching the shapes of the two time series. With n fixed,
$\lVert F^{q_1} - \alpha F^{q_2}_{(n)} \rVert / \lVert F^{q_1} \rVert$
is a convex function of α, and therefore, we can
find the optimal value by setting the gradient to zero:
$\alpha = \frac{(F^{q_1})^{T} F^{q_2}_{(n)}}{\lVert F^{q_2}_{(n)} \rVert^2}$. It is difficult to find the optimal n. In practice,
we traverse all possible values of n to determine the
minimum distance (Yang & Leskovec, 2011).
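A small sketch of the curve distance, assuming two curves of equal length. The shift is implemented as a circular shift with `np.roll`, which is an assumption of this sketch because the article does not say how points shifted past the end are handled. The scale α uses the closed form given above for each candidate shift.

```python
import numpy as np


def curve_distance(x, y):
    """Minimum over shift n and scale alpha of ||x - alpha * y_(n)|| / ||x||."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_norm = np.linalg.norm(x)
    if x_norm == 0:
        raise ValueError("x must not be all zeros")
    best = np.inf
    for n in range(len(y)):
        shifted = np.roll(y, n)  # circular shift (assumption of this sketch)
        denom = shifted @ shifted
        alpha = (x @ shifted) / denom if denom > 0 else 0.0  # optimal scale for this n
        best = min(best, np.linalg.norm(x - alpha * shifted) / x_norm)
    return best


x = np.array([0.0, 1.0, 5.0, 1.0, 0.0, 0.0])
y = 3.0 * np.roll(x, 2)   # same shape, scaled and shifted
flat = np.ones(6)
print(curve_distance(x, y), curve_distance(x, flat))
```

The scaled and shifted copy is at distance close to zero, while the flat curve stays far from the spiky one, which is the invariance the measure is meant to provide.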
For a given curve F^q, we use the average distance to the
queries in the same temporal pattern class as one feature.
Thus, we obtain four curve distance features, represented as
D_SQ, D_OBQ, D_AMBQ, and D_PMBQ, which are computed based on
the definition of the curve distance, $\mathrm{Distance}(F^{q_1}, F^{q_2})$, as
follows:
$$\begin{aligned}
D_{SQ}(F^{q}) &= \frac{\sum_{i=1}^{M_1} \mathrm{Distance}(F^{q}, F_i^{SQ})}{M_1} \\
D_{OBQ}(F^{q}) &= \frac{\sum_{i=1}^{M_2} \mathrm{Distance}(F^{q}, F_i^{OBQ})}{M_2} \\
D_{AMBQ}(F^{q}) &= \frac{\sum_{i=1}^{M_3} \mathrm{Distance}(F^{q}, F_i^{AMBQ})}{M_3} \\
D_{PMBQ}(F^{q}) &= \frac{\sum_{i=1}^{M_4} \mathrm{Distance}(F^{q}, F_i^{PMBQ})}{M_4}
\end{aligned} \tag{13}$$
where $F_i^{SQ}$, $F_i^{OBQ}$, $F_i^{AMBQ}$, and $F_i^{PMBQ}$ are some curves of the
four pattern types annotated in advance. In this article, we
use all curves in the training set.
3 http://trec.nist.gov/data/million.query.html
FIG. 5. Removing the trend component.
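The four class-distance features are averages over annotated curves, which can be sketched generically. Here the distance is passed in as a callable, and the demonstration uses a plain Euclidean distance purely as a placeholder; the article uses the curve distance of Equation (12). The labels and toy curves are made up for illustration.

```python
import numpy as np


def class_distance_features(curve, labeled_curves, distance):
    """Average distance from `curve` to the annotated curves of each pattern type.

    labeled_curves maps a pattern label (e.g. "SQ") to a list of annotated curves;
    `distance` is any callable taking two curves and returning a number.
    """
    return {
        label: sum(distance(curve, other) for other in curves) / len(curves)
        for label, curves in labeled_curves.items()
    }


def euclidean(a, b):
    # Placeholder metric for the demonstration only.
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


training = {
    "SQ": [[1, 1, 1, 1], [1, 1, 1, 1]],
    "OBQ": [[0, 4, 0, 0], [0, 0, 4, 0]],
}
features = class_distance_features([1, 1, 1, 1], training, euclidean)
print(features)
```

A flat query curve ends up close to the annotated stable curves and far from the burst curves, so the smallest of these features hints at the likely class.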
Features 9–11. The ninth feature, cutoff, is a function
mapping:
$$\mathrm{cutoff}(F^{q}): \mathbb{R}^{n} \rightarrow \mathbb{R} \tag{14}$$
where R^n is the feature space of a given query curve F^q.
Cutoff is used to roughly identify spikes. Here, a spike is
defined as a run of consecutive points whose values are larger
than the cutoff. We need to learn a cutoff function. The first
step is building the training data (F^q, cutoff). Given an F^q
(e.g., Figure 7), we can easily annotate its temporal pattern.
However, it is hard to annotate its cutoff value. Fortunately,
we find that its approximate pseudo-cutoff can be estimated
as follows:
$$\mathrm{pseudocutoff}(F^{q}) = \begin{cases}
\text{median value of area ``1'' of Fig. 7} & \text{if } q = SQ \\
\text{median value of area ``2'' of Fig. 7} & \text{if } q = OBQ \\
\text{median value of area ``3'' of Fig. 7} & \text{if } q = MBQ
\end{cases} \tag{15}$$
We use the first eight features (Features 1–8) as the
features of support vector regression (SVR) (C.C. Chang &
Lin, 2011) to learn a nonlinear model as the cutoff function.
For SVR, we use the Gaussian kernel function with the default
parameter settings in LIBSVM (C.C. Chang & Lin, 2011).
As stated earlier, the cutoff can be used to roughly detect
spikes of F^q, and we define the number of these detected
spikes as the 10th feature, Number of Spikes. If multiple
spikes exist, we use y_i to represent the time interval between
the middle points of two neighboring spikes. Then, we obtain
a sequence y_1, y_2, . . . , y_w. The 11th feature, Period of Spikes
(PS), is computed as the SD of the sequence. Otherwise, if
there is no spike or only one spike, we set PS to an extreme value.
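Given a cutoff value, the spike-based features can be sketched as follows. The cutoff is passed in directly here rather than predicted by SVR, the population standard deviation is used because the article does not say which SD variant it applies, and infinity is a hypothetical choice for the unspecified extreme value.

```python
import statistics


def find_spikes(curve, cutoff):
    """Return (start, end) index pairs of maximal runs with values above cutoff."""
    spikes, start = [], None
    for i, value in enumerate(curve):
        if value > cutoff and start is None:
            start = i
        elif value <= cutoff and start is not None:
            spikes.append((start, i - 1))
            start = None
    if start is not None:
        spikes.append((start, len(curve) - 1))
    return spikes


def period_of_spikes(spikes, extreme=float("inf")):
    """SD of the gaps between midpoints of neighboring spikes."""
    if len(spikes) < 2:
        return extreme  # no spike or a single spike: no gap to measure
    midpoints = [(start + end) / 2 for start, end in spikes]
    gaps = [b - a for a, b in zip(midpoints, midpoints[1:])]
    return statistics.pstdev(gaps)


curve = [0, 5, 0, 0, 0, 6, 6, 0, 0, 0, 7, 0]
spikes = find_spikes(curve, cutoff=1)
print(len(spikes), period_of_spikes(spikes))
```

Evenly spaced spikes give a PS near zero, which is what separates periodic from aperiodic multitime burst queries.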
Experiments
In this section, we carry out experiments to demonstrate
the performance of our time series curve approximation
approaches and temporal pattern detection approach.
Experimental Setup
Data Sets. We first randomly extract approximately 15,000
queries from the Web Track of TREC3 and submit each
query to Google Trends2 to download its search volume file.
The numbers in the file reflect the search volume for the
particular query, relative to the total search volume con-
ducted using Google over time. We have to use this file as
the corresponding query’s frequency data to demonstrate
our temporal pattern detection algorithm because it is very
difficult to obtain real, large-scale, and long-time query
logs from commercial search engines. Fortunately, these
data are suitable for both our approaches and the baselines.
For each query, we also collect web pages from January
2004 to July 2013. Specifically, for each month, we issue
each query to Google Search with the condition of a time
range and collect the top-100 results in the ranking list. For
example, “Earthquake” is submitted with the time condition
“1 Jan, 2004–31 Jan, 2004,” which can be specified in
“Google Search Tools.” We use the collected data set to
estimate time series curves of these queries with the
approaches DLA and WLA, respectively. The data set is
available online.4
TABLE 2. Summary of features.
| Groups | Symbols | Descriptions |
|---|---|---|
| Basic Features | M, SD, MS, GFSMS | Basic Features capture the intuitive characteristics of different time series. |
| Curve Distance Features | DSQ, DOBQ, DAMBQ, DPMBQ | Curve Distance Features capture the differences of different pattern types based on the Curve Distance (i.e., Equation [12]). |
| Regression Features | Cutoff, No. of Spikes, Period of Spikes | Regression Features roughly identify spikes of the time series with a regression approach. |
FIG. 6. Analysis of average spike duration.
FIG. 7. Approximate cutoff of training data.
Annotations. We employed four assessors to manually
annotate the patterns of these queries in terms of the tax-
onomy defined in Figure 2. Three assessors were under-
graduate students, and the fourth was a graduate student. All
assessors were trained before they began to label. For each
query and its corresponding time series curve, three asses-
sors first annotated its pattern type. If two or three of their
annotations were consistent, we annotated the query with
the majority type. Otherwise, the fourth assessor made the
final decision. All assessors used the same user interface to
annotate these queries, as shown in Figure 8. For each query,
the assessors either annotated it as one pattern type or anno-
tated it as “hard to identify” if they could not make the
decision. The average kappa statistic (Viera, Garrett, &
Joanne, 2005) value is 0.81, with κ(User1, User2) = 0.84,
κ(User1, User3) = 0.76, and κ(User2, User3) = 0.84. On one
hand, the κ value is larger than 0.8, which means that the
annotations of the three assessors have good consistency. On
the other hand, it is only slightly larger than 0.8, which
indicates that the task is difficult. According to our manual
annotations, the percentages of different pattern types are
summarized in Table 3. AMBQ queries account for the
largest percentage, and PMBQ queries account for a small
percentage of all queries.
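The annotation procedure can be sketched under the assumption that the pairwise agreement values are Cohen's kappa; the article names only the kappa statistic. The function names and the example labels are illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def resolve_label(first_three, fourth):
    """Majority label of three assessors, else the fourth assessor's decision."""
    label, votes = Counter(first_three).most_common(1)[0]
    return label if votes >= 2 else fourth


kappa = cohen_kappa(["SQ", "SQ", "OBQ", "AMBQ"], ["SQ", "OBQ", "OBQ", "AMBQ"])
reported = [0.84, 0.76, 0.84]
average = sum(reported) / len(reported)  # about 0.81, the reported average
print(kappa, average, resolve_label(["SQ", "OBQ", "AMBQ"], "PMBQ"))
```

The average of the three reported pairwise values reproduces the stated 0.81.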
Baselines. No existing studies have proposed approaches
to comprehensively detect temporal patterns. The only
known work closely related to our study was conducted by
Shokouhi (2011). He used time-series decomposition tech-
niques to identify seasonal queries (i.e., PMBQ) (Shokouhi,
2011). For a given query, he first converted its historical
frequency into time series with monthly splits. He then
decomposed the time series by applying Holt–Winters
additive smoothing (Cleveland, Cleveland, McRae, &
Terpenning, 1990). If the decomposed seasonal component
and the raw data have similar distributions, the query is
classified as seasonal. We refer to this baseline as detecting
seasonal queries with time series-analysis (DSQTSA). The
other two baselines use 1-Nearest Neighbor classification
with the curve distance (i.e., Equation 12) and Euclidean
distance as distance metrics, which are denoted as 1NNCD
and 1NNED, respectively. All queries used in our experiments
have time series obtained from Google Trends.
OurApproachREAL used the time series from Google Trends.
For OurApproachDLA and OurApproachWLA, we assume that
the corresponding time series data sets are not available. We
used DLA and WLA as the approximation approaches.
Evaluation measures. We use Precision (P), Recall (R),
and F1 to evaluate the temporal pattern detection results. If
the query category classified by the algorithm agrees with
the manually annotated category, then it is a correct classi-
fication. Precision is the fraction of classified query catego-
ries that are correct. Recall is the fraction of correct query
categories that are classified. The F1 score is calculated
using the following function: F1 = 2 * (P * R)/(P + R).
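The per-class measures can be sketched as below, treating one pattern type at a time as the target class. The function name, the zero fallback for empty denominators, and the example labels are choices made for this illustration.

```python
def precision_recall_f1(true_labels, predicted_labels, target):
    """Precision, recall, and F1 for one pattern type."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positive = sum(t == target and p == target for t, p in pairs)
    predicted = sum(p == target for p in predicted_labels)
    actual = sum(t == target for t in true_labels)
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


truth = ["SQ", "SQ", "OBQ", "AMBQ", "SQ"]
predicted = ["SQ", "OBQ", "OBQ", "SQ", "SQ"]
print(precision_recall_f1(truth, predicted, "SQ"))
```

In this toy example two of the three queries predicted as SQ are correct and two of the three true SQ queries are found, so all three measures equal 2/3.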
Parameter settings. The C-Support Vector Classification
in LIBSVM (C.C. Chang & Lin, 2011) with the Gaussian
kernel function is used in this article. There are two hyper-
parameters γ and C. We utilized the standard grid search
approach to find the best parameter values. The tested values
of C and γ vary from $2^{-6}$ to $2^{6}$. Table 4 reports the parameter
tuning results for SQ type. Similar results are achieved for
the other three types. Each value in the tables is an average
4 http://ir.sdu.edu.cn/exp/dtp.htm
FIG. 8. User interface for assessors.
TABLE 3. Query percentages for different pattern types.

| | QwT | SQ | OBQ | AMBQ | PMBQ |
|---|---|---|---|---|---|
| Percentages | 12% | 18% | 19% | 45% | 6% |
over fivefold cross-validation. The worst results and the best
results are marked with wavy underlines and straight under-
lines, respectively. The F1 ranges are 0.877 to 0.906 for SQ,
0.862 to 0.910 for OBQ, 0.917 to 0.918 for AMBQ, and
0.812 to 0.905 for PMBQ. Because the parameter settings of
the best results for four pattern types are different, we report
experimental results with default parameter settings (i.e.,
γ = 1/11, C = 1) to compare with the baseline approaches.
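The grid search described above can be sketched as follows. This is only an outline under stated assumptions: `score_fn` is a hypothetical callable that would, in practice, train the Gaussian-kernel C-SVC with the given C and γ and return the mean F1 over fivefold cross-validation; here it is replaced by a toy function so that the example is self-contained.

```python
import math
from itertools import product


def power_of_two_grid(low_exp=-6, high_exp=6):
    """Candidate values 2^low_exp, ..., 2^high_exp (13 values by default)."""
    return [2.0 ** e for e in range(low_exp, high_exp + 1)]


def grid_search(score_fn, c_values, gamma_values):
    """Evaluate every (C, gamma) pair and return (best_score, best_C, best_gamma)."""
    best = None
    for c, gamma in product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


def toy_score(c, gamma):
    # Stand-in for "mean F1 over fivefold cross-validation" (assumption).
    # It peaks at C = 2^5, gamma = 2^0 purely for illustration.
    return -abs(math.log2(c) - 5) - abs(math.log2(gamma))


grid = power_of_two_grid()
best_score, best_c, best_gamma = grid_search(toy_score, grid, grid)
```

With 13 candidate values for each hyperparameter, the search evaluates 169 combinations, which matches the size of Table 4.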
Performance Comparison
Results and discussion. The evaluation results of our
approaches and the baselines are summarized in Table 5 and
Figure 9. OurApproachREAL achieves the highest F1 for all four
classes and significantly outperforms the baselines
on this measure. We observe that the Precision of SQ is the
worst compared with that of the other three patterns. The
possible reason is that some NSQ curves usually have small
spikes. As a result, these queries might be mistakenly clas-
sified as SQ. In addition, the Recall of PMBQ is the worst
compared with the other three patterns because if the spike
fluctuation of PMBQ is not large enough, it is difficult to
detect. As a result, the query might be mistakenly classified
as SQ, AMBQ, or OBQ. In contrast, the Precision of
OBQ is the best among these four classes because OBQ
queries usually have larger SDs and MSs and smaller Ms, a
characteristic that our features capture and that helps to
identify OBQ. Compared with the baselines, especially
DSQTSA, our approach achieves higher performance for
PMBQ because both 1NN and DSQTSA rely on a single
measure of the differences between curves. In
contrast, our approach integrates multiple features derived
from an analysis of the characteristics of different time series
curves. In summary, the results indicate that our approach
can help to effectively identify temporal patterns of queries.
For the approximation of the time series curves, both
DLA and WLA are helpful to some extent. This result is
reasonable because the dynamic behaviors of web informa-
tion and user queries are consistent over time. That is, the
changing of documents can reflect the popularity of the
corresponding queries, and vice versa. We can see that
OurApproachWLA is slightly better than OurApproachDLA.
The possible reason for this behavior is that DLA may intro-
duce more noise because it considers whole documents
instead of specific query terms to generate the curves.
However, the temporal pattern detection approach with esti-
mated curves is still not effective enough. Further work is
TABLE 4. Grid search of parameters γ and C of SVM for SQ type with F1 measure (rows: γ; columns: C).

| γ \ C | 2^-6 | 2^-5 | 2^-4 | 2^-3 | 2^-2 | 2^-1 | 2^0 | 2^1 | 2^2 | 2^3 | 2^4 | 2^5 | 2^6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2^-6 | 0.885 | 0.886 | 0.880 | 0.883 | 0.881 | 0.878 | 0.879 | 0.880 | 0.884 | 0.887 | 0.891 | 0.894 | 0.896 |
| 2^-5 | 0.886 | 0.879 | 0.882 | 0.880 | 0.878 | 0.879 | 0.881 | 0.883 | 0.889 | 0.891 | 0.894 | 0.897 | 0.897 |
| 2^-4 | 0.880 | 0.883 | 0.881 | 0.879 | 0.879 | 0.881 | 0.884 | 0.888 | 0.892 | 0.896 | 0.897 | 0.899 | 0.900 |
| 2^-3 | 0.882 | 0.881 | 0.879 | 0.880 | 0.881 | 0.884 | 0.888 | 0.894 | 0.897 | 0.898 | 0.899 | 0.900 | 0.902 |
| 2^-2 | 0.882 | 0.879 | 0.880 | 0.881 | 0.885 | 0.891 | 0.895 | 0.896 | 0.899 | 0.899 | 0.902 | 0.904 | 0.905 |
| 2^-1 | 0.880 | 0.880 | 0.883 | 0.887 | 0.892 | 0.896 | 0.898 | 0.899 | 0.900 | 0.901 | 0.904 | 0.904 | 0.905 |
| 2^0 | 0.881 | 0.883 | 0.886 | 0.891 | 0.895 | 0.897 | 0.898 | 0.901 | 0.902 | 0.903 | 0.904 | 0.906 | 0.905 |
| 2^1 | 0.883 | 0.884 | 0.891 | 0.894 | 0.897 | 0.898 | 0.900 | 0.902 | 0.904 | 0.904 | 0.905 | 0.904 | 0.904 |
| 2^2 | 0.886 | 0.885 | 0.886 | 0.886 | 0.880 | 0.883 | 0.880 | 0.878 | 0.878 | 0.880 | 0.882 | 0.886 | 0.888 |
| 2^3 | 0.886 | 0.885 | 0.885 | 0.880 | 0.883 | 0.881 | 0.879 | 0.878 | 0.881 | 0.883 | 0.886 | 0.889 | 0.893 |
| 2^4 | 0.886 | 0.885 | 0.880 | 0.884 | 0.881 | 0.877 | 0.880 | 0.881 | 0.883 | 0.887 | 0.891 | 0.893 | 0.896 |
| 2^5 | 0.885 | 0.880 | 0.883 | 0.881 | 0.879 | 0.879 | 0.881 | 0.883 | 0.889 | 0.891 | 0.894 | 0.898 | 0.899 |
| 2^6 | 0.879 | 0.883 | 0.882 | 0.879 | 0.879 | 0.881 | 0.883 | 0.888 | 0.892 | 0.896 | 0.898 | 0.899 | 0.900 |
TABLE 5. Performance comparison.

| Models | SQ P | SQ R | SQ F1 | OBQ P | OBQ R | OBQ F1 | AMBQ P | AMBQ R | AMBQ F1 | PMBQ P | PMBQ R | PMBQ F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OurApproachREAL | 0.850a | 0.921 | 0.884a | 0.926a | 0.857 | 0.890a | 0.918a | 0.916a | 0.917a | 0.909ab | 0.815b | 0.859ab |
| OurApproachDLA | 0.515 | 0.584 | 0.547 | 0.511 | 0.502 | 0.506 | 0.442 | 0.581 | 0.502 | 0.467 | 0.498 | 0.482 |
| OurApproachWLA | 0.526 | 0.567 | 0.546 | 0.522 | 0.531 | 0.526 | 0.451 | 0.430 | 0.440 | 0.481 | 0.515 | 0.497 |
| DSQTSA | x | x | x | x | x | x | x | x | x | 0.607 | 0.693 | 0.647 |
| 1NNCD | 0.793 | 0.547 | 0.647 | 0.563 | 0.882 | 0.687 | 0.849 | 0.728 | 0.784 | 0.507 | 0.812 | 0.624 |
| 1NNED | 0.612 | 0.943 | 0.742 | 0.446 | 0.764 | 0.563 | 0.879 | 0.552 | 0.678 | 0.623 | 0.707 | 0.662 |

Note. The superscripts a and b indicate that our approaches make statistically significant differences (i.e., p < .05 with two-tailed t-test) to the baseline methods, 1NN and
DSQTSA, respectively. The results of OurApproach are achieved with the Gaussian kernel function and default parameter settings in LIBSVM (Chang & Lin,
2011). Fivefold cross-validation is adopted for all models.
still necessary to design more effective approaches to
approximate time series curves.
Feature effectiveness analysis. We analyze the effective-
ness of each feature on the overall performance of Our-
ApproachREAL. The results are summarized in Table 6.
Generally, discarding any feature leads to a decrease in
performance. Moreover, some features, such as DOBQ and DAMBQ,
lead to a larger decline than do the others. The results
only reflect the effects of different features to some extent
because the effects of a single feature might overlap with the
combination of multiple features.
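The leave-one-feature-out procedure behind this analysis can be outlined as below. This is a sketch under assumptions: `evaluate` is a hypothetical callable that would train and score the classifier on a given feature subset, and the toy weights are invented for the example and are unrelated to the values in Table 6.

```python
def ablation_drops(feature_names, evaluate):
    """Absolute performance drop when each feature is removed in turn."""
    full_score = evaluate(list(feature_names))
    drops = {}
    for name in feature_names:
        reduced = [f for f in feature_names if f != name]
        drops[name] = full_score - evaluate(reduced)
    return drops


# Toy scoring function (assumption): each feature adds a fixed amount.
TOY_WEIGHTS = {"M": 0.10, "SD": 0.20, "MS": 0.05}


def toy_evaluate(features):
    return 0.5 + sum(TOY_WEIGHTS[f] for f in features)


drops = ablation_drops(["M", "SD", "MS"], toy_evaluate)
most_important = max(drops, key=drops.get)
```

Because the toy score is additive, each drop equals the weight of the removed feature; with a real classifier the drops of correlated features overlap, which is the caveat noted above.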
We further analyze some typical features’ effects by plot-
ting the distributions of the query instances in the feature
space, as shown in Figure 10. It is obvious that the features
Mean and Standard Deviation can effectively distinguish SQ
from NSQ, as illustrated in Figure 10a. For SQ queries, the Ms and especially
the SDs are generally small. This behavior is reasonable because
the curves of SQ queries are flatter, which means small
SDs. Moreover, after preprocessing (removing trend compo-
nents), the Ms also are generally small. From Figure 10b–g,
we observe that combinations of the features MS, GFSMS,
Standard Deviation, DOBQ, DAMBQ, DPMBQ, and DSQ can clas-
sify OBQ and MBQ well. As described in the figures, the
MSs and GFSMSs of OBQ tend to be larger than the other
classes. The explanation is that the only spike in the OBQ
curves accounts for a large proportion of the search volumes,
which leads to large MS and GFSMS values. Moreover, the
Curve Distance between OBQ queries is smaller because
OBQ curves are easier to match with each other than with
the multiple spikes of MBQ queries. As for AMBQ and
PMBQ queries, DOBQ, DAMBQ, and DPMBQ already can effec-
tively classify them, as illustrated in Figure 10f and 10h.
Some other features, such as cutoff and PS, also help
enhance the performance. In summary, all features are
useful and effective to distinguish the queries from different
dimensions.
Classification scheme validation. As for our query classi-
fication scheme, we further used an unsupervised method
(i.e., clustering) to verify its correctness and discreteness.
Specifically, we used Lloyd’s (1982) k-means method to
cluster these queries, with the number of clusters ranging
from 1 to 6. We used two families of metrics, pair counting
measures and set-matching based measures, to evaluate the
clustering results. These metrics are widely used to evaluate
the performance of clustering algorithms (Amigó, Gonzalo,
Artiles, & Verdejo, 2009). The results are shown in Table 7.
From the results, we can see that the best number of
clusters is four. Moreover, high performance is achieved,
F1a = 0.886 and F1b = 0.871, which indicates that the clus-
tering results are highly consistent with manual annotations.
This result is strong evidence for the correctness and dis-
creteness of our classification scheme.
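The two families of clustering measures can be computed as in the following sketch. It assumes the standard definitions from the clustering-evaluation literature (pair-counting precision and recall over item pairs, and purity and inverse purity over cluster and category overlaps); whether these match the exact variants used in the experiments is an assumption, and the toy assignments are invented.

```python
from collections import Counter
from itertools import combinations


def pair_counting_scores(clusters, labels):
    """Precision and recall over all pairs of items."""
    same_cluster = same_label = both = 0
    for i, j in combinations(range(len(labels)), 2):
        in_same_cluster = clusters[i] == clusters[j]
        in_same_label = labels[i] == labels[j]
        same_cluster += in_same_cluster
        same_label += in_same_label
        both += in_same_cluster and in_same_label
    precision = both / same_cluster if same_cluster else 0.0
    recall = both / same_label if same_label else 0.0
    return precision, recall


def purity(clusters, labels):
    """Share of items that carry the majority label of their cluster."""
    by_cluster = {}
    for c, l in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[l] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(labels)


def inverse_purity(clusters, labels):
    """Purity with the roles of clusters and labels exchanged."""
    return purity(labels, clusters)


def harmonic_mean(a, b):
    return 2 * a * b / (a + b) if (a + b) else 0.0


# Toy clustering of six queries (assumption: illustrative only).
clusters = [0, 0, 0, 1, 1, 1]
labels = ["SQ", "SQ", "OBQ", "OBQ", "OBQ", "OBQ"]
pc_precision, pc_recall = pair_counting_scores(clusters, labels)
f1a = harmonic_mean(pc_precision, pc_recall)
f1b = harmonic_mean(purity(clusters, labels), inverse_purity(clusters, labels))
```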
Application Discussion
In this section, we discuss the potential applications of
our study. Note that for OBQ-type queries, our method
cannot detect them until the search volumes increase.
However, our method can still help improve the search
results during the middle and later stages of the correspond-
ing events. Figure 11 presents two examples. The two
queries correspond to two events, which were just
FIG. 9. Overall model performance comparison (panels a and b).
TABLE 6. Feature effectiveness analysis.

| Removed features | Average precision | Average F1 |
|---|---|---|
| M | 0.108↓ | 0.141↓ |
| SD | 0.106↓ | 0.135↓ |
| MS | 0.102↓ | 0.130↓ |
| GFSMS | 0.107↓ | 0.135↓ |
| DSQ | 0.107↓ | 0.131↓ |
| DOBQ | 0.111↓ | 0.158↓ |
| DAMBQ | 0.129↓ | 0.178↓ |
| DPMBQ | 0.114↓ | 0.125↓ |
| Cutoff | 0.107↓ | 0.092↓ |
| Number of Spikes | 0.106↓ | 0.090↓ |
| Period of Spikes | 0.107↓ | 0.112↓ |

Note. The values in this table indicate an absolute drop compared with
the performance with all features.
FIG. 10. Feature effectiveness analysis. Note that a single scatter plot cannot reflect the real situation in high dimensional feature space.
happening. Our approach can accurately identify their
pattern types as OBQ. Then, the search results can be
improved for the subsequent searches.
We suggest that the queries of different temporal pattern
types should be addressed by a search engine in different
ways:
• SQ-type queries denote users’ common, frequent
information needs. That is, users’ search intents usually do
not change over time (Kulkarni et al., 2011). Therefore, rel-
evance is the most significant measure for the results ranking.
The most relevant pages, regardless of whether they were
published previously or recently, should be ranked at the
top of the results list (Adar et al., 2009; Elsas & Dumais,
2010). Moreover, web pages from authoritative sites such as
Wikipedia are more valuable and should especially be
considered.
• For OBQ-type queries, regardless of whether corresponding
events are happening or have happened, users’ intents mostly
focus on documents about these events (Joho, Jatowt, & Roi,
2013; McCreadie, Macdonald, & Ounis, 2013). In this case,
search engines should rank the relevant documents that were
published during the period of the events at the top of the
results list. Moreover, freshness is very important, so search
engines should more actively crawl new web pages and
update the contents on old web pages (Dong et al., 2010;
Olston & Pandey, 2008).
• AMBQ-type queries have multiple spikes at different
time points that may correspond to different events. There-
fore, search engines should pay attention to the changes in the
query intents by analyzing changes in the search results and
user-behavior data (Kulkarni et al., 2011). Moreover, user
search intents behind AMBQ queries may be ambiguous. That
is, different users may issue the same query to find informa-
tion corresponding to different events. Search engines should
diversify their search results to make sure that no matter what
the intent is, there is at least one satisfactory result (Berberich
& Bedathur, 2013).
TABLE 7. Experimental results of Lloyd’s k-means clustering with
cosine similarity. F1a = 2*Precision*Recall/(Precision+Recall); F1b = 2*Purity*Inverse Purity/(Purity+Inverse Purity). Precision, Recall, and F1a are pair counting measures; Purity, Inverse Purity, and F1b are set-matching based measures.

| Cluster no. | Precision | Recall | F1a | Purity | Inverse Purity | F1b |
|---|---|---|---|---|---|---|
| K = 1 | 0.400 | 1.000 | 0.572 | 1.000 | 0.577 | 0.732 |
| K = 2 | 0.736 | 0.945 | 0.828 | 0.967 | 0.736 | 0.836 |
| K = 3 | 0.639 | 0.549 | 0.590 | 0.696 | 0.736 | 0.716 |
| K = 4 | 0.875 | 0.897 | 0.886 | 0.873 | 0.869 | 0.871 |
| K = 5 | 0.792 | 0.475 | 0.594 | 0.560 | 0.821 | 0.666 |
| K = 6 | 0.765 | 0.386 | 0.513 | 0.535 | 0.821 | 0.648 |
FIG. 11. Query temporal pattern examples from Google Trends for two ad
hoc queries.
• PMBQ-type queries represent events that follow
identical cycles. Therefore, search engines can predict future
events to respond to the users with temporally relevant search
results (Shokouhi, 2011). Although the spikes are usually
associated with regularly occurring events, the user search
intents behind PMBQ queries also may be temporally
ambiguous. In this case, search engines can temporally diver-
sify search results (Berberich & Bedathur, 2013). Moreover,
Alfonseca, Ciaramita, and Hall (2009) showed that the query
periodicities also could be used to improve the performance of
query suggestions.
Related Work
There is a large amount of previous work exploring the
characteristics of web queries, among which query classifi-
cation is an important part.
Broder’s classification scheme. In 2002, Broder presented
a trichotomy of web queries: navigational queries, informa-
tional queries and transactional queries. Navigational
queries are intended to find a specific website that the user
has in mind. Informational queries are intended to find infor-
mation about a topic. Transactional queries are intended to
perform web-mediated activities. The taxonomy has been
adopted by many studies (Herrera, de Moura, Cristo, Silva,
& da Silva, 2010; Jansen & Booth, 2010; Jansen, Booth, &
Spink, 2007, 2008; U. Lee, Liu, & Cho, 2005; Rose &
Levinson, 2004). Lewandowski, Drechsler, and Mach
(2012) used Broder’s classification scheme to measure the
reliability of query intent assessments to find out whether
manual intent annotations were sufficiently reliable to be
used as test data for automatic approaches. Rose and
Levinson (2004) described a framework for understanding
the underlying goals of user queries, and their experience in
using the framework to manually classify queries from a
web search engine. Their analysis suggested that naviga-
tional queries were less prevalent than generally believed.
They also considered that the entries within the transactional
class could be classified into more subclasses. Therefore,
they replaced the transactional class with the resource class
and subdivided the informational class and the resource
class into more detailed classes (Rose & Levinson, 2004).
Jansen et al. presented a methodology to classify user
queries in terms of a set of characteristics for each category
in Broder’s taxonomy. They implemented a classification
algorithm and automatically classified 1 million queries
from a web search engine log submitted by several hundred
thousand users (Jansen et al., 2007, 2008; Jansen & Booth,
2010). U. Lee et al. (2005) proposed an approach for query
classification by considering the click distribution, the
average click frequency of each query, and the anchor text
distribution. They only considered two classes: navigational
queries and informational queries (U. Lee, Liu, & Cho,
2005). Herrera et al. (2010) presented a study on the impact
of using several features extracted from the document col-
lections and query logs for the query classification task.
Kang and Kim (2003) explored the occurrence patterns of
query terms in web pages to detect the goal of a query as
either navigational or informational.
Time series patterns based classification schemes. Instead
of classifying queries as informational, navigational, and
transactional, the following studies developed classification
schemes based on time series patterns. Chien and Immorlica
(2005) utilized temporal correlation to identify sets of
similar queries. They suggested that queries with similar
frequency patterns were likely to be related (Chien &
Immorlica, 2005). Radinsky et al. (2012) explored how to
use time series techniques to model and predict user behav-
ior based on queries over time, including trends, periodici-
ties, and surprises. Shokouhi (2011) investigated seasonal
queries that represent seasonal events that repeat every year
(i.e., PMBQ queries in this article). He focused on detecting
seasonal queries using time series analysis technologies
(Shokouhi, 2011). Hong, Lin, and Wang (2002) used data-
mining techniques to mine browsing patterns from query
logs to make rules for web page retrieval. Tseng, Lin, and
Chang (2008) proposed a novel data-mining algorithm
named Temporal N-Gram (TN-Gram) to discover and
predict user navigation patterns by mining the query
logs. Kulkarni et al. (2011) studied the characteristics of
user queries from four aspects by manually analyzing the
query logs: the number of spikes, the shapes of the
spikes, the periodicity of the queries, and the overall trends
in popularity.
Other classification schemes. The following studies
explore their own classification schemes. Verberne et al.
(2013) measured the reliability of query intent assessments
in a study similar to that of Lewandowski, Drechsler, and
Mach (2012). The difference is that
they use a newly defined query classification scheme. Their
classification scheme consists of the following dimensions:
Topic, Action type, Modus, Source authority sensitivity,
Spatial sensitivity, Time sensitivity, and Specificity
(Verberne et al., 2013). Roy, Katare, Ganguly, Laxman, and
Choudhury (2015) took a deeper look at queries, focusing on
individual words as possible indicators of user intent. Their
query term taxonomy was derived through rigorous manual
analysis of large query logs (Roy et al., 2015). Jones and
Diaz (2007) noted that temporal properties of queries can be
used to diagnose the quality of the retrieval. They presented
three temporal classes of queries: atemporal queries, tempo-
rally unambiguous queries, and temporally ambiguous
queries (Jones & Diaz, 2007). Metzler et al. (2009) investi-
gated implicitly year-qualified queries that do not actually
contain a year, but the user might have implicitly formulated
the query with a specific year in mind. Asur and Buehrer
(2009) studied temporal signatures of three different types
of queries, Navigational, Adult, and News queries, and pro-
posed a method to classify a query into these three types by
considering the trends in the queries’ time series. Dakka,
Gravano, and Ipeirotis (2008) proposed a framework for
handling time-sensitive queries and automatically identify-
ing the important time intervals that were likely to be of
interest for a query. Bhatia, Brunk, and Mitra (2012) pre-
sented an analysis of the query logs from a commercial web
search engine and studied the web search queries for their
diversification requirements. They analyzed queries based
on click entropy and popularity, and proposed a query tax-
onomy based on their diversification requirements. They
then automatically classified web search queries into one of
the classes of their proposed taxonomy (Bhatia et al., 2012).
Y. Chang, He, Yu, and Lu (2006) analyzed user goals from
the viewpoint of natural language processing. They assumed
that the subject of the hidden sentence in the user’s mind
was the user himself and that the combined pair of the verb
and object was called “VOpair.” Given a query, they
attempted to identify the user’s goal (VOpair) from the web
search results. Y.J. Lee, Lee, Chai, Hwang, and Ryu (2009)
proposed a new temporal data-mining technique that could
extract temporal interval relation rules from temporal interval
data using Allen’s theory (Y.J. Lee et al., 2009). Lee and
Sanderson (2010) investigated a relatively unexamined query type:
queries composed of URLs. The extents, variations, and user
click-through behaviors were examined to determine the
intents behind URL queries.
Nunes, Ribeiro, and David (2008) investigated the use of
temporal expressions in web queries. They found that tem-
poral expressions were scarcely used in the queries, except
for some specific topics such as Autos, Sports, News, and
Holidays (Nunes et al., 2008).
Although many studies have investigated the characteris-
tics of user queries, to our knowledge, none of them studied
the problem from the perspective employed in this article.
Generally, the differences between our study and existing
studies are threefold: First, we use a different classification
scheme whereas most existing studies have focused on
Broder’s classification scheme or its variation. Second, the
most relevant study was performed by Kulkarni et al.
(2011). However, they did not propose an automatic detec-
tion approach. Third, although Shokouhi (2011) proposed an
automatic detection approach, he only focused on seasonal
queries (i.e., PMBQ) using time series analysis technologies.
Conclusion and Future Work
In this article, we study the problem of how to detect the
temporal patterns of user queries. We propose a query classification
method to solve this problem. Moreover, for
queries whose frequency curves cannot be determined
because of a lack of data recorded in query logs, we
propose two approaches to approximate their time series
curves from the document data sets. Our work can help to
better understand query intents and to further improve the
performance of search engines.
In future work, we will explore more features for
temporal-pattern-based query classification. In particular,
we will explore features contained in web pages so that we
can detect the pattern types as soon as possible. For
example, we hope to predict OBQ-type queries before the
search volume increases sharply (i.e., in the early stages of
events). We also plan to study how different temporal pat-
terns can be used to construct a retrieval model to improve
the performance of information retrieval.
Acknowledgments
This work is supported by the Natural Science Founda-
tion of China (61272240, 61103151), the Doctoral Fund of
the Ministry of Education of China (20110131110028), the
Natural Science Foundation of Shandong Province
(ZR2012FM037), the Excellent Middle-Aged and Youth
Scientists of Shandong Province (BS2012DX017), and the
Fundamental Research Funds of Shandong University.
References
Adar, E., Teevan, J., Dumais, S.T., & Elsas, J.L. (2009). The web changes
everything: Understanding the dynamics of web content. In
R.A. Baeza Yates, P. Boldi, B.A. Ribeiro Neto, & B.B. Cambazoglu
(Eds.), Proceedings of the Second ACM International Conference on
Web Search and Data Mining (WSDM′09) (pp. 282–291). New York,
NY: ACM.
Alfonseca, E., Ciaramita, M., & Hall, K. (2009). Gazpacho and summer
rash: Lexical relationships from temporal patterns of Web search queries.
In P. Koehn & R. Mihalcea (Eds.), Proceedings of the 2009 Conference
on Empirical Methods on Natural Language Processing (EMNLP′09)
(pp. 1046–1055). Stroudsburg, PA: ACL.
Amigó, E., Gonzalo, J., Artiles, J., & Verdejo, F. (2009). A comparison of
extrinsic clustering evaluation metrics based on formal constraints. Infor-
mation Retrieval, 12, 461–486.
Asur, S., & Buehrer, G. (2009). Temporal analysis of web search query-
click data. In C.L. Giles (Ed.), Proceedings of the Third Workshop on
Social Network Analysis Workshop (SNA-KDD′09) (pp. 1–8). New
York, NY: ACM.
Berberich, K., & Bedathur, S. (2013). Temporal diversification of search
results. SIGIR 2013 Workshop on Time-aware Information Access,
Dublin, Ireland.
Bhatia, S., Brunk, C., & Mitra, P. (2012). Analysis and automatic classifi-
cation of web search queries for diversification requirements. Journal of
the American Society for Information Science and Technology, 49, 1–10.
Bishop, C.M. (2006). Pattern recognition and machine learning. New York,
NY: Springer.
Brockwell, P.J., & Davis, R.A. (2002). Introduction to time series and
forecasting. New York, NY: Springer.
Broder, A. (2002). A taxonomy of web search. SIGIR Forum, 36, 3–10.
Chang, C.C., & Lin, C.J. (2011). LIBSVM: A library for support vector
machines. ACM Transactions on Intelligent Systems and Technology, 2,
1–27.
Chang, Y., He, K., Yu, S., & Lu, W. (2006). Identifying user goals from web
search results. In T. Nishida, Z. Shi, U. Visser, X. Wu, J. Liu, B. Wah, . . .
Y. Cheung (Eds.), Proceedings of the 2006 IEEE/WIC/ACM Interna-
tional Conference on Web Intelligence (WI′06) (pp. 1038–1041). Wash-
ington, DC: IEEE Computer Society.
Chen, Z., Ma, J., Cui, C., Rui, H., & Huang, S. (2010). Web page publica-
tion time detection and its application for page rank. In F. Crestani &
S. Marchand Maillet (Eds.), Proceedings of the 33rd International ACM
SIGIR Conference on Research and Development in Information
Retrieval (SIGIR′10) (pp. 859–860). New York, NY: ACM.
Chien, S., & Immorlica, N. (2005). Semantic similarity between search
engine queries using temporal correlation. In A. Ellis & T. Hagino (Eds.),
Proceedings of the 14th International Conference on World Wide Web
(WWW′05) (pp. 2–11). New York, NY: ACM.
Cleveland, R.B., Cleveland, W.S., McRae, J.E., & Terpenning, I. (1990).
STL: A seasonal-trend decomposition procedure based on loess. Journal
of Official Statistics, 6, 3–73.
Dakka, W., Gravano, L., & Ipeirotis, P.G. (2008). Answering general time
sensitive queries. In J.G. Shanahan (Ed.), Proceedings of the 17th ACM
Conference on Information and Knowledge Management (CIKM′08)
(pp. 1437–1438). New York, NY: ACM.
Dakka, W., Gravano, L., & Ipeirotis, P.G. (2012). Answering general time-
sensitive queries. IEEE Transactions on Knowledge and Data Engineer-
ing, 24, 220–235.
Dong, A., Chang, Y., Zheng, Z., Mishne, G., Bai, J., Zhang, R., et al.
(2010). Towards recency ranking in web search. In B.D. Davison &
T. Suel (Eds.), Proceedings of the Third ACM International Conference
on Web Search and Data Mining (WSDM′10) (pp. 11–20). New York,
NY: ACM.
Elsas, J.L., & Dumais, S.T. (2010). Leveraging temporal dynamics of
document content in relevance ranking. In B.D. Davison & T. Suel
(Eds.), Proceedings of the Third ACM International Conference
on Web Search and Data Mining (WSDM′10) (pp. 1–10). New York, NY:
ACM.
Herrera, M.R., de Moura, E.S., Cristo, M., Silva, T.P., & da Silva, A.S.
(2010). Exploring features for the automatic identification of user goals
in web search. Information Processing & Management, 46, 131–142.
Hong, T.P., Lin, K.Y., & Wang, S.L. (2002). Mining linguistic browsing
patterns in the world wide web. Soft Computing, 6, 329–336.
Jansen, B.J., & Booth, D. (2010). Classifying web queries by topic and user
intent. In E. Mynatt & D. Schoner (Eds.), CHI ′10 Extended Abstracts on
Human Factors in Computing Systems (CHI EA′10) (pp. 4285–4290).
New York, NY: ACM.
Jansen, B.J., Booth, D.L., & Spink, A. (2007). Determining the user intent
of web search engine queries. In C. Williamson & M.E. Zurko (Eds.),
Proceedings of the 16th International Conference on World Wide Web
(WWW′07) (pp. 1149–1150). New York, NY: ACM.
Jansen, B.J., Booth, D.L., & Spink, A. (2008). Determining the informa-
tional, navigational, and transactional intent of web queries. Information
Processing & Management, 44, 1251–1266.
Joho, H., Jatowt, A., & Roi, B. (2013). A survey of temporal web search
experience. In D. Schwabe, V. Almeida, & H. Glaser (Eds.), Proceedings
of the 22nd International Conference on World Wide Web Companion
(WWW′13 Companion) (pp. 1101–1108). Canton of Geneva, Switzer-
land: WWW Conferences Steering Committee.
Jones, R., & Diaz, F. (2007). Temporal profiles of queries. ACM Transac-
tions on Information Systems, 25, 1–13.
Kang, I.H., & Kim, G. (2003). Query type classification for web document
retrieval. In C. Clarke & G. Cormack (Eds.), Proceedings of the 26th
annual International ACM SIGIR Conference on Research and Develop-
ment in Information Retrieval (SIGIR′03) (pp. 64–71). New York, NY:
ACM.
Kulkarni, A., Teevan, J., Svore, K.M., & Dumais, S.T. (2011). Understand-
ing temporal query dynamics. In I. King (Ed.), Proceedings of the Fourth
ACM International Conference on Web Search and Data Mining
(WSDM′11) (pp. 167–176). New York, NY: ACM.
Lee, U., Liu, Z., & Cho, J. (2005). Automatic identification of user goals in
web search. In A. Ellis & T. Hagino (Eds.), Proceedings of the 14th
International Conference on World Wide Web (WWW′05) (pp. 391–
400). New York, NY: ACM.
Lee, W.M., & Sanderson, M. (2010). Analyzing URL queries. Journal of
the American Society for Information Science and Technology, 61,
2300–2310.
Lee, Y.J., Lee, J.W., Chai, D.J., Hwang, B.H., & Ryu, K.H. (2009). Mining
temporal interval relational rules from temporal data. Journal of Systems
and Software, 82, 155–167.
Lewandowski, D., Drechsler, J., & Mach, S. (2012). Deriving query intents
from web search engine queries. Journal of the American Society for
Information Science and Technology, 63, 1773–1788.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on
Information Theory, 28, 129–137.
Manning, C.D., Raghavan, P., & Schütze, H. (2008). Introduction to Infor-
mation Retrieval. New York, NY: Cambridge University Press.
McCreadie, R., Macdonald, C., & Ounis, I. (2013). News vertical search:
When and what to display to users. In G.J.F. Jones & P. Sheridan (Eds.),
Proceedings of the 36th International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR′13)
(pp. 253–262). New York, NY: ACM.
Metzler, D., Jones, R., Peng, F., & Zhang, R. (2009). Improving search
relevance for implicitly temporal queries. In J. Allan & J. Aslam (Eds.),
Proceedings of the 32nd International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR′09)
(pp. 700–701). New York, NY: ACM.
Nunes, S., Ribeiro, C., & David, G. (2008). Use of temporal expressions in
web search. In C. Macdonald, I. Ounis, V. Plachouras, I. Ruthven, &
R.W. White (Eds.), Proceedings of the IR Research, 30th European
Conference on Advances in Information Retrieval (ECIR′08) (pp. 580–
584). New York, NY: Springer.
Olston, C., & Pandey, S. (2008). Recrawl scheduling based on information
longevity. In J. Huai, R. Chen, H. Hon, & Y. Liu (Eds.), Proceedings of
the 17th International Conference on World Wide Web (WWW′08)
(pp. 437–446). New York, NY: ACM.
Ponte, J.M., & Croft, W.B. (1998). A language modeling approach to
information retrieval. In W.B. Croft, A. Moffat, C.J. van Rijsbergen, R.
Wilkinson, & J. Zobel (Eds.), Proceedings of the 21st annual interna-
tional ACM SIGIR conference on Research and development in infor-
mation retrieval (SIGIR′98) (pp. 275–281). New York, NY: ACM.
Radinsky, K., Svore, K., Dumais, S., Teevan, J., Bocharov, A., & Horvitz,
E. (2012). Modeling and predicting behavioral dynamics on the web. In
A. Mille, F. Gandon, & J. Misselis (Eds.), Proceedings of the 21st
International Conference on World Wide Web (WWW′12) (pp. 599–
608). New York, NY: ACM.
Rose, D.E., & Levinson, D. (2004). Understanding user goals in web
search. In S. Feldman & M. Uretsky (Eds.), Proceedings of the 13th
International Conference on World Wide Web (WWW′04) (pp. 13–19).
New York, NY: ACM.
Roy, R.S., Katare, R., Ganguly, N., Laxman, S., & Choudhury, M. (2015).
Discovering and understanding word level user intent in web search
queries. Web Semantics: Science, Services and Agents on the World
Wide Web, 30, 22–38.
Shokouhi, M. (2011). Detecting seasonal queries by time-series analysis. In
W. Ma & J. Nie (Eds.), Proceedings of the 34th International ACM
SIGIR Conference on Research and Development in Information
Retrieval (SIGIR′11) (pp. 1171–1172). New York, NY: ACM.
Tseng, V.S., Lin, K.W., & Chang, J.C. (2008). Prediction of user navigation
patterns by mining the temporal web usage evolution. Soft Computing,
12, 157–163.
Verberne, S., van der Heijden, M., Hinne, M., Sappelli, M., Koldijk, S.,
Hoenkamp, E., Kraaij, W., et al. (2013). Reliability and validity of query
intent assessments. Journal of the American Society for Information
Science and Technology, 64, 2224–2237.
Viera, A.J., Garrett, J.M., & Joanne, M. (2005). Understanding
interobserver agreement: The kappa statistic. Family Medicine, 37, 360–
363.
Yang, J., & Leskovec, J. (2011). Patterns of temporal variation in online
media. In I. King (Ed.), Proceedings of the Fourth ACM International
Conference on Web Search and Data Mining (WSDM′11) (pp. 177–186).
New York, NY: ACM.