Query Log Analysis
Naama Kraus
Slides are based on the papers:
• Andrei Broder, A taxonomy of web search
• Ricardo Baeza-Yates, Graphs from Search Engine Queries
• Hassan, Jones, Klinkner, Beyond DCG: User Behavior as a Predictor of a Successful Search
A Taxonomy of Web Searches
• [Andrei Broder] classifies web queries according to their intent:
– Navigational - reach a particular site
  • Example: cnn, Oracle
– Informational - acquire some information
  • Example: the history of haifa, information retrieval
– Transactional - perform some web-mediated activity. Further interaction is expected.
  • E.g. shopping, downloading files, accessing databases
  • Example: new balance shoes, Israel flights
Query Log
• Search Engine Query Log records users’ searches
• A typical record contains
– Anonymous user id u
– Search query q
– Returned documents V
– Clicked documents C
– Timestamp t
Query Log Example
1234, apple, 12:04
1234, apple ipod, 12:05
1234, ynet, 12:13
145, google, 12:20
145, eBay, 12:56
32, ynet news, 12:59
145, Solaris systen, 13:01
145, Solaris system, 13:05
…
Session
• A sequence of searches by one particular user u within a specific time limit
• S = < <u, q1, t1>, …, <u, qk, tk> >
• t1 < … < tk (=> ordered sequence)
• ti+1 – ti < t0 (=> t0 is a timeout threshold)
• Note 1: a session may contain unrelated queries
• Note 2: identifying sessions is easy
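The timeout rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function `split_sessions`, the helper `at`, and the tuple layout `(user, query, time)` are assumptions made for the example, and the 30-minute default is just one possible value of t0.

```python
from datetime import datetime, timedelta


def split_sessions(records, timeout=timedelta(minutes=30)):
    """Group (user, query, time) records into per-user sessions.

    A new session starts whenever the gap between two consecutive
    queries of the same user is not smaller than `timeout` (t0).
    """
    sessions = []
    last_session = {}  # user -> that user's most recent session
    for user, query, t in sorted(records, key=lambda r: (r[0], r[2])):
        current = last_session.get(user)
        if current is not None and t - current[-1][2] < timeout:
            current.append((user, query, t))
        else:
            current = [(user, query, t)]
            sessions.append(current)
            last_session[user] = current
    return sessions


def at(hhmm):
    # Hypothetical helper: parse "12:04" on an arbitrary fixed day.
    return datetime.strptime("2000-01-01 " + hhmm, "%Y-%m-%d %H:%M")


log = [
    ("1234", "apple", at("12:04")),
    ("1234", "apple ipod", at("12:05")),
    ("1234", "ynet", at("12:13")),
    ("1234", "apple store", at("12:20")),
    ("1234", "cnn news", at("12:56")),
    ("1234", "cnn webcast", at("12:59")),
    ("1234", "apple apps", at("13:01")),
]

sessions = split_sessions(log)
for number, session in enumerate(sessions, start=1):
    print(number, [query for _, query, _ in session])
```

The only gap of 30 minutes or more in this log is between 12:20 and 12:56, so the sketch yields two sessions.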
Session Example
• Timeout threshold = 30 minutes
• Session 1
– 1234, apple, 12:04
– 1234, apple ipod, 12:05
– 1234, ynet, 12:13
– 1234, apple store, 12:20
• Session 2 (starts after the 36-minute gap, which exceeds the threshold)
– 1234, cnn news, 12:56
– 1234, cnn webcast, 12:59
– 1234, apple apps, 13:01
Query Chain
• A sequence of queries by a particular user that share a similar information need
– Also known as a mission or logical session
• Example: haifa maps, haifa travel, attractions in haifa
• Note 1: a chain contains related queries only
• Note 2: identifying chains is difficult
Query Chain Example
• 1234, apple, 12:04
• 1234, apple ipod, 12:05
• 1234, ynet, 12:13
• 1234, apple store, 12:20
• 1234, cnn news, 12:56
• 1234, cnn webcast, 12:59
• 1234, apple apps, 13:01
• chain1
• chain2
Click Graph
• Bipartite graph
– Nodes on the left side are unique queries
– Nodes on the right side are unique URLs
• There is an edge between q and u if the log contains a click on u for query q
• Edges may be weighted according to the number of clicks
• This graph is used by numerous algorithms for various purposes
– E.g., query and URL clustering, query recommendations …
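A click graph can be built directly from (query, URL) click pairs. The sketch below is only an illustration under assumed names: `build_click_graph` and the `example.com` URLs are hypothetical, and the input is assumed to hold one pair per click.

```python
from collections import Counter, defaultdict


def build_click_graph(clicks):
    """Build a weighted bipartite click graph.

    clicks: iterable of (query, url) pairs, one pair per click.
    Returns the edge weights plus adjacency sets for both sides.
    """
    edge_weight = Counter(clicks)   # (query, url) -> number of clicks
    urls_of = defaultdict(set)      # query -> clicked URLs
    queries_of = defaultdict(set)   # url -> queries that led to a click
    for query, url in edge_weight:
        urls_of[query].add(url)
        queries_of[url].add(query)
    return edge_weight, urls_of, queries_of


clicks = [
    ("paris hotels", "hotels.example.com"),
    ("paris hotels", "hotels.example.com"),
    ("cheap paris hotels", "hotels.example.com"),
    ("paris attractions", "guide.example.com"),
]

edge_weight, urls_of, queries_of = build_click_graph(clicks)
print(edge_weight[("paris hotels", "hotels.example.com")])
print(sorted(queries_of["hotels.example.com"]))
```

Keeping both adjacency maps makes it cheap to walk the graph from either side, which is what clustering and recommendation methods typically need.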
Query Graphs
• Each unique query is a node in the graph
• Next slides – connection types between queries (edges)
• Proposed by [Ricardo Baeza-Yates]
Query Graphs – Word Graph
• An edge between two nodes exists if the queries share common terms
• Possible node weight – number of occurrences in the log
• Possible edge weight – Jaccard distance
[Figure: word graph over the example queries paris hotels, cheap paris hotels, paris attractions, london attractions]
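As a rough sketch of the word graph, the code below connects queries that share at least one term and weights each edge by the Jaccard distance of their term sets, 1 - |A ∩ B| / |A ∪ B|. The function name `word_graph` and the whitespace tokenization are assumptions for the example.

```python
from itertools import combinations


def word_graph(queries):
    """Connect queries that share terms; weight edges by Jaccard distance."""
    terms = {q: set(q.split()) for q in queries}
    edges = {}
    for q1, q2 in combinations(queries, 2):
        shared = terms[q1] & terms[q2]
        if shared:
            similarity = len(shared) / len(terms[q1] | terms[q2])
            edges[(q1, q2)] = 1.0 - similarity
    return edges


queries = [
    "paris hotels",
    "cheap paris hotels",
    "paris attractions",
    "london attractions",
]

edges = word_graph(queries)
for (q1, q2), distance in edges.items():
    print(f"{q1} -- {q2}: {distance:.2f}")
```

Note that a smaller distance means more similar queries, so an application that wants "heavier is closer" would use the similarity value instead.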
Query Graphs – Session Graph
• The weight of node q is the number of sessions that contain the query q (usually equal to the number of occurrences of q)
• There is a directed edge from q1 to q2 if q1 occurred before q2 in the same session
• An edge’s weight is the number of such occurrences
[Figure: session graph over the example queries paris hotels, paris attractions, cheap paris hotels, london attractions]
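The session graph can be sketched as follows. Two choices here are assumptions rather than something the slide specifies: "before" is read as any earlier position in the session (not only the immediately preceding query), and each ordered pair is counted at most once per session. The name `session_graph` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def session_graph(sessions):
    """sessions: list of query lists, each in chronological order."""
    node_weight = Counter()  # query -> number of sessions containing it
    edge_weight = Counter()  # (q1, q2) -> sessions where q1 precedes q2
    for queries in sessions:
        node_weight.update(set(queries))
        # combinations() preserves input order, so q1 comes before q2.
        pairs = {(q1, q2) for q1, q2 in combinations(queries, 2) if q1 != q2}
        edge_weight.update(pairs)
    return node_weight, edge_weight


sessions = [
    ["paris hotels", "cheap paris hotels"],
    ["paris hotels", "paris attractions", "cheap paris hotels"],
    ["london attractions", "paris attractions"],
]

node_weight, edge_weight = session_graph(sessions)
print(node_weight["paris hotels"])
print(edge_weight[("paris hotels", "cheap paris hotels")])
```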
Query Graphs – URL Cover Graph
[Figure: URL cover graph over the example queries paris hotels, paris attractions, cheap paris hotels, london attractions]
• An edge exists between q1 and q2 if they share clicked URLs
• Node weight = number of occurrences
• An edge’s weight is the number of common clicks
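A minimal sketch of the URL cover graph, assuming each query is already mapped to the set of URLs clicked for it. "Number of common clicks" is interpreted here as the number of distinct shared URLs, which is an assumption; the name `url_cover_graph` and the URLs are hypothetical.

```python
from itertools import combinations


def url_cover_graph(clicked_urls):
    """clicked_urls: dict mapping query -> set of clicked URLs."""
    edges = {}
    for q1, q2 in combinations(clicked_urls, 2):
        common = clicked_urls[q1] & clicked_urls[q2]
        if common:
            edges[(q1, q2)] = len(common)
    return edges


clicked_urls = {
    "paris hotels": {"h1.example", "h2.example"},
    "cheap paris hotels": {"h1.example", "h2.example", "h3.example"},
    "paris attractions": {"a1.example"},
    "london attractions": {"a2.example"},
}

edges = url_cover_graph(clicked_urls)
print(edges)
```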
Query Graphs – URL Link Graph
[Figure: URL link graph over the example queries paris hotels, paris attractions, cheap paris hotels, london attractions]
• An edge exists between q1 and q2 if there is at least one link between a clicked URL of q1 and a clicked URL of q2
• Node weight = number of occurrences
• An edge’s weight is the number of such links
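The URL link graph needs the hyperlink structure in addition to the click data. The sketch below assumes the links are available as a set of (source, target) URL pairs and counts links in either direction; both the input format and the name `url_link_graph` are assumptions for illustration.

```python
from itertools import combinations


def url_link_graph(clicked_urls, links):
    """clicked_urls: dict query -> set of clicked URLs.
    links: set of (source_url, target_url) hyperlinks.
    """
    edges = {}
    for q1, q2 in combinations(clicked_urls, 2):
        urls1, urls2 = clicked_urls[q1], clicked_urls[q2]
        # Count links that connect a clicked URL of one query
        # to a clicked URL of the other, in either direction.
        count = sum(
            1
            for src, dst in links
            if (src in urls1 and dst in urls2) or (src in urls2 and dst in urls1)
        )
        if count:
            edges[(q1, q2)] = count
    return edges


clicked_urls = {
    "paris hotels": {"hotels.example/paris"},
    "paris attractions": {"guide.example/paris"},
    "london attractions": {"guide.example/london"},
}
links = {
    ("hotels.example/paris", "guide.example/paris"),
    ("guide.example/paris", "hotels.example/paris"),
}

edges = url_link_graph(clicked_urls, links)
print(edges)
```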
Query Graphs – URL Terms Graph
[Figure: URL terms graph over the example queries paris hotels, paris attractions, cheap paris hotels, london attractions]
• Represent a clicked URL by a set of terms (whole page, snippet, anchors, title, a combination …)
• Weight terms by their frequencies
• Node weight = number of occurrences
• There is an edge between q1 and q2 if there are at least m common terms in at least one clicked URL of q1 and one clicked URL of q2
• Edge weight is the sum of the frequencies of the common terms
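One way to sketch the URL terms graph is shown below. The slide does not say how frequencies from the two pages are combined or which URL pair to use when several qualify, so this example adds the frequencies from both pages and keeps the best-scoring pair; those choices, the name `url_terms_graph`, and the sample data are all assumptions.

```python
from collections import Counter
from itertools import combinations, product


def url_terms_graph(clicked_urls, url_terms, m=2):
    """clicked_urls: dict query -> set of clicked URLs.
    url_terms: dict url -> Counter of term frequencies.
    m: minimum number of common terms required for an edge.
    """
    edges = {}
    for q1, q2 in combinations(clicked_urls, 2):
        best = 0
        for u1, u2 in product(clicked_urls[q1], clicked_urls[q2]):
            common = url_terms[u1].keys() & url_terms[u2].keys()
            if len(common) >= m:
                weight = sum(url_terms[u1][t] + url_terms[u2][t] for t in common)
                best = max(best, weight)
        if best:
            edges[(q1, q2)] = best
    return edges


url_terms = {
    "a.example": Counter({"paris": 3, "hotel": 2, "cheap": 1}),
    "b.example": Counter({"paris": 1, "hotel": 4}),
    "c.example": Counter({"london": 2, "museum": 1}),
}
clicked_urls = {
    "paris hotels": {"a.example"},
    "cheap paris hotels": {"b.example"},
    "london attractions": {"c.example"},
}

edges = url_terms_graph(clicked_urls, url_terms, m=2)
print(edges)
```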
User Behavior as a Predictor of a Successful Search
• Goal: given a sequence of user actions within a specific logical session, predict whether the search goal ended successfully or not
– Success – user is satisfied with the results
– Failure – user is unsatisfied
• Method:
– Analyze the query log and learn success/failure patterns
– Use learned models for prediction
• Proposed by [Hassan, Jones and Klinkner]
Data
• A rich query log of queries and user actions:
– Query (Q)
– Search Click (SR)
– Sponsored Search Click (AD)
– Related Search Click (RL)
  • Query recommendations
– Spelling Suggestion Click (SP)
– Shortcut Click (SC)
  • E.g. image, video, news …
– Any Other Click (OTH)
  • E.g. browser tab
Data Labeling
• Random sample of user sessions
• Human editors labeled data:
– Detected logical sessions
– Success/Failure
  • definitely successful, probably successful, unsure, probably unsuccessful, and definitely unsuccessful
Markov Models
• Partition training data into two splits
– successful goals
– unsuccessful goals
• For each group construct a Markov Model derived from seen action sequences
– A model describes the user behavior in case of a successful/unsuccessful search goal
– Action type is a state
– Weight a transition from one state to another according to its probability as observed in the data (MLE)
Transition Weighting - MLE
$$\Pr_{MLE}(S_i, S_j) = \frac{N_{S_i, S_j}}{N_{S_i}}$$
• $N_{S_i, S_j}$: number of times we saw a transition from $S_i$ to $S_j$
• $N_{S_i}$: number of times we saw a transition from $S_i$ (to any state)
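The MLE weighting amounts to counting transitions and normalizing by how often the source state was left. The sketch below is an illustration only: the function `train_markov_model`, the padding of every sequence with START and END states, and the three sample sequences are assumptions, not data or code from the paper.

```python
from collections import Counter


def train_markov_model(sequences):
    """Estimate transition probabilities by maximum likelihood.

    sequences: list of action lists, e.g. ["Q", "SR"].
    Returns a dict mapping (S_i, S_j) -> N(S_i, S_j) / N(S_i).
    """
    transition_count = Counter()  # N(S_i, S_j)
    state_count = Counter()       # N(S_i): transitions leaving S_i
    for actions in sequences:
        path = ["START", *actions, "END"]
        for s_i, s_j in zip(path, path[1:]):
            transition_count[(s_i, s_j)] += 1
            state_count[s_i] += 1
    return {
        (s_i, s_j): n / state_count[s_i]
        for (s_i, s_j), n in transition_count.items()
    }


# Hypothetical action sequences labeled as successful goals.
successful = [
    ["Q", "SR"],
    ["Q", "SR", "Q", "SR"],
    ["Q", "AD"],
]

model = train_markov_model(successful)
print(model[("Q", "SR")])
print(model[("SR", "END")])
```

Training one such model on the successful goals and another on the unsuccessful ones gives the two models used for prediction.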
Illustration
[Figure: example Markov model with the states START, Q, SR, AD, RL and END; edges are labeled with transition probabilities]
Prediction (1)
• Given a user’s action sequence, we need to predict whether it is successful or not
• We’ve learned two models, Ms and Mf, of successful and unsuccessful patterns
• Compute the probability that a given sequence S = {S1, …, Sn} was generated from Ms; do the same for Mf
• Predict success/failure by computing the log likelihood
– Formulas in next slide
Prediction (2)
Formulas taken from the paper
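Since the formulas themselves are not reproduced in this transcript, the following is only a plausible sketch of likelihood-based prediction, not the paper's exact decision rule. It sums the log transition probabilities of a sequence under each model and predicts success when Ms gives the higher score. The two models, their probabilities, the START/END padding, and the `floor` value used for unseen transitions are all hypothetical.

```python
import math


def log_likelihood(model, actions, floor=1e-6):
    """Sum of log transition probabilities along the action sequence.

    Unseen transitions get a small floor probability (an assumption,
    used here to avoid log(0)).
    """
    path = ["START", *actions, "END"]
    return sum(math.log(model.get(step, floor)) for step in zip(path, path[1:]))


def predict_success(m_s, m_f, actions):
    """Predict success if the sequence is more likely under m_s than m_f."""
    return log_likelihood(m_s, actions) > log_likelihood(m_f, actions)


# Hypothetical transition probabilities for the two models.
m_s = {
    ("START", "Q"): 1.0,
    ("Q", "SR"): 0.8, ("Q", "Q"): 0.1, ("Q", "END"): 0.1,
    ("SR", "END"): 0.7, ("SR", "Q"): 0.3,
}
m_f = {
    ("START", "Q"): 1.0,
    ("Q", "SR"): 0.3, ("Q", "Q"): 0.5, ("Q", "END"): 0.2,
    ("SR", "END"): 0.4, ("SR", "Q"): 0.6,
}

print(predict_success(m_s, m_f, ["Q", "SR"]))
print(predict_success(m_s, m_f, ["Q", "Q", "Q"]))
```

Working in log space keeps long sequences from underflowing, which is the usual reason to compare log likelihoods rather than raw probabilities.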