Otis Gospodnetić, Sematext @ Lucene Revolution 2011

Copyright 2011 Sematext Int'l. All rights reserved.

Search Analytics

What? Why? How?

Otis Gospodnetić – Sematext International

About Otis Gospodnetić

• Member: Apache Lucene, Solr, Nutch, Mahout

• Author: Lucene in Action 1 & 2

• Entrepreneur: Sematext, Simpy

About Sematext

Consulting, development, support:

• Search (Lucene, Solr, Elastic Search...)
• Big Data (Hadoop, HBase, Voldemort...)
• Web Crawling (Nutch)
• Machine Learning (Mahout)

Agenda

• Intro: Otis & Sematext - DONE
• What
• Why
• Reports

What is Search Analytics?

• Input: queries and clicks
• Output: reports – over time
• Next: actions
• The means, not the goal
• Ongoing, not one-off

Search Analytics and SEO

• Not the same
• SA can help with SEO

Search vs. Web Analytics

• User intent and information needs vs. inferring
• Hand in hand
• Ideally you can relate data from both or even unify it

Why Search Analytics?

• Measure and monitor everything
• Supports (re)design, navigation choices
• Helps with content acquisition & enhancement
• Improve search experience
• Mula

Report Groups

• Failures vs. non-failures
• Actionable vs. non-actionable

Failures

Be aware of failures, but don't be one.

• Zero hits
• Low query CTR
• High search exit rate
• Irrelevant results
• Over N refinements

Report: Zero Hit Queries

• Popular queries; percentage, not raw count
• Misspellings?
• Synonyms?
• No matching content?
• Need (different) tagging?
• Bad analysis?
• Multilingual issue?
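A minimal sketch of how the zero-hit report could be computed, assuming a hypothetical search log of (query, hit_count) pairs. The function name and log format are illustrative and not from the talk. Each zero-hit query is reported as a percentage of all searches rather than as a raw count, most frequent first.

```python
from collections import Counter

def zero_hit_report(log):
    """log: iterable of (query, hit_count) pairs (a hypothetical log format).

    Returns (query, count, pct_of_all_searches) tuples for zero-hit
    queries, most frequent first.
    """
    total = 0
    zero = Counter()
    for query, hits in log:
        total += 1
        if hits == 0:
            # Light normalization so trivial variants are counted together.
            zero[query.strip().lower()] += 1
    if total == 0:
        return []
    return [(q, n, 100.0 * n / total) for q, n in zero.most_common()]

sample_log = [("solr", 12), ("lucen", 0), ("lucen", 0), ("hbase", 3), ("nutchh", 0)]
report = zero_hit_report(sample_log)
for query, count, pct in report:
    print(f"{query}: {count} searches, {pct:.1f}% of all searches")
```

The top entries of such a report are the ones worth checking against the questions above (misspelling, missing synonym, missing content, and so on).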

Report: Zero Hit Queries (cont.)

• Use Query Spellchecker (aka DYM)
• Using AutoComplete
• Using DYM ReSearcher
• Designing No Results page

Report: High Exit Rate Queries

• Disappointed, frustrated users
• Major revenue loss, no second chance
• Marriage with Web Analytics
• Relevance bad?
• Default ordering bad?
• Titles of hits need adjusting?
• Search terms highlighting looking bad?
• Bad thumbnails? Need thumbnails?
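One possible way to build this report, sketched under the assumption that web analytics data has already been joined to the search log so that each search carries an `exited` flag (True when the results page was the last page of the visit). The event format, the function name, and the `min_searches` cutoff are all hypothetical.

```python
from collections import defaultdict

def exit_rates(events, min_searches=2):
    """events: iterable of (query, exited) pairs (assumed log format).

    Returns (query, exit_rate) pairs, highest exit rate first, skipping
    queries seen fewer than min_searches times.
    """
    searches = defaultdict(int)
    exits = defaultdict(int)
    for query, exited in events:
        searches[query] += 1
        if exited:
            exits[query] += 1
    rates = {q: exits[q] / n for q, n in searches.items() if n >= min_searches}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

sample_events = [
    ("mahout", True), ("mahout", True),
    ("solr", False), ("solr", True),
    ("nutch", True),  # seen only once, so it is filtered out
]
ranked = exit_rates(sample_events)
print(ranked)
```

The volume cutoff keeps one-off queries from dominating the top of the list.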

Report: Irrelevant Result Queries

• Manual: top N hits of top N queries
• Judge relevance of each hit and assign score
• Per-query score: sum scores of top N hits
• Cumulative top N query score: sum per-query scores
• Automated: Mean Reciprocal Rank (MRR)
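Mean Reciprocal Rank averages 1/rank of the first relevant result over a set of searches. The sketch below assumes the first clicked hit is used as a stand-in for "first relevant result", which is a common proxy but an assumption here, since the slides do not say how relevance is determined. Searches with no click contribute 0.

```python
def mean_reciprocal_rank(first_click_ranks):
    """first_click_ranks: one entry per search, holding the 1-based rank
    of the first clicked hit, or None when nothing was clicked.

    Searches without a click count as a reciprocal rank of 0.
    """
    if not first_click_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_click_ranks if rank is not None)
    return total / len(first_click_ranks)

# Four searches: clicks at rank 1, rank 2, no click, rank 4.
ranks = [1, 2, None, 4]
mrr = mean_reciprocal_rank(ranks)
print(mrr)  # (1 + 0.5 + 0 + 0.25) / 4
```

An MRR close to 1 means users usually click the top hit; a falling MRR over time suggests relevance is degrading.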

Report: Total Queries

• Search vs. navigation/browsing
• Search vs. overall site usage
• Related report: % of visits with search
• Segment: new users vs. return users, etc.
• Questions: do you count paging? Facet selection? Re-sorting?

Report: Total Distinct Queries

• What's distinct? Car vs. Cars
• # Total Queries / # Distinct Queries = Avg. #
• Tied to performance and query cache utilization
• Extension: Total distinct words in queries
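A rough sketch of the total-to-distinct ratio. What counts as "distinct" depends on normalization; the `normalize` function below is a deliberately crude assumption (lowercasing plus naive plural stripping) so that "Car" and "Cars" collapse into one query. A real deployment would more likely reuse the search engine's own analysis chain.

```python
def normalize(query):
    """Crude normalization (an assumption, not from the talk): lowercase,
    collapse whitespace, and strip a trailing 's' from longer words."""
    words = query.lower().split()
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)

def distinct_stats(queries):
    """Returns (total, distinct, total / distinct) for a list of queries."""
    total = len(queries)
    distinct = len({normalize(q) for q in queries})
    return total, distinct, (total / distinct if distinct else 0.0)

total, distinct, avg = distinct_stats(["Car", "Cars", "cars ", "Solr", "HBase"])
print(total, distinct, round(avg, 2))
```

A high ratio means a few queries are repeated often, which is the case where a query cache pays off.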

Report: Words Per Query

• Informative, slowly changing, not terribly actionable
• Can affect search box size
• Use AutoComplete if queries are long

Report: Top Queries

• User intent and information needs
• Ensure good results
• Calculate MRR for top N queries
• Calculate MRR for each top N query
• Compare to global MRR

Report: Top Queries (cont.)

• New top queries – new trend? New demand?
• Best Bets (aka Query Elevation in Solr)
• Expose before search is needed
• Seasonality – hour of day, day of the week, etc.
• Adjust content presentation and availability (e.g. weekday vs. weekend, business vs. personal)
• Anticipate demand in the next cycle
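Seasonality can be surfaced by bucketing queries by hour of day and day of week. The sketch below assumes a hypothetical log of (ISO-8601 timestamp, query) pairs; the names and the log format are illustrative only.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def seasonality(timestamped_queries):
    """timestamped_queries: iterable of (ISO-8601 timestamp, query) pairs.

    Returns two Counters keyed by (hour, query) and (day name, query).
    """
    by_hour, by_day = Counter(), Counter()
    for ts, query in timestamped_queries:
        when = datetime.fromisoformat(ts)
        by_hour[(when.hour, query)] += 1
        by_day[(DAYS[when.weekday()], query)] += 1
    return by_hour, by_day

sample = [
    ("2011-05-25T09:15:00", "hadoop"),   # a Wednesday morning
    ("2011-05-25T09:40:00", "hadoop"),
    ("2011-05-28T21:05:00", "movies"),   # a Saturday evening
]
by_hour, by_day = seasonality(sample)
print(by_hour.most_common(1), by_day.most_common(1))
```

Comparing the top queries per bucket across cycles is one way to anticipate demand in the next cycle.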

Clickstream Analysis

Query analysis is not a complete story:

• Queries
• Clicks
• (Trans)action

Query and Hit Valuation

• Query: by popularity (count)
• Query: by CTR
• Query: by subsequent (trans)action count/pct.
• Hit: by click count
• Hit: by subsequent (trans)action count/pct.

Query and Hit Valuation (cont.)

Maximize:

• pop(query) + ctr(query) + action(query)
• clicks(hit) + action(hit)

Failures:

• high pop(q), yet low ctr(q)
• high pop(q), high ctr(q), yet low action(q)

Integration with backend required
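The two failure patterns above can be flagged mechanically once per-query counts are available. This is a sketch under stated assumptions: the input format, the function name, and all three thresholds are hypothetical, and the action counts presume the backend integration the slide mentions.

```python
def classify_queries(stats, min_pop=100, min_ctr=0.3, min_action=0.05):
    """stats: {query: (searches, searches_with_click, searches_with_action)}.

    Thresholds are illustrative assumptions, not values from the talk.
    Returns {query: failure label} for popular queries that match one of
    the two failure patterns.
    """
    flags = {}
    for query, (searches, clicked, acted) in stats.items():
        if searches < min_pop:
            continue  # only popular queries are considered
        ctr = clicked / searches
        action_rate = acted / searches
        if ctr < min_ctr:
            flags[query] = "high pop, low ctr"
        elif action_rate < min_action:
            flags[query] = "high pop, high ctr, low action"
    return flags

sample_stats = {
    "solr book": (500, 400, 60),      # healthy: ctr 0.80, action 0.12
    "cheap flights": (800, 80, 5),    # ctr 0.10
    "lucene shirt": (300, 200, 3),    # ctr 0.67, action 0.01
    "rare query": (4, 0, 0),          # below the popularity cutoff
}
flags = classify_queries(sample_stats)
print(flags)
```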

Report: Low CTR Queries

• Popular queries; percentage, not raw count
• Relevance bad?
• Default ordering bad?
• Titles of hits need adjusting?
• Search terms highlighting looking bad?
• Bad thumbnails? Need thumbnails?

Report: Queries with Most Clicks

• i.e. Queries with Highest CTR
• Informative? Yes
• Actionable? Somewhat: expose relevant content outside of search

Search Session

Search activity aimed at satisfying a specific information need in some limited amount of time.

i.e. it's very fuzzy

Interesting Search Sessions

• More than N queries in M minutes
• Sessions that end in a failure
• Sessions for specific type of info (e.g. person name, product name, event)
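The "more than N queries in M minutes" rule can be checked with a sliding window over one user's query timestamps. A minimal sketch, assuming the timestamps have already been grouped per user or per visit; the function name is hypothetical.

```python
from datetime import datetime, timedelta

def has_query_burst(timestamps, n, minutes):
    """True if more than n queries fall inside any window of `minutes`
    minutes. timestamps: datetime objects for one user's queries, in any
    order."""
    times = sorted(timestamps)
    window = timedelta(minutes=minutes)
    start = 0
    for end, t in enumerate(times):
        # Shrink the window from the left until it spans at most `minutes`.
        while t - times[start] > window:
            start += 1
        if end - start + 1 > n:
            return True
    return False

base = datetime(2011, 5, 25, 10, 0)
busy = [base + timedelta(seconds=30 * i) for i in range(6)]   # 6 queries in 2.5 minutes
calm = [base + timedelta(minutes=10 * i) for i in range(6)]   # 6 queries over 50 minutes
print(has_query_burst(busy, n=5, minutes=5), has_query_burst(calm, n=5, minutes=5))
```

Because a session is fuzzy by definition, N and M are tuning knobs rather than fixed values.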

Segmentation

• Searches that resulted in conversion vs. not
• Search metrics for new vs. returning visitors
• English vs. French vs. Spanish vs. …
• Chrome vs. IE ...

More SA Reports/Questions

• % of queries from DYM vs. AC vs. typed
• Most common queries per clicked hit
• Which hits are generally popular?
• Which hits are trending up?
• Are there docs that are never ever clicked on?
• Average number of queries per session
• Breakdown of queries by number of hits

More SA Reports/Questions

• Breakdown of queries by latency
• Frequently used facets or sort criteria
• Avg number of clicks per query
• Time spent on site before/after searching
• Search initiation pages
• How deep into SERPs are people drilling?
• Are too many clicks on pages other than 1st?
• ...

Data Collection

Contact

• sematext.com
• blog.sematext.com
• @sematext
• @otisg
• [email protected]