Design the Search Experience

https://www.oreilly.com/ideas/beyond-algorithms-optimizing-the-search-experience

Transcript of Design the Search Experience

Page 1: Design the Search Experience

https://www.oreilly.com/ideas/beyond-algorithms-optimizing-the-search-experience


Page 2: Design the Search Experience


Page 3: Design the Search Experience


Page 4: Design the Search Experience

This is what users really think, or don't, when looking for information.


Page 5: Design the Search Experience

Hilltop Algorithm
• Quality of links more important than quantity of links
• Segmentation of the corpus into broad topics
• Selection of authority sources within these topic areas

Hilltop was one of the first algorithms to introduce the concept of machine-mediated "authority" to combat human manipulation of results for commercial gain (link blast services, viral distribution of misleading links). It is used by all of the search engines in some way, shape, or form. The beauty of Hilltop is that, unlike PageRank, it is query-specific and reinforces the relationship between the authority and the user's query. You don't have to be big or have a thousand links from auto parts sites to be an "authority." Google's 2003 Florida update, rumored to contain Hilltop reasoning, caused many sites with extraneous links to fall from their previously lofty placements.

Photo: the hilltop Hohenzollern Castle, near Stuttgart
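A minimal sketch of the Hilltop idea as described above (an illustration, not Google's implementation): expert pages are selected per query topic, and a target page's authority comes only from links it receives from those experts.

```python
# Hilltop-style scoring sketch: authority flows only from topic experts,
# so link quality (who links to you) beats raw link quantity.
from collections import defaultdict

def hilltop_score(query_topic, experts_by_topic, links_from):
    """Score target pages by the number of distinct experts linking to them.

    experts_by_topic: {topic: set of expert page ids}
    links_from: {expert page id: set of target page ids it links to}
    """
    scores = defaultdict(int)
    for expert in experts_by_topic.get(query_topic, set()):
        for target in links_from.get(expert, set()):
            scores[target] += 1  # each distinct expert endorsement counts once
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Usage: experts on "auto repair" confer authority only within that topic.
experts = {"auto repair": {"e1", "e2"}}
links = {"e1": {"pageA", "pageB"}, "e2": {"pageA"}}
print(hilltop_score("auto repair", experts, links))  # pageA outranks pageB
```

A thousand links from unrelated auto parts sites would contribute nothing here, because only expert links are counted.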

Page 6: Design the Search Experience

Topic-Sensitive Ranking (2004)
• Consolidation of Hypertext-Induced Topic Selection (HITS) and PageRank
• Pre-query calculation of factors based on a subset of the corpus
• Context of term use in the document
• Context of term use in the history of queries
• Context of term use by the user submitting the query
• Computes PageRank based on a set of representative topics (augments PageRank with content analysis)
• Topics derived from the Open Directory (ODP)
• Uses a set of ranking vectors: pre-query selection of topics plus at-query comparison of the similarity of the query to the topics

Creator now a Senior Engineer at Google
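A sketch of the scheme described above, under the usual reading of topic-biased PageRank (assumed details: power iteration, a teleport vector biased toward one topic's pages, and a query-time blend by topic similarity):

```python
import numpy as np

def topic_pagerank(adj, topic_pages, damping=0.85, iters=50):
    """Power iteration for PageRank whose teleport vector is biased toward
    one topic's pages (the pre-query step). adj[i, j] = 1 if page i links to j."""
    n = adj.shape[0]
    out_degree = adj.sum(axis=1, keepdims=True)
    out_degree[out_degree == 0] = 1.0        # avoid division by zero for dangling pages
    transition = adj / out_degree            # row-stochastic link matrix
    teleport = np.zeros(n)
    teleport[topic_pages] = 1.0 / len(topic_pages)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) * teleport + damping * (rank @ transition)
    return rank

def blended_rank(topic_ranks, topic_weights):
    """At query time, blend precomputed topic vectors by query-topic similarity."""
    weights = np.asarray(topic_weights, dtype=float)
    weights /= weights.sum()
    return sum(w * r for w, r in zip(weights, topic_ranks))

# Usage: two topics over a 4-page web; query judged 70% "sports", 30% "news".
adj = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1], [1, 0, 0, 0]], dtype=float)
sports = topic_pagerank(adj, topic_pages=[0, 1])
news = topic_pagerank(adj, topic_pages=[2, 3])
print(blended_rank([sports, news], [0.7, 0.3]))
```

The expensive part (one biased PageRank per topic) happens before any query arrives; the query-time step is just a weighted sum.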


Page 7: Design the Search Experience

About content: quality and freshness

About agile: frequent iterations and small fixes

About UX: or so it seems (Vanessa Fox/Eric Enge: click-through, bounce rate, conversion)


Page 8: Design the Search Experience

Google recently released RankBrain, which uses thought vectors to revise queries.

http://www.seobythesea.com/2013/09/google-hummingbird-patent/

Comparison of search query to general population search behavior around query terms

Revises the query and submits both to the search index:
• Confidence score
• Relationship threshold
• Adjacent context
• Floating context
Results are a consolidation of both queries.

Entity = anything that can be tagged as being associated with certain documents, e.g. a store, news source, product model, author, artist, person, place, or thing. Query logs (this is why Google took away keyword data: they do not want us to reverse-engineer the system as we have in the past). User behavior information: user profile, access to documents seen as related to the original document, amount of time on a domain associated with one or more entities, and whole or partial conversions that took place.
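A hedged sketch of the query-revision flow described in these notes; `embed`, `revise`, and `search_index` are hypothetical stand-ins, not Google APIs. The revised query's results are consolidated with the original's only when a confidence score clears the relationship threshold.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def revise_and_search(query, embed, revise, search_index, threshold=0.8):
    """Run the original and a revised query; consolidate if the revision is close."""
    original_vec = embed(query)
    revised = revise(query)                    # e.g. synonym / vector-neighbor rewrite
    confidence = cosine(original_vec, embed(revised))
    results = search_index(query)
    if confidence >= threshold:                # the "relationship threshold" above
        # Consolidate: interleave the two result lists, de-duplicating by doc id.
        seen, merged = set(), []
        for doc in (d for pair in zip(results, search_index(revised)) for d in pair):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
        return merged
    return results                             # low confidence: keep the original alone
```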


Page 9: Design the Search Experience


Page 10: Design the Search Experience


Page 11: Design the Search Experience

In 2003, Google acquired personalization technology company Kaltix and its co-founder Sep Kamvar, who has been head of Google personalization ever since. He defines personalization as a "product that can use information given by the user to provide a tailored, more individualized experience."

Collection methods are implicit (software agents, enhanced proxy servers, cookies, session IDs) and explicit (HTML forms; explicit user-feedback interaction, as in early Google personalization with More Like This; information provided knowingly by the user; more accurate as the user shares more about query intent and interests). Explicit collection has higher precision than implicit.

Query refinement
• System adds terms based on past information searches
• Computes similarity between the query and the user model
• Synonym replacement
• Dynamic query suggestions, displayed as the searcher enters the query

Results re-ranking
• Sorted by user model
• Sorted by seen/not seen

Personalization of the results set draws on three sources:
• User: previous search patterns
• Domain: countries, cultures, personalities
• GeoPersonalization: location-based results

Metrics used for probability modeling of future searches
• Active: user actions in time
• Passive: user toolbar information (bookmarks), desktop information (files), IP location, cookies
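A minimal sketch of the user-model re-ranking idea above (my construction, not Kaltix's method): score each result by its term overlap with a user model built from past searches, preferring not-yet-seen documents on ties.

```python
from collections import Counter

def rerank(results, user_model, seen_docs):
    """results: list of (doc_id, doc_terms); user_model: Counter of past search terms.
    Sort by overlap with the user model, breaking ties by not-yet-seen first."""
    def score(item):
        doc_id, terms = item
        overlap = sum(user_model[t] for t in terms)
        return (overlap, doc_id not in seen_docs)
    return sorted(results, key=score, reverse=True)

# Usage: a user who has searched mostly for "python" and "search" topics.
user_model = Counter({"python": 3, "search": 2})
results = [("d1", {"java"}), ("d2", {"python", "search"}), ("d3", {"python"})]
print(rerank(results, user_model, seen_docs={"d2"}))  # d2 scores highest, then d3
```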


Page 12: Design the Search Experience


Page 13: Design the Search Experience


Page 14: Design the Search Experience

Reconciling Information-Seeking Behavior With Search User Interfaces for the Web (2006)

Users don't always know what they want until they see results, so search proceeds by gradual refinement.

Search refinement principles:
• Different interfaces (or at least different forms of interaction) should be available to match different search goals.
• The interface should facilitate the selection of appropriate contexts for the search.
• The interface should support the iterative nature of the search task.


Page 15: Design the Search Experience

http://uk.reputation.com/wp-content/uploads/2015/03/TheEvolutionofGoogleSearchResultsPagesandTheirEffectonUserBehaviour.pdf

Google eye-tracking study:
• The golden triangle is now vertical (Knowledge Graph, local, mobile)
• Users look at more results in less time
• Users spend less than 2 seconds viewing an individual search result
• Positions 2-4 are getting more click activity


Page 16: Design the Search Experience

Real Time Search User Behavior: Jansen, Campbell, Gregg (April 2010)
• The most frequent query accounted for 0.003% of the query set.
• Less than 8% of the terms were unique.
• More than 44% of the queries contained one term, 30% contained two terms, and nearly 26% contained three terms or more.
• The average query length was 2.32 terms, in line with traditional Web search.
• At the term level of analysis, there were 2,331,072 total terms used in all queries in the data set, with 3,477,163 total term pairs. There were 175,403 unique terms (7.5%) and 442,713 unique term pairs (12.7%), in line with Web search [3].
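Statistics like these are straightforward to reproduce from a raw query log; a minimal sketch (the log format, one query string per entry, is an assumption):

```python
from collections import Counter
from itertools import combinations

def query_log_stats(queries):
    """Compute the query-length and term-uniqueness measures reported above."""
    lengths = Counter(len(q.split()) for q in queries)
    terms = [t for q in queries for t in q.split()]
    pairs = [p for q in queries for p in combinations(q.split(), 2)]
    return {
        "avg_query_length": sum(len(q.split()) for q in queries) / len(queries),
        "one_term_share": lengths[1] / len(queries),
        "unique_term_share": len(set(terms)) / len(terms),
        "unique_pair_share": len(set(pairs)) / len(pairs) if pairs else 0.0,
    }

print(query_log_stats(["cheap flights", "flights to rome", "weather"]))
```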


Page 17: Design the Search Experience

http://tech.digitlab.info/Mind-the-Gap-Enterprise-Search-Smartlogic-MindMetre-Sponsor-Research-Report/

MindMetre report, November 2011:
• The study showed 2 minutes is the commonly acceptable time to find information.
• Cognitive bias: users are driven to keep searching because they KNOW it's there.


Page 18: Design the Search Experience

Miles Kehoe did a great post on LinkedIn with specifics if you need them: https://www.linkedin.com/pulse/solving-problems-enterprise-search-miles-kehoe?trk=hb_ntf_MEGAPHONE_ARTICLE_POST
• Perform an audit
• Get data
• Test security


Page 19: Design the Search Experience

Image courtesy of https://almanac2010.wordpress.com/spiritual-new-supernatural/ and “What’s So Funny About Science” Sidney Harris (1977)


Page 20: Design the Search Experience

Daniel Tunkelang: Director of Engineering, Search at LinkedIn; Tech Lead, Local Search at Google; Chief Scientist at Endeca
• Communicate with users
• Entity detection is crucial
• Queries vary in difficulty. Recognize and adapt.


Page 21: Design the Search Experience

How Many Results Per Page? A Study of SERP Size, Search Behavior and User Experience; Kelly & Azzopardi

The authors describe (i) trust bias, where users trust the search engine to deliver the most relevant item first, i.e., following the probability ranking principle [27], and (ii) quality bias, where behavior depends on the quality of the retrieval system. They concluded that users are more likely to click on highly ranked documents and that quality influences click behavior: as the relevance of the items retrieved decreases, users click on items that are less relevant, on average.
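As an illustration of how the two biases interact, here is a toy cascade-style click simulation of my own (not the paper's model): examination decays with rank (trust bias) while click probability tracks item relevance (quality bias).

```python
import random

def simulate_clicks(relevances, trust=0.85, trials=10_000):
    """relevances: ranked list of P(click | examined). Returns the click rate
    per position under a position-discounted examination model."""
    clicks = [0] * len(relevances)
    for _ in range(trials):
        for pos, rel in enumerate(relevances):
            if random.random() > trust ** pos:   # examination decays with rank
                break
            if random.random() < rel:
                clicks[pos] += 1
                break                            # cascade: stop after a click
    return [c / trials for c in clicks]

# A lower-quality ranking (relevance decreasing slowly) spreads clicks downward.
print(simulate_clicks([0.9, 0.5, 0.3, 0.1]))
print(simulate_clicks([0.4, 0.35, 0.3, 0.25]))
```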


Page 22: Design the Search Experience

Optimizing Enterprise Search by Automatically Relating User Context to Textual Document Content: Reischold, Kerschbaumer, Fliedl
• Similar roles = similar searches
• Role term vector for role rank (built from a limited number of user profiles)
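A sketch of the role-term-vector idea as I read the slide (illustrative, not the paper's exact method): pool the search terms of users who share a role into one vector, then boost documents that match it.

```python
from collections import Counter

def build_role_vector(profiles):
    """profiles: lists of search terms from users in the same role."""
    vector = Counter()
    for terms in profiles:
        vector.update(terms)
    return vector

def role_rank(doc_terms, base_score, role_vector, weight=0.1):
    """Boost a document's base relevance score by its overlap with the role vector."""
    boost = sum(role_vector[t] for t in doc_terms)
    return base_score + weight * boost

# Usage: two claims-department profiles are enough to start biasing results.
claims_role = build_role_vector([["claim", "policy"], ["claim", "payout"]])
print(role_rank({"claim", "policy"}, base_score=1.0, role_vector=claims_role))
```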


Page 23: Design the Search Experience

Is Enterprise Search Useful At All? Lessons Learned From Studying User Behavior: Stocker, Zoier, et al.


Page 24: Design the Search Experience


Page 25: Design the Search Experience

How Many Results Per Page? A Study of SERP Size, Search Behavior and User Experience; Kelly & Azzopardi


Page 26: Design the Search Experience

Cost and Benefit Analysis of Mediated Enterprise Search: Wu, Thom, et al.

The Time Savings Times Salary (TSTS) methodology is most suitable for assessing the direct labor cost and benefit: time saved multiplied by worker value (salary) is used to calculate ROI.

Findings: the case study showed that the insurance company would get substantial benefit from investing in relevance judgments. Since the cost of assessing a query is fixed, the more a query is searched, the more benefit the company gains.
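The TSTS arithmetic is simple enough to show directly; the numbers below are illustrative, not from the case study:

```python
def tsts_roi(minutes_saved_per_search, searches_per_year,
             hourly_salary, assessment_cost):
    """Return (annual benefit, ROI) for an investment in relevance judgments:
    benefit = time saved per search, converted to dollars, times search volume."""
    benefit = minutes_saved_per_search / 60 * hourly_salary * searches_per_year
    return benefit, (benefit - assessment_cost) / assessment_cost

benefit, roi = tsts_roi(minutes_saved_per_search=2, searches_per_year=50_000,
                        hourly_salary=40, assessment_cost=20_000)
print(f"benefit = ${benefit:,.0f}/yr, ROI = {roi:.0%}")  # ~$66,667/yr, ~233%
```

Because the assessment cost per query is fixed, frequently repeated queries dominate the return, which is the report's point.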


Page 27: Design the Search Experience


Page 28: Design the Search Experience
