Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Presented by Paul Nelson


October 13-16, 2016 • Austin, TX

Search Accuracy Metrics & Predictive Analytics A Big Data Use Case

Paul Nelson Chief Architect, Search Technologies

pnelson@searchtechnologies.com

3

There will be a demo (so don’t go away)

4

185+ Consultants Worldwide

San Diego • London, UK • San Jose, CR • Cincinnati • Prague, CZ • Washington (HQ) • Frankfurt, DE

• Founded 2005
• Deep search expertise
• 700+ customers worldwide
• Consistent profitability
• Search engines & Big Data
• Vendor independent

5

Typical Conversation with Customer

“Our search accuracy is bad.”

“How bad?”

“Really, really bad.”

“Uh… on a scale of 1 to 10, how bad?”

“An eight. No wait… a nine. Maybe even a 9.5. Let’s call it a 9.23.”

6

Current methods are woefully inadequate

•  Golden query set
   o  Key documents
•  Top 100 / Top 1000 queries analysis
•  Zero-result queries
•  Abandonment rate
•  Queries with clicks
•  Conversion

7

What are we trying to achieve?

•  Reliable metrics for search accuracy
•  Can run analysis off-line
   o  Does not require a production deployment (!)
•  Can accurately compare two engines
•  Runs quickly = agility = high quality
•  Can handle different user types / personalization
   o  Broad coverage
•  Provides lots of data to analyze what’s going on
   o  Data to decide how best to improve the engine

[Diagram: the search engine under evaluation]

8

Leverage logs for accuracy testing

[Diagram: query logs and click logs feed a Big Data framework, which runs against the search engine under evaluation and outputs engine score(s), other metrics & histograms, and a scoring database.]

9

From Queries → Users

•  User-by-user metrics
   o  Change in focus
•  Group activity by session and/or user
   o  Call this an “Activity Set”
   o  Merge sessions and users
•  Use Big Data to analyze all users
   o  There are no stupid queries and no stupid users
   o  Overall performance is based on the experience of the users

[Diagram: queries, clicks, and other activity cluster around each user]
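An “Activity Set” of this kind can be sketched in a few lines of Python. The event shape below (dicts with `user`, `type` fields) is a hypothetical log format, not one from the talk:

```python
from collections import defaultdict

def build_activity_sets(events):
    """Group raw log events (queries, clicks, other activity) into one
    "activity set" per user, merging that user's sessions together."""
    activity_sets = defaultdict(list)
    for event in events:
        activity_sets[event["user"]].append(event)
    return dict(activity_sets)

events = [
    {"user": "u1", "type": "query", "q": "laptop"},
    {"user": "u1", "type": "click", "doc": "d42"},
    {"user": "u2", "type": "query", "q": "phone"},
]
sets_by_user = build_activity_sets(events)
print(sorted(sets_by_user))     # ['u1', 'u2']
print(len(sets_by_user["u1"]))  # 2 events in u1's activity set
```

The same grouping could key on session ID instead of (or in addition to) user, per the "merge sessions and users" point above.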

10

Engine Score

•  Group activity by session and/or user (queries & clicks)
•  Determine “relevant” documents
   o  What did the user view? Add to cart? Purchase?
   o  Did the search engine return what the user ultimately wanted?
•  Determine engine score per query based on the user’s point of view
   o  Σ power(FACTOR, position) * isRelevant[user, searchResult[position].DocID]
   o  (Note: many other formulae are possible: MRR, MAP, DCG, etc.)
•  Average score over all of a user’s queries = user score
•  Average scores across all users = final engine score
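The per-query formula and the two levels of averaging can be sketched as follows. The FACTOR value of 0.7 and the data shapes are illustrative assumptions, not values from the talk:

```python
def query_score(result_doc_ids, relevant_doc_ids, factor=0.7):
    """Sum of power(FACTOR, position) * isRelevant for one query: each
    relevant document contributes factor**position, so relevant hits
    near the top of the results are worth the most."""
    return sum(
        factor ** pos
        for pos, doc_id in enumerate(result_doc_ids)
        if doc_id in relevant_doc_ids
    )

def engine_score(users):
    """users: one list per user of (results, relevant_set) pairs.
    Average per-query scores into a user score, then average the user
    scores into the final engine score."""
    user_scores = [
        sum(query_score(results, rel) for results, rel in queries) / len(queries)
        for queries in users
    ]
    return sum(user_scores) / len(user_scores)

print(query_score(["a", "b", "c"], {"a"}))  # 1.0 (relevant doc at rank 0)
```

As the slide notes, MRR, MAP, or DCG could be swapped in for `query_score` without changing the averaging structure.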

11

The FACTOR (K)
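The slide’s chart is not reproduced in this transcript, but the idea is easy to show numerically: K controls how steeply lower result positions are discounted. A quick sketch with hypothetical K values:

```python
def position_weights(k, n=5):
    """Weight k**position given to each of the first n result
    positions; smaller k discounts lower-ranked results more steeply."""
    return [round(k ** pos, 3) for pos in range(n)]

for k in (0.5, 0.7, 0.9):
    print(k, position_weights(k))
```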

12

Off-Line Engine Analysis

o  Can we re-compute this array for all queries?
o  ANSWER: Yes!

Σ power(FACTOR, position) * isRelevant[user, searchResult[position].DocID]

[Diagram: an offline re-query process replays the search engine query logs against the search engine (possibly embedded) and stores the new results in a Big Data array.]
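A minimal sketch of the offline re-query step, assuming a callable engine and a relevance lookup derived from the click logs (all names and data here are hypothetical):

```python
def offline_requery(query_logs, relevance, engine, factor=0.7):
    """Replay logged queries against the engine under evaluation
    (possibly embedded) and recompute the score array offline, with no
    production deployment involved. `engine` is any callable mapping a
    query to a ranked list of doc IDs; `relevance` maps
    (user, doc_id) -> bool, derived from the click logs."""
    scored = []
    for user, query in query_logs:
        new_results = engine(query) or []
        score = sum(
            factor ** pos
            for pos, doc in enumerate(new_results)
            if relevance.get((user, doc), False)
        )
        scored.append((user, query, score))
    return scored

# toy in-process "engine" and logs
engine = {"laptop": ["d1", "d2"]}.get
relevance = {("u1", "d2"): True}
print(offline_requery([("u1", "laptop")], relevance, engine))
# [('u1', 'laptop', 0.7)]
```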

13

Continuous Improvement Cycle

[Diagram: a continuous cycle: modify engine → execute queries (from the log files) → compute engine score → evaluate results → modify engine again, producing a score per engine version.]
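A small harness can drive this cycle and record a score per engine version, so each modification can be evaluated and kept or rolled back. The toy engines and scoring callable below are hypothetical:

```python
def score_per_version(versions, score_fn):
    """Score every candidate engine version against the same offline
    query set, keeping a per-version history."""
    return [(name, score_fn(engine)) for name, engine in versions]

# hypothetical engine versions: each maps a query to ranked doc IDs
v1 = {"laptop": ["d9", "d2"]}.get
v2 = {"laptop": ["d2", "d9"]}.get
queries = [("u1", "laptop")]
relevant = {("u1", "d2")}

def score_fn(engine):
    total = 0.0
    for user, q in queries:
        for pos, doc in enumerate(engine(q)):
            if (user, doc) in relevant:
                total += 0.7 ** pos
    return total / len(queries)

print(score_per_version([("v1", v1), ("v2", v2)], score_fn))
# v2 ranks the relevant doc first, so it scores higher
```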

14

Watch the Score Improve Over Time

15

What else can we do with Engine Scoring?

Predictive Analytics

16

The Brutal Truth about Search Engine Scores

•  Random ad-hoc formulae put together
   o  No statistical or mathematical foundation
•  TF/IDF → all kinds of inappropriate biases
   o  Bias towards document size (smaller / larger)
   o  Bias towards rare (misspelled? archaic?) words
   o  Not scalable (different scores on different shards)
•  Same formula since the 1970s

They are not based on science. We can do better!

17

We use Big Data to Predict Relevancy

[Diagram: content sources (search project docs, web site pages, support pages, landing pages) flow through connectors and content processing into the search index; a copy of the content, together with search click logs, query logs, financial data, and business data, feeds a Big Data cluster that builds the relevancy model.]

18

Probability Scoring / Predictive Relevancy

[Diagram: 0/1 clicked?/purchased? labels, combined with product signals, query signals, user signals, and comparison signals, train a predictive-analytics statistical model that predicts the probability of relevancy.]
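As a rough sketch of such a statistical model, here is a from-scratch logistic regression over two toy signals. Everything here is illustrative: a real system would use a proper ML library and the full product/query/user/comparison signal sets:

```python
import math

def train_logistic(rows, labels, lr=0.1, epochs=200):
    """Tiny logistic-regression trainer: learn weights mapping a
    feature vector of signals to a click/purchase probability."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            g = p - y  # gradient of log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Predicted probability of relevancy, in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

# toy signals: [query-title match, popularity]; label = clicked?
rows = [[1.0, 0.2], [0.9, 0.8], [0.1, 0.3], [0.0, 0.1]]
labels = [1, 1, 0, 0]
w, b = train_logistic(rows, labels)
print(predict(w, b, [0.95, 0.5]) > 0.5)  # strong match -> likely relevant
```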

19

The Power of the Probability Score

•  The score predicts the probability of relevancy
•  Value is 0 → 1
   o  Can be used for threshold processing
   o  All documents too weak? Try something else!
   o  Can combine results from different sources / constructions together
•  Identifies what’s important
   o  Machine learning optimizes the parameters
      -  Identifies the impact and contribution of every parameter
   o  If a parameter does not improve relevancy → REMOVE IT
   o  Scoring becomes objective, not subjective (now based on SCIENCE)
   o  Allows for experimentation on parameters
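The threshold-processing point is easy to make concrete: because the score is a probability in [0, 1], a fixed cutoff means the same thing across queries and sources. A minimal sketch (the 0.3 threshold is an arbitrary illustrative value):

```python
def filter_by_threshold(scored_results, threshold=0.3):
    """Keep only documents whose predicted relevancy probability
    clears the threshold; return None when nothing survives, so the
    caller can fall back to another source or query construction."""
    kept = [(doc, p) for doc, p in scored_results if p >= threshold]
    return kept if kept else None  # None -> "try something else"

print(filter_by_threshold([("d1", 0.8), ("d2", 0.1)]))  # [('d1', 0.8)]
print(filter_by_threshold([("d1", 0.05)]))              # None
```

The same [0, 1] scale is what lets results from different sources be merged into one ranked list.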

20

And now the demo! (just like I promised)

Come out of the darkness

And into the Light!

The Age of Enlightenment for search engine accuracy is upon us!

Search Accuracy Metrics & Predictive Analytics A Big Data Use Case

Paul Nelson Chief Architect, Search Technologies

pnelson@searchtechnologies.com

Thank you!