Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Presented by Paul Nelson,...

OCTOBER 13-16, 2016 AUSTIN, TX


Page 1

Page 2

Search Accuracy Metrics & Predictive Analytics: A Big Data Use Case

Paul Nelson, Chief Architect, Search Technologies


Page 3

There will be a demo (so don’t go away)

Page 4

185+ consultants worldwide

Locations: Washington (HQ), San Diego, Cincinnati, San Jose (CR), London (UK), Prague (CZ), Frankfurt (DE)

• Founded 2005
• Deep search expertise
• 700+ customers worldwide
• Consistent profitability
• Search engines & Big Data
• Vendor independent

Page 5

Typical Conversation with Customer

Our search accuracy is bad.

How bad?

Really, really bad.

Uh… on a scale of 1 to 10, how bad?

An eight. No wait… a nine. Maybe even a 9.5. Let’s call it a 9.23.

Page 6

Current methods are woefully inadequate

• Golden query set
  o Key documents
• Top 100 / top 1000 query analysis
• Zero-result queries
• Abandonment rate
• Queries with clicks
• Conversion
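For comparison, the conventional log metrics listed above are easy to compute, which is part of their appeal. The following is a minimal sketch, assuming a hypothetical `QueryEvent` record (query text, result count, click and conversion flags) and treating "abandonment" as a query that returned results but drew no click; real definitions vary between teams. Note that none of these aggregate numbers says whether the engine returned what a particular user actually wanted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QueryEvent:
    # Hypothetical log record; real query/click logs will look different.
    query: str
    num_results: int
    clicked: bool
    converted: bool


def traditional_metrics(events: list[QueryEvent], top_n: int = 100) -> dict:
    """Compute the conventional log-based search metrics."""
    total = len(events)
    if total == 0:
        raise ValueError("no events to analyze")
    zero = sum(1 for e in events if e.num_results == 0)
    # Assumption: abandoned = results were shown but nothing was clicked.
    abandoned = sum(1 for e in events if e.num_results > 0 and not e.clicked)
    with_click = sum(1 for e in events if e.clicked)
    converted = sum(1 for e in events if e.converted)
    return {
        "zero_result_rate": zero / total,
        "abandonment_rate": abandoned / total,
        "click_rate": with_click / total,
        "conversion_rate": converted / total,
        # Top-N query analysis: most frequent query strings.
        "top_queries": Counter(e.query for e in events).most_common(top_n),
    }
```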

Page 7

What are we trying to achieve?

• Reliable metrics for search accuracy
• Can run analysis off-line
  o Does not require production deployment (!)
• Can accurately compare two engines
• Runs quickly = agility = high quality
• Can handle different user types / personalization
  o Broad coverage
• Provides lots of data to analyze what’s going on
  o Data to decide how best to improve the engine

Page 8

Leverage logs for accuracy testing

[Diagram: query logs and click logs feed a Big Data framework, which works with the search engines under evaluation to produce:]
• Engine score(s)
• Other metrics & histograms
• Scoring database
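One way to picture the first step in this diagram is a keyed join that attaches clicks to the queries that produced them. This sketch assumes, hypothetically, that both logs carry a shared `query_id` field; real logs often have to be matched on session and timestamp instead, and at scale the same join would run inside the Big Data framework rather than in memory.

```python
from collections import defaultdict


def join_query_and_click_logs(query_log, click_log):
    """Attach clicked document IDs to each logged query.

    Hypothetical row shapes:
      query log: {"query_id", "user_id", "query", "results": [ranked doc IDs]}
      click log: {"query_id", "doc_id"}
    """
    clicks_by_query = defaultdict(list)
    for click in click_log:
        clicks_by_query[click["query_id"]].append(click["doc_id"])

    joined = []
    for row in query_log:
        # .get() avoids inserting empty entries into the defaultdict.
        clicked = clicks_by_query.get(row["query_id"], [])
        joined.append({**row, "clicked": list(clicked)})
    return joined
```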

Page 9

From Queries → Users

• User-by-user metrics
  o Change in focus
• Group activity by session and/or user
  o Call this an “Activity Set”
  o Merge sessions and users
• Use Big Data to analyze all users
  o There are no stupid queries and no stupid users
  o Overall performance based on the experience of the users

[Diagram: queries, clicks, clusters, and other activity all tied to a single user]
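Building an “Activity Set” can be sketched as follows, assuming a hypothetical event format in which every event has a `session_id` and, once the user is known (for example after login), a `user_id`. Sessions that share a user are merged into one set; anonymous sessions stay on their own. This is one possible merge rule, not necessarily the one used in the talk.

```python
from collections import defaultdict


def build_activity_sets(events):
    """Group search activity (queries, clicks, etc.) into activity sets.

    Each event is a dict with "session_id" and an optional "user_id"
    (None or missing for anonymous activity). A session is merged into a
    user's set if any of its events carries that user ID.
    """
    # Pass 1: learn which sessions belong to which known user.
    session_to_user = {}
    for e in events:
        if e.get("user_id") is not None:
            session_to_user.setdefault(e["session_id"], e["user_id"])

    # Pass 2: key each event by user if known, otherwise by session.
    activity_sets = defaultdict(list)
    for e in events:
        user = session_to_user.get(e["session_id"])
        key = ("user", user) if user is not None else ("session", e["session_id"])
        activity_sets[key].append(e)
    return dict(activity_sets)
```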

Page 10

Engine Score

• Group activity by session and/or user (queries & clicks)
• Determine “relevant” documents
  o What did the user view? Add to cart? Purchase?
  o Did the search engine return what the user ultimately wanted?
• Determine the engine score per query from the user’s point of view
  o Σ power(FACTOR, position) * isRelevant[user, searchResult[position].DocID]
  o (Note: many other formulae are possible: MRR, MAP, DCG, etc.)
• Average score across all of a user’s queries = user score
• Average of the user scores across all users = final engine score
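The engine score described above can be sketched in Python as follows. Assumptions: positions are 0-based (so the top result has weight 1), the “relevant” documents for each user are already known (for example, the documents that user viewed, added to the cart, or purchased), and the default `factor=0.8` is an arbitrary illustrative value.

```python
from statistics import mean


def query_score(results, relevant_docs, factor):
    """Σ power(FACTOR, position) * isRelevant[user, results[position]].

    Positions are 0-based here, so the top result has weight 1.
    """
    return sum(factor ** pos for pos, doc_id in enumerate(results)
               if doc_id in relevant_docs)


def engine_score(user_queries, relevant, factor=0.8):
    """Average of per-user averages of per-query scores.

    user_queries: user_id -> list of ranked result lists (one per query)
    relevant:     user_id -> set of doc IDs judged relevant for that user
    """
    user_scores = []
    for user, result_lists in user_queries.items():
        docs = relevant.get(user, set())
        # User score = average over that user's queries.
        user_scores.append(mean(query_score(r, docs, factor) for r in result_lists))
    # Final engine score = average over users.
    return mean(user_scores)
```

Note that the per-query sum is not normalized: a query that returns several relevant documents can score above 1.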

Page 11

The FACTOR (K)
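The FACTOR (K) controls how quickly credit decays with rank. As an illustration only, this sketch prints the weight power(K, position) for the first ten 0-based positions at three arbitrarily chosen values of K, plus the share of total weight held by the top three positions: a small K gives nearly all the credit to the first few results, while a K close to 1 still rewards relevant documents further down the list.

```python
def position_weights(factor: float, depth: int = 10) -> list[float]:
    """Weight power(FACTOR, position) for 0-based positions 0..depth-1."""
    return [factor ** pos for pos in range(depth)]


# Arbitrary illustrative values of K.
for k in (0.5, 0.8, 0.95):
    weights = position_weights(k)
    top3_share = sum(weights[:3]) / sum(weights)
    print(f"K={k}: " + " ".join(f"{w:.2f}" for w in weights)
          + f"  (top-3 share of weight: {top3_share:.0%})")
```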

Page 12

Off-Line Engine Analysis

Σ power(FACTOR, position) * isRelevant[User, searchResult[position].DocID]

o Can we re-compute the searchResult array for all queries?
o ANSWER: Yes!

[Diagram: search engine query logs → offline re-query against a search engine (possibly embedded) → new results → Big Data array]
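Because the score depends only on the ranked result list and each user’s relevant documents, it can be recomputed offline by replaying logged queries against a new engine configuration. A minimal sketch, assuming a hypothetical `search` callable that wraps the engine under test (for instance an embedded index) and logged tuples of user ID, query text, and that user’s relevant documents:

```python
from statistics import mean
from typing import Callable

# Hypothetical engine interface: query text -> ranked doc IDs.
SearchFn = Callable[[str], list[str]]


def replay_score(logged, search: SearchFn, factor=0.8, depth=10):
    """Re-run logged queries offline and re-score the new result lists.

    logged: list of (user_id, query_text, relevant_doc_ids) tuples, where
    the relevant docs were derived from that user's logged behavior.
    """
    per_user = {}
    for user, query, relevant in logged:
        results = search(query)[:depth]  # the re-computed searchResult array
        score = sum(factor ** pos for pos, d in enumerate(results) if d in relevant)
        per_user.setdefault(user, []).append(score)
    # Average per user, then across users.
    return mean(mean(scores) for scores in per_user.values())
```

Comparing two engines is then just two calls with different `search` callables over the same logged queries.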

Page 13

Continuous Improvement Cycle

[Diagram: a repeating cycle of Modify Engine → Execute Queries (from the log files, against the search engine) → Compute Engine Score → Evaluate Results, recording a score per engine version]
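The cycle can be driven by a small loop that scores each engine version and records the change from the previous one. Here `score_fn` is a hypothetical stand-in for whatever computes the engine score (such as an offline replay of the query logs), and the versions are any labelled engine objects:

```python
def evaluate_versions(versions, score_fn):
    """Score each engine version and record the delta from the previous one.

    versions: ordered list of (version_label, engine) pairs
    score_fn: callable(engine) -> engine score (hypothetical stand-in)
    Returns a list of (label, score, delta); delta is None for the first.
    """
    history = []
    previous = None
    for label, engine in versions:
        score = score_fn(engine)
        delta = None if previous is None else score - previous
        history.append((label, score, delta))
        previous = score
    return history
```

Any version with a negative delta is a regression worth investigating before deployment, e.g. `[label for label, _, d in history if d is not None and d < 0]`.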

Page 14

Watch the Score Improve Over Time

Page 15

What else can we do with Engine Scoring?

Predictive Analytics

Page 16

The Brutal Truth about Search Engine Scores

• Random, ad-hoc formulae put together
  o No statistical or mathematical foundation
• TF / IDF → all kinds of inappropriate biases
  o Bias towards document size (smaller / larger)
  o Bias towards rare (misspelled? archaic?) words
  o Not scalable (different scores on different shards)
• Same formula since the 1970s

They are not based on science.

We can do better!
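The shard problem follows directly from how IDF is computed: each shard sees only its own document counts. This sketch uses the classic textbook form log(N / df) on two hypothetical shards (production engines use smoothed variants, but the effect of shard-local statistics is the same) and also shows a single misspelled token receiving a large IDF.

```python
import math


def idf(term, docs):
    """Classic IDF = log(N / df); returns 0.0 if the term never occurs."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


# Hypothetical shards: each document is a set of tokens.
shard_a = [{"search", "engine"}, {"search", "accuracy"}, {"big", "data"}, {"search"}]
shard_b = [{"search", "engine"}, {"big", "data"}, {"data", "lake"}, {"analytics"}]

# The same term gets a different weight depending on which shard scores it.
for name, shard in (("shard A", shard_a), ("shard B", shard_b)):
    print(name, round(idf("search", shard), 3))
print("global", round(idf("search", shard_a + shard_b), 3))

# A misspelling that occurs once is treated as highly informative.
print("serach", round(idf("serach", shard_b + [{"serach"}]), 3))
```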

Page 17

We use Big Data to Predict Relevancy

[Diagram: content sources (project docs, web site pages, support pages, landing pages) pass through connectors and content processing into the search engine’s index. A Big Data cluster combines a copy of the content with search click logs, query logs, financial data, and business data to build a relevancy model.]

Page 18

Probability Scoring / Predictive Relevancy

[Diagram: product signals, query signals, user signals, and comparison signals feed a predictive-analytics statistical model that predicts a probability, shown alongside example 0/1 outcome columns for “clicked?” and “purchased?”]
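The talk does not say which statistical model was used. Logistic regression is one common choice for predicting a 0/1 outcome such as clicked? or purchased?, so this sketch fits one with plain batch gradient descent. The two-feature signal vectors (a hypothetical title-match flag and popularity score) and their labels are invented purely for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(w, b, x):
    """Predicted probability that a result with signal vector x is relevant."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # d(log loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical training rows: [title_match, popularity] -> clicked? (0/1)
X = [[1.0, 0.9], [1.0, 0.6], [1.0, 0.4], [1.0, 0.9],
     [0.0, 0.7], [0.0, 0.2], [0.0, 0.1], [0.0, 0.2]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

w, b = train_logistic(X, y)
print("P(relevant | title match)    =", round(predict(w, b, [1.0, 0.5]), 2))
print("P(relevant | no title match) =", round(predict(w, b, [0.0, 0.5]), 2))
```

With features on comparable scales, the learned weights in `w` also hint at which signals carry information: a weight near zero marks a candidate for removal.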

Page 19

The Power of the Probability Score

• The score predicts the probability of relevancy
• Value ranges from 0 to 1
  o Can be used for threshold processing
  o All documents too weak? Try something else!
  o Can combine results from different sources / constructions together
• Identifies what’s important
  o Machine learning optimizes for parameters
    - Identifies the impact and contribution of every parameter
  o If a parameter does not improve relevancy → REMOVE IT
  o Scoring becomes objective, not subjective (now based on SCIENCE)
  o Allows for experimentation on parameters
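A probability on a common 0-to-1 scale makes threshold processing and cross-source merging straightforward. A minimal sketch, assuming each source returns `(doc_id, probability)` pairs whose probabilities are calibrated (an assumption; raw engine scores are not), with an arbitrary threshold of 0.3: documents below the threshold are dropped, a document returned by several sources keeps its best probability, and `None` signals that every result was too weak.

```python
def merge_by_probability(result_sets, threshold=0.3):
    """Combine results from several sources by predicted probability.

    result_sets: mapping source_name -> list of (doc_id, probability).
    Returns [(doc_id, source, probability), ...] sorted best-first, or
    None if nothing clears the threshold (the caller should then try
    something else, e.g. a spelling correction or a broader query).
    """
    best = {}
    for source, results in result_sets.items():
        for doc_id, p in results:
            # Keep only sufficiently likely docs, best probability per doc.
            if p >= threshold and p > best.get(doc_id, (None, -1.0))[1]:
                best[doc_id] = (source, p)
    if not best:
        return None
    merged = [(doc_id, source, p) for doc_id, (source, p) in best.items()]
    return sorted(merged, key=lambda t: t[2], reverse=True)
```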

Page 20

And now the demo! (just like I promised)

Page 21

Come out of the darkness

Page 22

And into the Light!

Page 23

The Age of Enlightenment for search engine accuracy is upon us!

Page 24

Thank you!