Scott Triglia, MLconf 2013

Scott Triglia, Search and Data Mining Engineer at Yelp

Page 1: Scott Triglia, MLconf 2013

Starting Recommendations

Scott Triglia

Page 2: Scott Triglia, MLconf 2013

Goals

● Show our thought process

● Expose some useful questions

● Practical solutions for new teams

Page 3: Scott Triglia, MLconf 2013

Disclaimer

This is a case study, not a prescription

Page 4: Scott Triglia, MLconf 2013

Yelp 101

Page 5: Scott Triglia, MLconf 2013

Our Topic

Our Goal: Interesting businesses relevant to you right now

Page 6: Scott Triglia, MLconf 2013

Before

Page 7: Scott Triglia, MLconf 2013

Decision Time

Brainstorming what matters with your ace team of devs

Page 8: Scott Triglia, MLconf 2013

Context matters

Page 9: Scott Triglia, MLconf 2013

Context matters

Page 10: Scott Triglia, MLconf 2013

Don’t be shy

Interesting reasons are half the point!

Page 11: Scott Triglia, MLconf 2013

Know your Team

The organizational context matters too:

● We have very little infrastructure to support large-scale ML

● Must scale to all Yelp data and users on day 1

● Our team is small, and will be so for a while

● This is the first version of a (hopefully!) long-lived product

Page 12: Scott Triglia, MLconf 2013

So what do we build?

Page 13: Scott Triglia, MLconf 2013

Guiding Principles

1) We know we need to solve a retrieval problem

Page 14: Scott Triglia, MLconf 2013

Guiding Principles

2) Build for what you have, plan for expansion

Page 15: Scott Triglia, MLconf 2013

Guiding Principles

3) We need to build a good product, not beat a benchmark

Page 16: Scott Triglia, MLconf 2013

The Big Picture

[Diagram: API Request, Experts, Final Results, Elastic Search]

General flow:

1. Gather sufficient contextual information

2. Consult each expert for their top candidates

3. Wisely combine suggestions from each expert
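
The three steps above can be sketched in a few lines. This is only an illustration, not the code behind the talk: the names `gather_context`, `recommend`, `popular_expert`, and `friends_expert` are hypothetical, and the round-robin merge is a stand-in for whatever combination logic a real system would use.

```python
from typing import Callable, Dict, List

# Hypothetical types: a context is a plain dict, and an expert is any callable
# that maps a context to an ordered list of candidate business ids.
Context = Dict[str, object]
Expert = Callable[[Context], List[str]]


def gather_context(location: str, user_id: int) -> Context:
    """Step 1: collect whatever contextual information is available."""
    return {"location": location, "user_id": user_id}


def recommend(location: str, user_id: int, experts: List[Expert],
              limit: int = 3) -> List[str]:
    context = gather_context(location, user_id)
    # Step 2: ask each expert for its top candidates.
    suggestions = [expert(context) for expert in experts]
    # Step 3: combine. Placeholder strategy: round-robin across experts,
    # skipping businesses that were already picked.
    combined: List[str] = []
    depth = max((len(s) for s in suggestions), default=0)
    for rank in range(depth):
        for s in suggestions:
            if rank < len(s) and s[rank] not in combined:
                combined.append(s[rank])
    return combined[:limit]


# Two toy experts with hard-coded answers, for demonstration only.
def popular_expert(context: Context) -> List[str]:
    return ["cafe_a", "bar_b"]


def friends_expert(context: Context) -> List[str]:
    return ["bar_b", "taqueria_c"]
```

Calling `recommend("SF", 1, [popular_expert, friends_expert])` walks through all three steps and returns a deduplicated list drawn from both experts.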

Page 17: Scott Triglia, MLconf 2013

Building the request

● From client: location, user_id

● Derived context

● Neighborhood preferences

● User preferences

● Time preferences
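
One possible shape for such a request, reading the three preference bullets as examples of derived context. Everything here is assumed for illustration: the `RecommendationRequest` class, its field names, and the meal-time rule in `build_request` are hypothetical, and the neighborhood and user preferences are left empty because the slides do not say how they are computed.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Tuple


@dataclass
class RecommendationRequest:
    # Sent by the client.
    location: Tuple[float, float]  # (latitude, longitude)
    user_id: int
    # Derived on the server; each maps a category name to a weight.
    neighborhood_prefs: Dict[str, float] = field(default_factory=dict)
    user_prefs: Dict[str, float] = field(default_factory=dict)
    time_prefs: Dict[str, float] = field(default_factory=dict)


def build_request(location: Tuple[float, float], user_id: int,
                  now: datetime) -> RecommendationRequest:
    # Assumed rule of thumb: bucket the hour of day into a meal category.
    if now.hour < 11:
        meal = "breakfast"
    elif now.hour < 16:
        meal = "lunch"
    else:
        meal = "dinner"
    return RecommendationRequest(location=location, user_id=user_id,
                                 time_prefs={meal: 1.0})
```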

Page 18: Scott Triglia, MLconf 2013

Expert Opinions

Each expert handles a single reason and knows its own requirements.

For example, a LikedByFriends expert returns only candidate businesses that at least one of the user’s friends has rated highly.

Page 19: Scott Triglia, MLconf 2013

Expert Opinions

Liked By Friends Expert:

General Requirements:

● Open Now

● Sufficiently Nearby

Expert Requirements:

● At least one friend gave it 5 stars
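
A minimal sketch of how these requirements might be expressed as filters over in-memory data. The `Business` record, its fields, and the 2 km value for "sufficiently nearby" are assumptions made for the example; the slides do not give a threshold or a data model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Business:
    name: str
    is_open: bool
    distance_km: float
    friend_ratings: Dict[str, int]  # friend name -> star rating (1 to 5)


MAX_DISTANCE_KM = 2.0  # assumed meaning of "sufficiently nearby"


def meets_general_requirements(b: Business) -> bool:
    """Requirements shared by every expert: open now and close enough."""
    return b.is_open and b.distance_km <= MAX_DISTANCE_KM


def liked_by_friends(businesses: List[Business]) -> List[Business]:
    """Expert-specific requirement: at least one friend gave 5 stars."""
    return [
        b for b in businesses
        if meets_general_requirements(b)
        and any(stars == 5 for stars in b.friend_ratings.values())
    ]
```

Keeping the general check in its own function means each new expert only has to state what is specific to its reason.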

Page 20: Scott Triglia, MLconf 2013

Expert Opinions

Why an expert-based system?

● Think in terms of small, isolated components

● Implementation agnostic

● Adding or removing experts is trivial
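
To make these points concrete, here is one way a shared expert interface could look. It is a sketch under assumptions: the `Expert` base class, the `ExpertRegistry`, and the `StaticListExpert` stand-in are hypothetical names, not part of the system described in the talk.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Expert(ABC):
    """A small, isolated component tied to one recommendation reason."""
    reason: str

    @abstractmethod
    def candidates(self, context: dict) -> List[str]:
        """Return candidate business ids for this context."""


class StaticListExpert(Expert):
    """Toy expert; a real one could wrap a query, a rule, or a model."""

    def __init__(self, reason: str, business_ids: List[str]) -> None:
        self.reason = reason
        self.business_ids = business_ids

    def candidates(self, context: dict) -> List[str]:
        return list(self.business_ids)


class ExpertRegistry:
    def __init__(self) -> None:
        self._experts: Dict[str, Expert] = {}

    def add(self, expert: Expert) -> None:
        self._experts[expert.reason] = expert

    def remove(self, reason: str) -> None:
        self._experts.pop(reason, None)

    def consult(self, context: dict) -> Dict[str, List[str]]:
        # The caller only sees the interface, never the implementation.
        return {reason: e.candidates(context)
                for reason, e in self._experts.items()}
```

Because the registry depends only on `candidates`, swapping an expert's internals or dropping it entirely does not touch the rest of the pipeline.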

Page 21: Scott Triglia, MLconf 2013

Efficient Search

What do we need from our datastore?

● Fast geographic filtering

● Simple but efficient sorting

● All of this happening within 100 ms
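
As an illustration of the first two needs, the function below builds a search request body with a geographic filter and a simple sort. It uses the current Elasticsearch query DSL (`bool` filter with `geo_distance`), which may differ from the syntax available in 2013. The field names `location` and `rating`, the radius, and the result size are hypothetical, and `location` is assumed to be mapped as a `geo_point`.

```python
from typing import Any, Dict


def build_nearby_query(lat: float, lon: float, radius_km: float = 2.0,
                       size: int = 20) -> Dict[str, Any]:
    """Build a search body: businesses within radius_km, best rated first."""
    return {
        "size": size,
        # Ask for partial results once the time budget is used up.
        "timeout": "100ms",
        "query": {
            "bool": {
                "filter": [
                    {
                        "geo_distance": {
                            "distance": f"{radius_km}km",
                            "location": {"lat": lat, "lon": lon},
                        }
                    }
                ]
            }
        },
        # Simple sort on a single numeric field.
        "sort": [{"rating": {"order": "desc"}}],
    }
```

The resulting dictionary can be passed as the body of a search request by whichever client is in use; each expert would add its own filters on top of this shared part.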

Page 22: Scott Triglia, MLconf 2013

Final Decisions

How to combine expert results? We need to factor in:

● Must balance preferences (distance, rating, category)

● Should prefer better reasons when possible

● Sufficiently high-quality candidates make this very safe
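
One simple way to weigh these factors is a hand-tuned score. The sketch below is an assumption, not the method used in the talk: the `Candidate` record, the `REASON_WEIGHT` table, and the 0.5 coefficients are all made up for illustration, and category preference is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    business_id: str
    reason: str
    distance_km: float
    rating: float  # average stars, 1.0 to 5.0


# Hypothetical priorities: better reasons get a larger head start.
REASON_WEIGHT = {"liked_by_friends": 1.0, "popular_nearby": 0.5}


def score(c: Candidate) -> float:
    closeness = 1.0 / (1.0 + c.distance_km)  # 1.0 at the user's location
    return (REASON_WEIGHT.get(c.reason, 0.1)
            + 0.5 * (c.rating / 5.0)
            + 0.5 * closeness)


def combine(candidates: List[Candidate], limit: int = 10) -> List[Candidate]:
    # Keep the best-scoring reason for each business, then rank.
    best: Dict[str, Candidate] = {}
    for c in candidates:
        current = best.get(c.business_id)
        if current is None or score(c) > score(current):
            best[c.business_id] = c
    return sorted(best.values(), key=score, reverse=True)[:limit]
```

Since every candidate already passed its expert's filters, a rough ranking like this mostly decides ordering among acceptable options, which is why a simple formula can be a safe starting point.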

Page 23: Scott Triglia, MLconf 2013

Get to the point already!

Page 24: Scott Triglia, MLconf 2013

Get to the point already!

Page 25: Scott Triglia, MLconf 2013

Get to the point already!

Page 26: Scott Triglia, MLconf 2013

Get to the point already!

Page 27: Scott Triglia, MLconf 2013

Get to the point already!

Page 28: Scott Triglia, MLconf 2013

Final Version

Page 29: Scott Triglia, MLconf 2013

Extension

Now that we’re iterating, what are our future plans?

● Richer context (user, location, etc.)

● Infrastructure support for faster ML prototyping

● Better personalized ranking

● Training data!

Page 30: Scott Triglia, MLconf 2013

Summary

So what are the takeaways for building a first recommender system?

● Solve your problem, not someone else’s

● Being cutting edge may not be the top priority

● Build for the tools you have, plan for what will come

● Good software engineering enables quality ML