Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Before the Model: How Machine Learning Products Start Elena Grewal / November 11, 2016 / @elenatej

Transcript of Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Page 1: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Before the Model: How Machine Learning Products Start

Elena Grewal / November 11, 2016 / @elenatej

Page 2: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Machine Learning Products @ Airbnb

● Two-sided marketplace: each guest and each host is unique.

● ML at its core is about personalization, and we use it in all aspects of our product.

● Teams which have ML products: host growth, guest growth, search, pricing, customer support, many more.

Page 3: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Machine Learning at all steps of using Airbnb

Page 4: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Lifecycle of a Machine Learning Product

Sizing Opportunity and Scope -> Model Architecture -> Data Pipelines and Processing -> Model Optimization -> Production Implementation & Evaluation

Page 5: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Initial formulation of the problem is key to success

Sizing Opportunity and Scope -> Model Architecture -> Data Pipelines and Processing -> Model Optimization -> Production Implementation & Evaluation

Page 6: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

You need to have the right target metric(s)

Page 7: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Pricing

Page 8: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Page 9: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Way back in 2014 we did an offsite

Question: “What do you think is the highest impact project our team can undertake in the next year?”

Answer: “Pricing”

(we also ate pizza in a baller Airbnb home)

Page 10: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Step 1: Make the Case for Working on Pricing

- Highlight all the ways that prices matter

  - The impact of price on booking + rebooking

  - Price filter usage

  - Variations by market

50 slide deck presented to executives

Buy time! A project like this takes ~6 months to see any results

Page 11: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Step 1: Make the Case for Working on Pricing

Page 12: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Step 2: Model Architecture - Before

Current model predicted price using nearby Airbnb homes

- Location, Listing characteristics, Recency

This mimicked host behavior

Page 13: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Step 2: Model Architecture - After

New metric: Bookings

Price suggestion based on the probability of being booked on a given day

- Much more flexible

- Prices for each date

- Interesting UX opportunities

Added model layer for adoption of prices. Team of 15 on it now!
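
One way to read a price suggestion "based on probability of booking on a given day" is as picking, for each date, the price that maximizes expected revenue, i.e. price times the probability of being booked at that price. The sketch below is only an illustration under stated assumptions: `booking_probability` is a hypothetical logistic demand curve, and `reference_price`, `sensitivity`, the dates, and the candidate price grid are all made up. The slides do not say which model or objective was actually used.

```python
import math

def booking_probability(price, reference_price, sensitivity=0.05):
    """Hypothetical demand curve: probability a night is booked at `price`.

    Illustrative logistic shape only; not the model from the talk.
    """
    return 1.0 / (1.0 + math.exp(sensitivity * (price - reference_price)))

def suggest_price(candidate_prices, reference_price):
    """Return the candidate price with the highest expected revenue."""
    return max(
        candidate_prices,
        key=lambda p: p * booking_probability(p, reference_price),
    )

# Per-date suggestions: demand (here, reference_price) varies by date.
# Both dates and reference prices are invented for the example.
reference_by_date = {"2016-11-11": 120.0, "2016-12-31": 200.0}
candidates = range(50, 401, 10)
suggestions = {
    date: suggest_price(candidates, ref)
    for date, ref in reference_by_date.items()
}
```

Because the objective is defined per date, a higher-demand date naturally gets a higher suggested price, which is the flexibility the slide points to.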

Page 14: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Learnings

● Target metric = business outcome (NOT the precision/recall of your model)

● Up-front analysis of the potential impact of an ML product achieves the buy-in to work on a project for the needed time

○ More important - you have a better idea of whether it’s the right thing to work on

● User behavior should be considered in model architecture

Make time for thinking about machine learning products.

Page 15: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Search

Page 16: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Page 17: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Ranking model could optimize for ‘click through’

But the listings that get clicks might not be the right fit for the trip at hand

Page 18: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Ranking model could optimize for guest ‘contact’

But what if the guest is rejected?

Page 19: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Solution: Optimize for a combination of outcomes

Machine Learned ranker, using Gradient Boosted Model (GBM)
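
A combined target could be expressed as a single weighted label per search impression, which a ranker such as a GBM would then be trained to predict. The sketch below only shows the label construction; the outcome names and weights are hypothetical placeholders, not values from the talk.

```python
# Hypothetical value of each outcome of a search impression.
# A host rejection is penalized because a contact that ends in a
# rejection is a bad experience for the guest.
OUTCOME_WEIGHTS = {
    "impression": 0.0,
    "click": 0.1,
    "contact": 0.3,
    "booking": 1.0,
    "rejection": -0.5,
}

def training_label(outcomes):
    """Collapse the outcomes observed for one impression into one target."""
    return sum(OUTCOME_WEIGHTS[o] for o in outcomes)

examples = [
    {"click"},
    {"click", "contact", "rejection"},
    {"click", "contact", "booking"},
]
labels = [training_label(e) for e in examples]
```

With weights like these, a listing that is clicked and contacted but then rejected scores below one that is only clicked, which addresses the "what if the guest is rejected?" problem from the previous slide.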

Page 20: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Learnings

● Target metric = business outcome

○ Traditional target metrics don’t always apply

● Think carefully about the value of different potential business outcomes - solution may be a combination of outcomes

Page 21: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Business Travel

Page 22: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

How did it start

We noticed that we didn’t have as many business travelers

Hypothesis: business travelers have different needs than leisure travelers

Can we design products specifically for business travelers?

Page 23: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Page 24: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Step 1: Size the Opportunity

Problem: We didn’t know who was a business traveler and who wasn’t.

To personalize, we needed to show segments had meaningful differences

Collected initial label from 1%
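
Collecting labels from 1% of traffic needs a stable way to decide who sees the prompt. A minimal sketch, assuming users have an id and that hashing the id is an acceptable assignment mechanism; the function name and the `salt` value are hypothetical, and the talk does not describe how the sample was drawn.

```python
import hashlib

def in_label_sample(user_id, rate=0.01, salt="trip-purpose-prompt"):
    """Deterministically assign about `rate` of users to the labeling prompt.

    Hashing the id keeps a user's assignment stable across sessions.
    The salt is a hypothetical experiment key.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform in [0, 1)
    return bucket < rate

# Roughly 1% of a synthetic population of ids lands in the sample.
sampled = sum(in_label_sample(uid) for uid in range(100_000))
```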

Page 25: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Step 2: Model architecture

In this case, our goal was to target business travelers with customized content to increase business travel penetration

Simple model, where we predicted if you were a business traveler or not.
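
A "simple model" of this kind could be as small as a logistic regression over a few trip and search attributes. The sketch below is an assumption-laden toy: the two features (`weekday_trip`, `wifi_filter_used`) and the six rows of data are invented for illustration, and the talk does not specify which algorithm was used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict(weights, bias, x):
    """Predicted probability that the trip is a business trip."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Toy, made-up data: [weekday_trip, wifi_filter_used]; label 1 = business.
rows = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(rows, labels)
```

The predicted probability can then be thresholded to decide who sees business-travel content, rather than showing it to everyone.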

Page 26: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Learnings

● Start with hypothesis

● Collect labeled data

● Build a simple product to start - see how it works

Page 27: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Machine Learning Infrastructure

Page 28: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Prior state of the world

- Teams develop multiple ML infrastructures with different versions of features

- ML in production requires engineering expertise

- While many teams are using ML, the process is painful

Meta before the model

Page 29: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Step 1: Sizing the opportunity & scope

1. Generate ideas for adding 65 new ML products -> multiplier opportunity for building shareable components

2. ‘Back of the envelope’ potential impact on metrics

3. Team proposal with clear deliverables

   i. # of users participating in ML

   ii. Reduced time and effort to build ML products

   iii. Enable easy model eval

Feature Discovery -> Data Acquisition -> Feature Engineering -> Model Training -> Model Scoring

Page 30: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Step 2: In progress!

We have added support for TensorFlow and are now supporting a couple of models in production on the new infrastructure

Interesting challenges: how to represent a listing in an extensible way - what features will apply to many different models?

This is where we are going in the future.

Page 31: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Step 2: In progress!

- Added support for TensorFlow (enabling deep learning at scale)

- Interesting challenges: how to represent a listing in an extensible way - what features will apply to many different models?

- This is where we are going in the future

Listing representation inputs: images, text, categorical attributes

Page 32: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Guiding principles

Target metric Analyze user behavior Architect Model

Opportunity for personalization, impact on metric, user interaction with ML product UX

Set up is the most important part. Start simple and iterate.

Focus on moving a business metric with ML product

Page 33: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Page 34: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Appendix

Page 35: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Life cycle of a machine learning product

● Opportunity and Scope: Tailoring a data product solution to a business problem (e.g. scoping an improved pricing recommendation model as a solution to help hosts set the right price)

● Model Architecture: Figuring out high-level labels, feature choice and modeling approach

● Data pipelines/processing: Process raw data to features and labels.

● Model implementation: Building v1 of the model. This is typically done at scale, so infrastructure needs to be set up. It can be easy with off-the-shelf packages but harder with bigger ones

● Model optimization:

○ Offline evaluation: Where does the model fail?

○ Model performance: Optimize model to improve overall predictive power to resolve fail points (feature transformation, regularisation, etc)

● Productionizing: Scoring model (online or offline), piping features to model, piping scores to production.

● Online Evaluation: experimentation
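
Online evaluation through experimentation usually comes down to comparing a business metric between a control group and a treatment group. A minimal sketch, assuming the metric is a booking conversion rate and that a two-proportion z-test is an acceptable comparison; the counts are invented and the talk does not describe the experiment analysis that was actually used.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference in conversion rate between B and A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Made-up counts: control (A) vs. the new model (B).
z = two_proportion_z(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
p_value = two_sided_p_value(z)
```

The point of the comparison is that it is made on the business outcome (bookings), not on the model's offline accuracy.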

Page 36: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

For this talk

● Opportunity and Scope: Tailoring a data product solution to a business problem (e.g. scoping an improved pricing recommendation model as a solution to help hosts set the right price)

● Model Architecture: Figuring out high-level labels, feature choice and modeling approach

● Data pipelines/processing: Process raw data to features and labels.

● Model implementation: Building v1 of the model. This is typically done at scale, so infrastructure needs to be set up. It can be easy with off-the-shelf packages but harder with bigger ones

● Model optimization:

○ Offline evaluation: Where does the model fail?

○ Model performance: Optimize model to improve overall predictive power to resolve fail points (feature transformation, regularisation, etc)

● Productionizing: Scoring model (online or offline), piping features to model, piping scores to production.

● Online Evaluation: experiment!

Creating the Kaggle competition

Page 37: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Why do we care about this

● You can have a great model and optimize it perfectly, but if the framing isn’t right it doesn’t matter

● This is often the most important part of building a machine learning product.

● Going to go over a few examples now of where this goes wrong

○ You don’t have the right business problem

○ You aren’t thinking about the way users adopt

○ You don’t know the size of the impact / when to personalize

Page 38: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Ways an ML product can begin

● Structured: You have a metric you’d like to improve - you think of a machine learning product that could help

● Unstructured: You’re playing around with new data, you have some ideas - brainstorm etc

A company that builds successful ML products will create incentives and space for innovation in both instances

Page 39: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Importance of a metric

● For any machine learning challenge you need to have a metric that you are optimizing against. Otherwise you will be unable to evaluate the value of a machine learning product to your business and to your users.

● OKR structure

● Bookings over time - we have a goal of 100; how do we get there?

Get a lesson out of every case study

E.g. worth trading off explainability

Page 40: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Pricing

● When we first started there was a model that used the most important characteristics of a listing, like the number of rooms and beds, the neighboring properties, and certain amenities, like a parking space or even a pool. It then essentially looked at nearby listings with close similarities to suggest a price

● Simulated what users were doing on their own and automated it; you could throw in more features and do better clustering

● Didn’t take demand into account, wasn’t flexible, and most importantly wasn’t formulated in a way that would optimize against the right metric

Add the work up front to prove we should invest

6 months - 12 people on it now. All from a data science offsite

Indirectly it was whether they accepted or not. The standard recommender metric is whether they took my suggestions.

15 people working on it - huge lever - UX - designers testing those changes.

summarization/highlights

It was against the metric of traffic. Things to do in San Francisco. SEO. This is what this is for.

Page 41: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Search

Slides from Lisa

Page 42: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Biz Travel - Personalization

1) Figure out if there is a personalization opportunity. 2) Get labeled data.

Biz travel. Our hypothesis is that biz travelers are looking for something different than leisure travelers. Is there actually an opportunity there? First you need some labels. Take 1% of traffic and prompt users to tell us if they are traveling for business or leisure. Then you have labeled data.

Now we have user attributes, and we can see if there is a difference and whether we can predict if someone is traveling for business or leisure. Trip attributes were also super important. Entire home. Weekdays. Biz travelers usually look at the city level at a specific address, rather than starting big and zooming in. Search attributes. Price. Wifi.

Then you can build a model and deploy. Show the right business travel promotion: a banner on the booking page to sign up for business travel, shown to the people who are likely to be business travelers. Showing the promotion to 100% of users would cannibalize the promotion space. P5 banners.

That gives a virality effect where they can sign up. Yahoo is a sign-up company. Google: it’s a long tail of small businesses; similar with Facebook. Airbnb’s core product is better for small and medium businesses. The next time someone else signs up with the same company, it’s legit and has more than one person. Then we can send an email to those people to ask for their travel managers - directly billed to company, find the right listing. Data science is being used to find the long tail that we wouldn’t have found through direct sales. Shared itinerary with other people - growth experiment so other people sign up.

Page 43: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Machine learning infrastructure

Creating generalized infrastructure so we can do it all

● Making the case for machine learning infrastructure. Machine learning infrastructure. Holistic representation of a listing. Where we are going in the future.

Page 44: Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016

Case studies

● Early motivation is looking at our main metric. Search was very hand-tuned in the past. Pricing. It’s not easy! Accuracy is what I can improve, but moving that metric is harder. You can improve the performance intrinsically, but then you deploy and the improvement doesn’t lead to the gain you expected. For example, with smart pricing, you don’t like the suggestion: you’re lowballing. Take into account people’s behavior and how users respond to an improvement.

● The simpler model is often a lot more effective. Better to build something quickly, see how it performs, and then see if it should be revisited. Can reference the post on coming from academia.

● Ticket routing and user issues - had hard-set rules that were very rigid (if you are in this bucket). We implemented a probabilistic model that figures out automatically what we can do. Go from manual rules to a learned model. Rules failing and then moving to ‘softer’ approaches that are probabilistic. One pattern. We look at signals when the user comes in - surface these links vs those links. Like biz travel. We were ignoring a strong signal that was the text of a ticket. Improve accuracy and also increase volume and optimize precision and recall. Could address CX staffing accordingly. Route more to directly, and it’s OK if they can’t solve it and it takes time to send it back to Airbnb. Impossible to do in the previous world. High-level talking point - these models give us more flexibility to adapt to the changing dynamics of our business. Sets of rules are much harder to tweak. Models give a lot more flexibility.

● Using machine learning to not just build model for predictive performance but to inform analysis. Chao yang on host quality. 30% are worse. Build model on 70%. Learn a model to predict ratings in other bucket. Lead model. PX model. Customizing how users interact with our website using signals available.

● Making the case for machine learning infrastructure. Machine learning infrastructure. Holistic representation of a listing. Where we are going in the future.
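
The ticket-routing note above (moving from rigid rules to a probabilistic model that uses the text of a ticket) can be illustrated with a small multinomial naive Bayes classifier. This is a sketch under stated assumptions: naive Bayes is swapped in as a simple stand-in, and the queue names and example tickets are invented. The notes do not say which model was used.

```python
import math
from collections import Counter, defaultdict

def train(tickets):
    """tickets: list of (text, queue). Returns counts for naive Bayes."""
    word_counts = defaultdict(Counter)
    queue_counts = Counter()
    vocab = set()
    for text, queue in tickets:
        words = text.lower().split()
        word_counts[queue].update(words)
        queue_counts[queue] += 1
        vocab.update(words)
    return word_counts, queue_counts, vocab

def route(text, model):
    """Return a probability for each queue instead of a hard rule."""
    word_counts, queue_counts, vocab = model
    total = sum(queue_counts.values())
    log_scores = {}
    for queue, n in queue_counts.items():
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[queue].values()) + len(vocab)
        score = math.log(n / total)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[queue][word] + 1) / denom)
        log_scores[queue] = score
    # Normalize log scores into probabilities.
    top = max(log_scores.values())
    exp_scores = {q: math.exp(s - top) for q, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {q: v / norm for q, v in exp_scores.items()}

# Hypothetical training tickets and queue names.
model = train([
    ("refund for cancelled booking", "payments"),
    ("charged twice need refund", "payments"),
    ("host cancelled my reservation", "reservations"),
    ("change reservation dates", "reservations"),
])
probs = route("need a refund", model)
```

Because the output is a probability per queue, the routing threshold can be tuned as the business changes, which is harder to do with a fixed set of rules.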
