Partner Webinar: Recommendation Engines with MongoDB and Hadoop
Transcript of Partner Webinar: Recommendation Engines with MongoDB and Hadoop
K Young - CEO, Mortar
Recommendation Engines with MongoDB and Hadoop
Recommendation Engine
Recommendation engines automatically recommend the "right" items for each user.
• Retail
• Music
• Videos
• Dating
• Etc…
WHAT IS IT
EXAMPLES
Recommendation Engine
LinkedIn: 50% of new connections come from "People You May Know"
Netflix: 75% of content is viewed because of a recommendation
Amazon: 35% of sales are driven by recommendations
THAT’S ME
K Young
FOR THIS WEBINAR
Agenda
1. Recommendation Engines
2. Hadoop
3. Demo: Build a Recommendation Engine
4. Your Recommendation Engine
5. Q&A
Recommendation Engine
NOW GENERALLY AVAILABLE
• Open source, free
• Very flexible
• Massively scalable
• 100% customizable
• Tested and proven
Recommendation Engine
Technical implementation of how humans make recommendations.
Using:
• past behavior
• similar users
• content metadata
• outside signals, e.g. Instagram
HOW DO THEY WORK?
Recommendation Engine
USER INTERACTIONS: SIGNALS
Recommendation Engine
ITEM-ITEM RECOMMENDATIONS
Recommendation Engine
USER-ITEM RECOMMENDATIONS
WHERE DO RECOMMENDATIONS APPEAR?
Recommendation Engine
• Landing page
• Product page
• Cart
• Push email
• Etc.
Recommendation Engine
Predictions based on macro-trends, e.g. trending on Twitter
Numeric predictions, e.g. price elasticity
WHAT IT ISN’T
A WARNING
Recommendation Engine
Recommendation engines are famously hard to launch because they touch: engineering, finance, product, executive.
How to succeed:
1) speedy implementation (target 1 week)
2) engine flexibility
3) gradual roll-out
4) visible KPI-impact
RAPID OVERVIEW
Hadoop
Platform for distributed data processing.
Strengths:
• Can scale up to thousands of computers
• Widely used
• Very broadly applicable
• Free, open
Problem:
• Difficult to use for complex problems
ON HADOOP
Pig
Less code
Compiles to native Hadoop code
Popular (LinkedIn, Twitter, Salesforce, Yahoo, Spotify...)
BRIEF, EXPRESSIVE
LIKE PROCEDURAL SQL
Pig
(thanks: twitter hadoop world presentation)
FOR SERIOUS
The Same Script, In MapReduce
MOTIVATIONS
MongoDB + Pig
Data storage and data processing are often separate concerns
Hadoop is built for scalable processing of large datasets
SIMILAR PHILOSOPHY
MongoDB, Pig
Poly-structured data
• MongoDB: stores data, regardless of structure
• Pig: reads data, regardless of structure
SIMILAR PHILOSOPHY
MongoDB Hadoop Connector
Open source connector for Hadoop (and family) to read from and write to MongoDB.
(Links at end).
Build a recommendation engine
ENOUGH PREAMBLE, NOW IT’S…
Demo Time!
Build a recommendation engine
DEMO AGENDA
1) Intro to Mortar
2) Download recommendation code
3) Hook up the demo implementation (last.fm)
4) Generate recommendations at scale
5) View recommendations
Build a recommendation engine
DEMO
Use Mortar for demo
Free to use
Open, code runs anywhere
Complete tutorial online (link at end)
Mortar
ONLINE TUTORIAL
Mortar
FAST INTRO
Data science lacks a way to organize, test, deploy, and collaborate with code. So:
• One-button code deployment, powered by Github
• Award-winning job monitoring and visualization
• Realtime log collection and error analysis
• Free local development with one-click installation
> mortar projects:fork git@github.com:mortardata/mortar-recsys.git mortar_webinar_20140415
Sending request to register project: mortar_webinar_20140415... done
Status: Success!
Your project is ready for use. Type 'mortar help' to see the commands you can perform on the project.
DEFINITIONS
Recommendation Engine
Users: People interacting with your items and generating events that you capture.
Items: The things you are recommending: videos, articles, products, etc.
Signal: A user-item interaction with a weighting that tells us the relative value of the interaction.
Recommendation Engine
USER INTERACTIONS: SIGNALS
STEPS
Recommendation Engine
Steps in a recommendation engine:
• Load your data
• Generate your signals
• Call code to generate recommendations
• Store your recommendations
Not covered today:
• Serve your recommendations
• Track KPI-impact
DEMO
Recommendation Engine
17.5MM documents of 360K users’ top played artists. Provided by Last.fm at http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-360K.html
Used a Pig job to load a MongoLab database with the data.
> db.lastfm_plays.find()
{ "user" : "faf…a60", "num_plays" : 67,
"artist_name" : "beastie boys" }
{ "user" : "faf0…a60", "num_plays" : 66,
"artist_name" : "the beatles" }
{ "user" : "faf0…a60", "num_plays" : 65,
"artist_name" : "the smashing pumpkins" }
DEMO: LOAD THE DATA
Recommendation Engine
First step: Load our listening data.
%default DB 'mongo_webinar'
%default PLAYS_COLLECTION 'lastfm_plays'
raw_input =
load '$CONN/$DB.$PLAYS_COLLECTION'
using com.mongodb.hadoop.pig.MongoLoader('
user:chararray,
artist_name:chararray,
num_plays:int
');
Pig code
DEMO: GENERATE SIGNALS
Recommendation Engine
Now that we have our data loaded we need to extract: user, item, signal.
user_signals = foreach raw_input generate
user,
artist_name as item,
num_plays as weight:int;
Pig code
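For readers who do not know Pig, the same extraction can be sketched in plain Python. This is only an illustration: the `Signal` type, the `to_signals` helper, and the sample documents are assumptions made for the example and are not part of the Mortar code.

```python
from typing import NamedTuple


class Signal(NamedTuple):
    """One user-item interaction with its relative weight."""
    user: str
    item: str
    weight: int


def to_signals(play_docs):
    """Map raw play documents to (user, item, weight) signals.

    Mirrors the Pig projection: artist_name becomes the item and
    num_plays becomes the integer weight.
    """
    return [
        Signal(doc["user"], doc["artist_name"], int(doc["num_plays"]))
        for doc in play_docs
    ]


# Hypothetical documents in the shape of the lastfm_plays collection.
plays = [
    {"user": "u1", "artist_name": "beastie boys", "num_plays": 67},
    {"user": "u1", "artist_name": "the beatles", "num_plays": 66},
]
signals = to_signals(plays)
```

Any other event (a purchase, a rating, a page view) could be mapped to the same three fields, which is why the later steps only need to know about signals.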
DEMO: CALL MORTAR
Recommendation Engine
Now that the data is in the correct format, we’ll call the Mortar algorithms for generating item-item and user-item recommendations.
item_item_recs =
recsys__GetItemItemRecommendations(user_signals);
user_item_recs =
recsys__GetUserItemRecommendations(user_signals,
item_item_recs);
Pig code
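The slides do not show what the Mortar algorithms do internally. As a rough intuition for what "item-item recommendations" means, here is a minimal co-occurrence sketch in Python. It is a stand-in technique, not Mortar's implementation: the function name, the scoring rule (sum of the smaller weight over users who share both items), and the sample data are all assumptions. Only the output shape (`item_A`, `rank`, `item_B`) follows the results shown later in the deck.

```python
from collections import defaultdict
from itertools import combinations


def item_item_recs(signals, top_n=2):
    """Rank, for each item, the items most often shared by the same users.

    signals: iterable of (user, item, weight) tuples.
    """
    # Collect each user's items and total weight per item.
    by_user = defaultdict(dict)
    for user, item, weight in signals:
        by_user[user][item] = by_user[user].get(item, 0) + weight

    # Score every pair of items that appears for the same user.
    scores = defaultdict(lambda: defaultdict(int))
    for items in by_user.values():
        for a, b in combinations(sorted(items), 2):
            overlap = min(items[a], items[b])
            scores[a][b] += overlap
            scores[b][a] += overlap

    # Emit the top_n neighbours per item, best score first.
    recs = []
    for item_a in sorted(scores):
        ranked = sorted(scores[item_a].items(), key=lambda kv: (-kv[1], kv[0]))
        for rank, (item_b, _score) in enumerate(ranked[:top_n], start=1):
            recs.append({"item_A": item_a, "rank": rank, "item_B": item_b})
    return recs


# Hypothetical signals: three users, three items.
sample = [
    ("u1", "a", 5), ("u1", "b", 3),
    ("u2", "a", 2), ("u2", "b", 4), ("u2", "c", 1),
    ("u3", "b", 1), ("u3", "c", 6),
]
top = item_item_recs(sample, top_n=1)
```

A production engine would typically also normalize for item popularity, which this sketch deliberately leaves out.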
DEMO: STORE OUR RESULTS
Recommendation Engine
Now that we have our results let’s store them back to MongoDB for use by our application.
%default II_COLLECTION 'item_item_recs'
%default UI_COLLECTION 'user_item_recs'
store item_item_recs into
'$CONN/$DB.$II_COLLECTION' using
com.mongodb.hadoop.pig.MongoInsertStorage('','');
store user_item_recs into
'$CONN/$DB.$UI_COLLECTION' using
com.mongodb.hadoop.pig.MongoInsertStorage('','');
Pig code
DEMO: RUN IT!
Recommendation Engine
Now we’re going to use Mortar to start and manage a Hadoop cluster to run our recommender.
> mortar run pigscripts/mongo/lastfm-recsys-online.pig -f params/lastfm.params --clustersize 10
Taking code snapshot... done
Sending code snapshot to Mortar... done
Requesting job execution... done
job_id: 534462bea22f3803fd9cacca
Job status can be viewed on the web at:
https://app.mortardata.com/jobs/job_detail?job_id=534462bea22f3803fd9cacca
> db.item_item_recs.find()
{ "item_A":"yo-yo ma", "rank":1,
"item_B":"natalie clein" }
{ "item_A":"miley cyrus", "rank":1,
"item_B":"miley cyrus and billy ray cyrus” }
{ "item_A":"dimmu borgir", "rank":1,
"item_B":"ad inferna” }
EVALUATING YOUR RESULTS
Your Recommendation Engine
At first, use your domain knowledge to judge whether the recommendations are sensible.
Mortar provides a recommendation browser.
EVALUATING YOUR RESULTS
Your Recommendation Engine
Optionally get detailed recommendations.
item_item_recs =
recsys__GetItemItemRecommendationsDetailed(user_signals);
Pig code
EVALUATING YOUR RESULTS
Your Recommendation Engine
Later, run A/B tests with your recommendations to see how they improve the metrics you care about.
Usually not multivariate.
Usually no training set is possible.
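One common way to set up such an A/B test is to assign each user to a group deterministically, so the same user always sees the same variant. The sketch below is a generic illustration and not something shown in the webinar: the function name, the experiment label, and the 50/50 split are assumptions.

```python
import hashlib


def ab_bucket(user_id, experiment="recs_v1", treatment_share=0.5):
    """Deterministically assign a user to 'treatment' or 'control'."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Map the first 8 hex digits to a fraction in [0, 1).
    fraction = int(digest[:8], 16) / 16 ** 8
    return "treatment" if fraction < treatment_share else "control"


# The same user lands in the same group on every call.
group = ab_bucket("user-123")
```

Users in the treatment group would be served recommendations and users in the control group would not, and the metric you care about is then compared between the two groups.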
CUSTOMIZING
Your Recommendation Engine
To make customization easier, Mortar has help documentation and code covering more than a dozen common cases:
• Removing bots from your signal data
• Removing out-of-stock items
• Boosting popular items
• Adding categories to your items
• Cold start
• Greater discovery and variety
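To give a feel for the first case in the list, here is a minimal Python sketch of removing bots from signal data. It assumes a simple heuristic, namely that a user with an implausibly large number of signals is a bot. The function name and the threshold are hypothetical, and Mortar's documented approach may differ.

```python
from collections import Counter


def remove_bots(signals, max_signals_per_user=1000):
    """Drop all signals from users with more signals than the threshold.

    signals: iterable of (user, item, weight) tuples.
    """
    signals = list(signals)
    counts = Counter(user for user, _item, _weight in signals)
    return [s for s in signals if counts[s[0]] <= max_signals_per_user]


# Hypothetical data: "crawler" touches far more items than a real user.
raw = [
    ("crawler", "a", 1), ("crawler", "b", 1), ("crawler", "c", 1),
    ("u1", "a", 2),
]
clean = remove_bots(raw, max_signals_per_user=2)
```

The right threshold depends on the data set, so it is worth looking at the distribution of signals per user before choosing one.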
PRODUCTION QUESTIONS
Your Recommendation Engine
How do you read your MongoDB?
1) Read backup files from S3
2) Connect to secondary nodes
3) Connect to primary nodes
4) Connect to dedicated analytics nodes
5) Turn file-system snapshot backups into BSON
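For the "connect to secondary nodes" option, one possibility is to put a read preference into the connection URI that the Pig script loads from. The Python sketch below only builds such a string. `readPreference` is a standard MongoDB connection-string option, but whether your version of the connector honors it is an assumption to verify, and the host names and helper name are hypothetical.

```python
from urllib.parse import urlencode


def input_uri(conn, db, collection, read_preference="secondary"):
    """Build a '$CONN/$DB.$COLLECTION' style URI with a read preference."""
    query = urlencode({"readPreference": read_preference})
    return f"{conn}/{db}.{collection}?{query}"


# Hypothetical replica-set hosts; database and collection names from the demo.
uri = input_uri(
    "mongodb://host1:27017,host2:27017", "mongo_webinar", "lastfm_plays"
)
```

Reading from secondaries keeps the batch job's load away from the primary, at the cost of possibly reading slightly stale data.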
PRODUCTION QUESTIONS
Your Recommendation Engine
How do you release new recommendations while serving the old ones?
API
Flip between live and offline database
Also enables rollback
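The flip can be pictured as an API that holds two copies of the recommendations and a pointer to the live one. The Python sketch below uses in-memory dictionaries in place of real databases. The class and method names are assumptions for illustration, not part of Mortar or MongoDB.

```python
class RecommendationStore:
    """Serve from one copy while new results load into the other."""

    def __init__(self):
        self._copies = {"a": {}, "b": {}}
        self._live = "a"

    @property
    def offline(self):
        """Name of the copy that is not currently being served."""
        return "b" if self._live == "a" else "a"

    def load_offline(self, recs):
        """Replace the offline copy with freshly generated recommendations."""
        self._copies[self.offline] = dict(recs)

    def flip(self):
        """Make the offline copy live; the old live copy is kept intact."""
        self._live = self.offline

    def rollback(self):
        """Return to the previous copy, which is simply another flip."""
        self.flip()

    def get(self, item):
        """Read recommendations for an item from the live copy."""
        return self._copies[self._live].get(item, [])


store = RecommendationStore()
store.load_offline({"yo-yo ma": ["natalie clein"]})
store.flip()  # the new recommendations are now being served
```

Because the previous copy is untouched until the next load, a bad release can be undone with `rollback()` without rerunning the job.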
WE DISCUSSED
Summary
• What a recommendation engine is
• How Hadoop works with MongoDB
• Set up a demo recommendation engine
• How to connect your data
• Touched on advanced techniques
• Steered away from potholes
• Resources for next step
help.mortardata.com/recommenders
answers.mortardata.com
@kky
@mortardata