Jane Recommendation Engines
Adam Rogers, Data Scientist at Jane.com
Jane Recommendation Engine
Jane Recommendation Overview
Why Recommendations?
35% of Amazon’s sales came from recommendations (2006)
Netflix estimates that 75 percent of viewer activity is driven by recommendations (2013, Wired)
How does it work?
[Architecture diagram with components: Application, User Events, Kinesis, Lambda (x3), DB]
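As a rough illustration of the Lambda stage, the sketch below assumes the Lambda functions are triggered by the Kinesis stream, in which case each record's payload arrives base64-encoded under `Records[].kinesis.data`. The JSON event shape (`user_id`, `product_id`, `action`) and the function names are hypothetical, and the database write is left out; this is not Jane's actual code.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads carried by a Kinesis-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # Kinesis delivers each record's data as a base64-encoded string
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


def handler(event, context):
    # Assumed (hypothetical) payload shape:
    # {"user_id": ..., "product_id": ..., "action": ...}
    user_events = decode_kinesis_records(event)
    # A real function would persist the events to the DB here;
    # this sketch only reports how many were decoded.
    return {"processed": len(user_events)}
```

Keeping the decoding in its own function makes it easy to test locally with a hand-built event, without deploying anything.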
Collaborative Filtering
Amazon’s “Users also Purchased”
Recommend products based on shared activity with other users
Predicts what other product-user mappings are likely based on current ones
www.amazon.com
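To make "shared activity with other users" concrete, here is a minimal plain-Python sketch that counts how often two products are bought by the same user and ranks neighbors by that count. It is a toy stand-in for the idea only: the data, function names, and raw-count scoring are assumptions, and it is not the Spark/Mahout implementation the talk describes.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(purchases):
    """purchases: dict of user -> set of product ids.

    Returns dict of product -> dict of other product -> number of
    users who bought both.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for products in purchases.values():
        for a, b in combinations(sorted(products), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_purchased(product, counts, n=3):
    """Top-n products most often bought together with `product`."""
    neighbors = counts.get(product, {})
    # Sort by descending count, then by name so ties are deterministic
    ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:n]]


# Hypothetical toy data
purchases = {
    "alice": {"scarf", "boots", "hat"},
    "bob": {"scarf", "boots"},
    "carol": {"scarf", "hat", "mug"},
    "dave": {"boots", "mug"},
}
counts = cooccurrence_counts(purchases)
print(also_purchased("scarf", counts))  # ['boots', 'hat', 'mug']
```

Raw counts favor popular products; production systems typically replace them with a normalized similarity score.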
The Tools: Spark, Mahout, CloudSearch
Spark:
Fast Parallel Data Processing and Machine Learning
Scales to massive amounts of data
Mahout:
Parallel Linear Algebra (Matrix Operations) and Machine Learning
Spark and Mahout together enable fast collaborative filtering on massive datasets
CloudSearch:
AWS’ fast full-text search engine built on Solr
CloudSearch allows you to run weighted queries on recommended products, so you can use multiple facets and actions in your recommendations
http://s6.postimg.org/r0m8bpjw1/recommender_architecture.png
Jane’s Recommendation Challenges
“Cold Start Problem” To the Max
No long-lived products to use as a baseline for new ones
Every day ⅓ of products are brand new
This means we need to use events from as far back as we reasonably can in our calculation
http://www.beautifulonraw.com/raw-food-blog/wp-content/uploads/2010/06/Shivering.jpg
Other Types of Recommenders
Content
Popular
User Similarity
"Collaborative Filtering in Recommender Systems" by Moshanin - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:Collaborative_Filtering_in_Recommender_Systems.jpg#/media/File:Collaborative_Filtering_in_Recommender_Systems.jpg
Content Recommendations
Recommend items that are similar to the given item
Based on information contained in the item - title, description, images, etc.
Avoids the “Cold Start” problem
However, a user may not want to buy two very similar things
Word Embeddings
http://spark-public.s3.amazonaws.com/neuralnets/images/Lecture4/turian.png
Content Recommendations with Word Embeddings
Calculate word embeddings on text within product (description, title, tags, etc.)
Compute distances between “embedded” product information
Euclidean distance is poor in such high dimensions - try cosine, Mahalanobis, or others
N nearest neighbors to the product in question are your recommendation
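The three steps above can be sketched in plain Python. This sketch assumes a product vector is the mean of its word vectors (the slides do not say how word vectors are combined) and uses cosine distance; the two-dimensional vectors, product ids, and function names are made up for illustration, and real embeddings would come from a trained model.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def embed_product(text, word_vectors):
    """Average the vectors of the words we have embeddings for (assumption)."""
    vectors = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def nearest_products(target_id, product_vectors, n=2):
    """Ids of the n products closest to target_id by cosine distance."""
    target = product_vectors[target_id]
    distances = [
        (cosine_distance(target, vec), pid)
        for pid, vec in product_vectors.items()
        if pid != target_id
    ]
    return [pid for _, pid in sorted(distances)[:n]]


# Hypothetical toy embeddings and product titles
word_vectors = {
    "wool": [1.0, 0.0], "scarf": [0.9, 0.1],
    "knit": [0.8, 0.2], "hat": [0.7, 0.3],
    "ceramic": [0.0, 1.0], "mug": [0.1, 0.9],
}
titles = {"p1": "Wool scarf", "p2": "Knit hat", "p3": "Ceramic mug"}
product_vectors = {pid: embed_product(t, word_vectors) for pid, t in titles.items()}
print(nearest_products("p1", product_vectors, n=1))  # ['p2']
```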
Improving Content Recommendations
Remove meaningless, common stopwords
Weight your embedded vectors on given criteria
Use category information
Get creative with your data - different patterns in each dataset
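One possible reading of the first two points, as a hedged sketch: drop stopwords before embedding, and weight each word vector by the field it came from (for example, title words counting more than description words). The stopword list, the field weights, and the function names are all illustrative assumptions, not values from the talk.

```python
# Illustrative stopword list, far from exhaustive
STOPWORDS = {"a", "an", "and", "the", "for", "with", "of", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def weighted_embedding(fields, field_weights, word_vectors):
    """Weighted mean of word vectors across product fields.

    fields: dict of field name -> text
    field_weights: dict of field name -> weight (defaults to 1.0)
    Returns None if no word has a known vector.
    """
    total = None
    weight_sum = 0.0
    for name, text in fields.items():
        weight = field_weights.get(name, 1.0)
        for word in tokenize(text):
            vec = word_vectors.get(word)
            if vec is None:
                continue
            if total is None:
                total = [0.0] * len(vec)
            for i, x in enumerate(vec):
                total[i] += weight * x
            weight_sum += weight
    if total is None:
        return None
    return [x / weight_sum for x in total]


# Hypothetical example: title words count three times as much
vec = weighted_embedding(
    {"title": "scarf", "description": "the mug"},
    {"title": 3.0, "description": 1.0},
    {"scarf": [1.0, 0.0], "mug": [0.0, 1.0]},
)
print(vec)  # [0.75, 0.25]
```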
Improving, cont.
Can “embed” images in a similar fashion using deep networks
Compute distance between embedded images
Combine image distances and text distances to give combined distance metric
Determine nearest neighbors from new distance metric
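A minimal sketch of the combining step, assuming the text and image distances are already computed on comparable scales (for example, both cosine distances) and that a simple weighted sum is an acceptable way to merge them. The slides do not specify the combination rule, so the `text_weight` parameter and the function names are hypothetical.

```python
def combined_distance(text_dist, image_dist, text_weight=0.5):
    """Weighted sum of the two distances (assumed combination rule)."""
    return text_weight * text_dist + (1.0 - text_weight) * image_dist


def nearest_by_combined(text_dists, image_dists, n=2, text_weight=0.5):
    """Rank candidates by combined distance from the target product.

    text_dists, image_dists: dict of candidate id -> distance from the target.
    Candidates missing from either dict are skipped.
    """
    scored = [
        (combined_distance(text_dists[pid], image_dists[pid], text_weight), pid)
        for pid in text_dists
        if pid in image_dists
    ]
    return [pid for _, pid in sorted(scored)[:n]]


# Hypothetical distances from one target product to three candidates
text_dists = {"p2": 0.1, "p3": 0.6, "p4": 0.3}
image_dists = {"p2": 0.8, "p3": 0.2, "p4": 0.3}
print(nearest_by_combined(text_dists, image_dists))  # ['p4', 'p3']
```

Here `p2` has the closest text but a very different image, so it drops out of the top two once both signals are considered.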
Summary
Recommendations are a powerful (and these days, standard and necessary) tool for improving customer interaction, conversion, etc.
Collaborative filtering is a proven algorithm for relevant recommendations (given lots of user data and products)
Great tools for building collaborative filtering recommendation systems exist (AWS, Spark, etc.) but you need to adapt to your specific needs
Content recommendations can supplement the weaknesses of collaborative filtering
Get creative to improve the quality of your recommendations
Sources
http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf
"Collaborative Filtering in Recommender Systems" by Moshanin - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:Collaborative_Filtering_in_Recommender_Systems.jpg#/media/File:Collaborative_Filtering_in_Recommender_Systems.jpg
https://aws.amazon.com/kinesis/streams/