Post on 28-Aug-2014
Anmol Bhasin
Sr. Manager, Analytics Engineering
www.linkedin.com
Beyond Ratings & Followers
In a social (professional) networking context,
it's about building a
Recommender Ecosystem
50%
The answer is
Similar Profiles
Events You May Be Interested In
News
The Recommender Ecosystem
Network updates
Connections
Frameworks are revolutions
evolutions
LinkedIn Recommendation Engine
Behavior Analysis, Collaborative Filtering, Popularity
Recommendation Types
Shared, Dynamic, Unified Core Service
Recommendation Entities and Products
People: Similar Profiles, Referral Center, TalentMatch, People Browse Map
Jobs: Jobs You May Be Interested In, Similar Jobs, Jobs Browse Map
Groups: GYML, Similar Groups, Groups Browse Map
… Ads, Companies, Searches, News, Events … and more
User Feedback
API
(R-T) Feature Extraction, Entity Resolution & Enrichment
(R-T) matching computations
A/B
Offline data munging (hadoop)
different strokes for different
folks
Cloning
Possible Approaches
Naïve K Nearest Neighbor solution Complexity is
Clustering
  Latent Factor Models like PLSI or LDA
  Hierarchical Agglomerative clustering
  Self Organizing Maps
Item based Collaborative Filtering
  Find pairs of Users viewed in the same session
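The item based collaborative filtering idea above (pairs of profiles viewed in the same session) can be sketched as a co-view count. This is a minimal illustration only: the function names, the session format, and the sample data are assumptions, not the production implementation.

```python
from collections import Counter
from itertools import combinations


def co_view_counts(sessions):
    """Count how often each unordered pair of profiles is viewed in one session."""
    counts = Counter()
    for viewed in sessions:
        # De-duplicate and sort so (a, b) and (b, a) share a single key.
        for pair in combinations(sorted(set(viewed)), 2):
            counts[pair] += 1
    return counts


def similar_profiles(counts, profile, k=3):
    """Rank the profiles most often co-viewed with `profile`."""
    scored = Counter()
    for (a, b), n in counts.items():
        if a == profile:
            scored[b] += n
        elif b == profile:
            scored[a] += n
    return [p for p, _ in scored.most_common(k)]


# Hypothetical browsing sessions, each a list of viewed profile ids.
sessions = [
    ["alice", "bob", "carol"],
    ["alice", "bob"],
    ["bob", "dave"],
]
counts = co_view_counts(sessions)
top_for_alice = similar_profiles(counts, "alice", k=1)
```

Raw co-view counts favor popular profiles, so a real system would likely normalize them; that step is left out here.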
Challenges
Scale
  175+ M profiles
Dimensionality
  ~2M companies, ~200K schools, ~147 industries, ~200 countries, ~25K titles, ~40K skills, ~200 job functions
Similar means different things to different people
  Similar behavior doesn't mean you can replace me at my job
  Accuracy vs Relevance (me & my boss..)
Realtime..
  It's a problem of accuracy, not recall
Approach
Cluster: Focus attention only on pairs likely to be similar
Filter: Filter out the possibly dissimilar pairs
Rank: Run similarity functions on the filtered-in pairs
LSH function family for Cosine Distance
Locality Sensitive Hashing
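One common LSH family for cosine distance is random hyperplane hashing, sketched below. The slides do not say which construction was used, so treat the hyperplane scheme, the banding step, and all parameter values (`num_bits`, `band_size`, the seed) as assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random Gaussian hyperplanes in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def candidate_pairs(vectors, hyperplanes, band_size):
    """Return pairs of keys whose signatures agree on at least one band."""
    buckets = defaultdict(set)
    for key, vec in vectors.items():
        sig = signature(vec, hyperplanes)
        for start in range(0, len(sig), band_size):
            buckets[(start, sig[start:start + band_size])].add(key)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))
    return pairs


planes = make_hyperplanes(num_bits=16, dim=3, seed=42)
profiles = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 4.0, 6.0],     # same direction as "a"
    "c": [-1.0, -2.0, -3.0],  # opposite direction to "a"
}
pairs = candidate_pairs(profiles, planes, band_size=4)
```

Vectors at a small angle agree on most bits and therefore share at least one band with high probability, so only those pairs go on to the more expensive similarity functions.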
Similarity Functions
Different bands of attributes
Boolean, Jaccard or Cosine similarities across attribute pairs
• Logistic Regression with Elastic Penalty
Learn model params on a set of hand labeled data points
Predicted value interpreted as score
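The scoring step above can be sketched as follows: per-attribute similarities become features, and a logistic function turns their weighted sum into a score. The attribute names (`industry`, `skills`, `title_terms`), the weights, and the bias are hypothetical. In the deck the parameters are learned from hand-labeled pairs; that training step is not shown here.

```python
import math


def boolean_match(a, b):
    """1.0 when two categorical values are equal, else 0.0."""
    return 1.0 if a == b else 0.0


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(p, q):
    """One similarity per attribute pair (attribute names are illustrative)."""
    return [
        boolean_match(p["industry"], q["industry"]),
        jaccard(p["skills"], q["skills"]),
        cosine(p["title_terms"], q["title_terms"]),
    ]


def score(features, weights, bias):
    """Logistic regression prediction, interpreted as a similarity score."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters; real values would come from the labeled data.
WEIGHTS, BIAS = [1.0, 2.0, 1.5], -2.0

me = {"industry": "internet", "skills": ["ml", "recsys"],
      "title_terms": {"engineering": 1.0, "manager": 1.0}}
peer = {"industry": "internet", "skills": ["ml", "dm"],
        "title_terms": {"engineering": 1.0, "director": 1.0}}
stranger = {"industry": "retail", "skills": ["sales"],
            "title_terms": {"store": 1.0, "clerk": 1.0}}

peer_score = score(pair_features(me, peer), WEIGHTS, BIAS)
stranger_score = score(pair_features(me, stranger), WEIGHTS, BIAS)
```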
Impedance Mismatch
Ad Ranking Given
Objective
Goal: Increase revenue
Respect daily budgets of Advertisers
Good user experience
Campaign creation
Virtual Profiling
Targeted Segment Population
Title: Eng Mgr; Company: LinkedIn; Location: CA, USA; Skills: ML, RecSys
Title: Vice President; Company: Twitter; Location: CA, USA; Skills: DM, ML, RecSys
……………….
Virtual Profiling
Title: Eng Mgr; Company: LinkedIn; Location: CA, USA; Skills: ML, RecSys
Title: Sr. SE; Company: Google; Location: PA, USA; Skills: ML, DM
Title: Eng Dir; Company: LinkedIn; Location: PA, USA; Skills: ML, Stats, DM
Clicker Feature Distribution
Title: Sr. SE<1>, Eng Mgr<1>, Eng Dir<1>
Company: LinkedIn<2>, Google<1>
Location: CA, USA<2>; PA, USA<1>
Skills: ML<2>, RecSys<1>, Stats<1>, DM<1>
Virtual Profiling
Information Gain
Pick Top K overrepresented features from the clicker distribution vs the target segment
A representative projection of the item in the member feature space
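A rough sketch of the virtual profile construction: count features among clickers and among the whole target segment, then keep the top K features that clickers carry more often than the segment does. The slide names information gain but gives no formula, so the score used here (a smoothed pointwise KL term, `p_click * log(p_click / p_segment)`) is an assumption, as are the function names and the sample profiles.

```python
import math
from collections import Counter


def feature_counts(profiles):
    """Count how many profiles carry each (field, value) feature."""
    counts = Counter()
    for profile in profiles:
        for field, values in profile.items():
            for value in set(values):
                counts[(field, value)] += 1
    return counts


def virtual_profile(clickers, segment, k, smoothing=1.0):
    """Top-k features overrepresented among clickers relative to the segment."""
    click_counts = feature_counts(clickers)
    segment_counts = feature_counts(segment)
    scored = []
    for feature, count in click_counts.items():
        # Smoothed presence rates keep the log ratio finite for rare features.
        p_click = (count + smoothing) / (len(clickers) + 2 * smoothing)
        p_segment = (segment_counts[feature] + smoothing) / (len(segment) + 2 * smoothing)
        gain = p_click * math.log(p_click / p_segment)
        if gain > 0:  # keep only features that clickers carry more often
            scored.append((feature, gain))
    # Highest score first; ties broken by feature name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Hypothetical members: the first two clicked on the ad.
segment = [
    {"title": ["Eng Mgr"], "skills": ["ML", "RecSys"]},
    {"title": ["Eng Dir"], "skills": ["ML", "RecSys"]},
    {"title": ["Vice President"], "skills": ["ML", "DM"]},
    {"title": ["Sr. SE"], "skills": ["ML", "DM"]},
]
clickers = segment[:2]
top = virtual_profile(clickers, segment, k=3)
```

In this toy data every member lists ML, so it carries no signal, while RecSys is shared by the clickers only and ranks first.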
CTR Prediction – CF Similarity
Ranker
MEMBER FEATURES
Score to pCTR correction
L2 regularized Logistic Regression (Liblinear, VW, Mahout, ADMM)
For new ad creatives back-off to the advertiser / ad category nodes till they reach critical impression/click volume (explore/exploit)
AD CREATIVE VIRTUAL PROFILE
Creative features
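The back-off for new ad creatives can be illustrated with a simple lookup that walks from the creative to its advertiser to its ad category until a node has enough volume. The node keys, the `min_impressions` threshold, and the `prior_ctr` fallback are hypothetical, and the explore/exploit side of the problem is not modeled.

```python
def backoff_ctr(stats, creative, advertiser, category,
                min_impressions=1000, prior_ctr=0.001):
    """Return (level, ctr) from the most specific node with enough impressions.

    `stats` maps (level, key) to an (impressions, clicks) tuple.
    """
    chain = (
        ("creative", creative),
        ("advertiser", advertiser),
        ("category", category),
    )
    for level, key in chain:
        impressions, clicks = stats.get((level, key), (0, 0))
        if impressions >= min_impressions:
            return level, clicks / impressions
    # No node has reached critical volume yet: fall back to a global prior.
    return "prior", prior_ctr


# Hypothetical counts: the creative is new, its advertiser is established.
stats = {
    ("creative", "c1"): (50, 2),
    ("advertiser", "a1"): (5000, 25),
    ("category", "tech"): (100000, 300),
}
level, ctr = backoff_ctr(stats, "c1", "a1", "tech")
```

Here the creative has only 50 impressions, so the estimate comes from the advertiser node.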
the magic is in the models
features
Feature Engineering – Entity Resolution
Companies
Huge impact on the business and UE: Ad targeting, TalentMatch, Referrals
‘IBM’ has 8000+ variations
- ibm – ireland
- ibm research
- T J Watson Labs
- International Bus. Machines
- Deep Blue
K-Ambiguous
ASONAM’11, KDD’11
Feature Engineering – Sticky Locations
Open to relocation?
Region similarity based on profiles or network
Region transition probability
Predict an individual's propensity to migrate and most likely migration target
Impact on job recommendations: 20% lift in views/viewers/applications/applicants
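Region transition probabilities can be estimated from observed moves, as in the sketch below. It works at the region level only, whereas the deck talks about an individual's propensity, which would presumably also use profile and network features. The function names and the sample moves are illustrative assumptions.

```python
from collections import Counter, defaultdict


def transition_probabilities(moves):
    """Estimate P(destination | origin) from (origin, destination) pairs.

    A pair with origin == destination records a member who stayed.
    """
    counts = defaultdict(Counter)
    for origin, destination in moves:
        counts[origin][destination] += 1
    return {
        origin: {dest: n / sum(row.values()) for dest, n in row.items()}
        for origin, row in counts.items()
    }


def propensity_and_target(probs, region):
    """Probability of leaving `region` and the most likely destination."""
    leaving = {d: p for d, p in probs.get(region, {}).items() if d != region}
    if not leaving:
        return 0.0, None
    return sum(leaving.values()), max(leaving, key=leaving.get)


# Hypothetical observed transitions between regions.
moves = [("PA", "CA"), ("PA", "CA"), ("PA", "PA"), ("PA", "NY")]
probs = transition_probabilities(moves)
propensity, target = propensity_and_target(probs, "PA")
```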
What should you transition to .. and when ?
[Chart: probability of switch vs. months since graduation]
rethinking delivery
Social Referral
Mohammad Amin, Baoshi Yan, Sripad Sriram, Anmol Bhasin, Christian Posse. Social Referral: Using network connections to deliver recommendations. To appear in Proceedings of the Sixth ACM Conference on Recommender Systems (RecSys '12).
> 2X Conversion
Linkedin Group: Text Analytics
I found this group interesting, and I think you will too
Deepak
Linkedin Group: Text Analytics
From: Deepak Agarwal – Engineering Director, LinkedIn
2X conversion
Big Data A/B is the
new
Orthogonality in A/B
Beware of some A/B testing pitfalls
1. Novelty effect
   E.g., new job recommendation algorithms have a week-long novelty effect that shows lifts twice the stationary (real) one
2. Cannibalization
   Zero-sum game or real lift?
3. Random sampling destroys network effect
[Chart: 1-week lifts vs. 2-week lifts]
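One simple guard against the novelty effect is to compute the lift week by week and compare the first week with later ones. The rates below are made-up numbers for illustration, and the `ratio` threshold is an arbitrary assumption, not a value from the talk.

```python
def lift(treatment_rate, control_rate):
    """Relative lift of treatment over control."""
    return (treatment_rate - control_rate) / control_rate


def weekly_lifts(weekly_rates):
    """Lift for each week, given (treatment_rate, control_rate) per week."""
    return [lift(t, c) for t, c in weekly_rates]


def looks_like_novelty(lifts, ratio=1.5):
    """Flag when the first-week lift exceeds the later average by `ratio`."""
    if len(lifts) < 2:
        return False
    later = sum(lifts[1:]) / len(lifts[1:])
    return later > 0 and lifts[0] > ratio * later


# Hypothetical conversion rates: (treatment, control) for weeks 1 and 2.
rates = [(0.12, 0.10), (0.11, 0.10)]
lifts = weekly_lifts(rates)
novelty_suspected = looks_like_novelty(lifts)
```

With these numbers the first-week lift is about twice the second-week one, the pattern the slide warns about, so the test should run past the first week before the lift is read as real.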
Tech Stack
Open Source Technologies
Zoie, Bobo
Kafka, Voldemort
http://data.linkedin.com
It takes a village
Credits
Engineering : Abhishek Gupta, Adam Smyczek, Adil Aijaz, Alan Li, Baoshi Yan, Bee-Chung Chen, Deepak Agarwal, Ethan Zhang, Haishan Liu, Igor Perisic, Jonathan Traupman, Liang Zhang, Lokesh Bajaj, Mario Rodriguez, Mitul Tiwari, Mohammad Amin, Monica Rogati, Parul Jain, Paul Ogilvie, Sam Shah, Sanjay Dubey, Tarun Kumar, Trevor Walker, Utku Irmak
Product : Andrew Hill, Christian Posse, Gyanda Sachdeva, Mike Grishaver, Parker Barrile, Sachit Kamat
Alphabetically sorted
You
Picture yourself with this New Job:
Applied Researcher /Research Engineer
A Recommendation for you..