My PhD trajectory
Factorization Machines for Hybrid Recommendation Systems Based
on Behavioral, Product, and Customer Data
Stijn Geuens
Agenda
• PhD Trajectory
• Goals
• Research Questions
• Progress
• Future Work
RecSys 2015 [email protected]
PhD Trajectory
[Figure: diagram with the labels Computer Science, Math & Statistics, Business Expertise, Machine Learning, Data Engineering, Business Analytics, and Data Science]
Research Questions
Machine Learning: What is the added value of combining different data sources?
• More data beats better models (Halevy, Norvig, Pereira, 2009)
• Rich database
– Explicit ratings
– Implicit ratings
– Customer data
– Product data
– Context data
• Different combination methods
Research Questions
Business Analytics: How can we evaluate recommender systems in online settings using business metrics?
• Collaboration with company
• Which metric to optimize?
– Click rates
– Conversion
– Turnover
– Loyalty
– Etc.
• Does a RecSys affect these business performance metrics?
Current Study
Factorization Machines for Hybrid Recommendation Systems Based
on Behavioral, Product, and Customer Data
Motivation
• Typologies of systems using different input data:
– Collaborative filtering, content-based, and hybrid (Adomavicius, Tuzhilin, 2005)
– Collaborative filtering, content-based, demographic, knowledge-based, and hybrid (Burke, 2000; Bobadilla et al., 2013)
• Each system has its advantages and disadvantages
• Hybridization resolves these issues and leads to better performance
• More data trumps better models (Halevy, Norvig, Pereira, 2009)
• This study: hybridization through feature combination, bringing together different data sources (customer, product, and behavioral data) in a single state-of-the-art algorithm, factorization machines (FM)
• Combining all these data sources in one algorithm has not been done before, especially not in factorization machines research
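Feature combination can be pictured as concatenating the three data sources into a single real-valued input row per user-item pair. The sketch below is only an illustration: the slides do not specify the encoding, so the layout (one-hot user, one-hot item, customer attributes, product attributes), the function names, and the toy values are all assumptions.

```python
import numpy as np

def one_hot(index, size):
    """Return a length-`size` vector with a single 1.0 at `index`."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec

def build_hybrid_features(user_idx, item_idx, n_users, n_items,
                          customer_feats, product_feats):
    """Concatenate behavioral, customer, and product features into one row.

    Assumed layout (hypothetical, not taken from the slides):
    [one-hot user | one-hot item | customer attributes | product attributes]
    """
    return np.concatenate([
        one_hot(user_idx, n_users),
        one_hot(item_idx, n_items),
        np.asarray(customer_feats, dtype=float),
        np.asarray(product_feats, dtype=float),
    ])

# Toy example with made-up values: user 1 of 3, item 0 of 2,
# two customer attributes and one product attribute.
x = build_hybrid_features(1, 0, n_users=3, n_items=2,
                          customer_feats=[0.4, 1.0], product_feats=[1.0])
```

A single-source model would, under the same assumption, simply drop the blocks it does not use.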
Factorization Machines (FM)
• Introduced by Rendle (2010)
• Based on Support Vector Machines (SVM) and factorization models, combining the advantages of both
• SVM: works with any real-valued feature vector, which allows different data sources to be integrated
• Factorization models: variable interactions are calculated from factorized parameters, which allows interactions to be estimated under huge sparsity, where SVMs fail
• General FM model equation of degree 2:
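In Rendle (2010), the degree-2 model is written as \hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j, where w_0 is a global bias, w_i are linear weights, and v_i is the k-dimensional factor vector of feature i. A minimal NumPy sketch of this prediction follows; the function and argument names are illustrative and not taken from the slides, and parameter learning is left out.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 FM prediction for one feature vector.

    x  : feature vector, shape (n,)
    w0 : global bias (scalar)
    w  : linear weights, shape (n,)
    V  : factor matrix, shape (n, k); row i is the factor vector v_i
    """
    linear = w0 + w @ x
    # Pairwise term via the identity from Rendle (2010), linear in k * n:
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
    xv = x @ V                     # shape (k,)
    x2v2 = (x ** 2) @ (V ** 2)     # shape (k,)
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return float(linear + pairwise)
```

Because every interaction weight is the inner product of two factor vectors, a pair of features that never co-occurs in the training data still receives an estimate, which is the sparsity argument made on the slide.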
Algorithms
• 4 factorization machines
– 3 single-data-source FMs
  • Behavioral data (FMBD)
  • Customer data (FMCD)
  • Product data (FMPD)
– 1 hybrid FM based on the 3 distinct data sources (FMBD/CD/PD)
• 1 hybrid CF benchmark model used by the company
– Input: user-item matrix (M), where each element is defined as follows: [equation: definition of the elements of M]
Data
• 2 distinct data sets:
– Furniture: 5,368 users and 2,601 items
– Children's clothing: 5,999 users and 4,372 items
Results
• Evaluation: Recall@5 to Recall@100
• Friedman test with Holm's procedure (Demšar, 2006):
– Dependent variable = Recall
– Independent variable = Algorithm
– Cases = combinations of selection size and product category

Algorithm   FMPD/CD/BD   FMBD   CF     FMCD   FMPD
Ranking     1            2.38   2.77   3.95   4.90
NS
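The evaluation steps on this slide can be sketched as follows: recall at a selection size k, the average rank of each algorithm across cases (the quantity the Friedman test is built on), and Holm's step-down procedure for the pairwise p-values. This is a sketch under assumptions, not the study's code: all names are illustrative, higher recall is taken to earn the better rank, and the p-values are supplied by the caller.

```python
import numpy as np

def recall_at_k(ranked_items, relevant_items, k):
    """Share of the relevant items that appear in the top-k recommendations."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)

def average_ranks(scores):
    """Mean rank per algorithm over all cases (1 = best, ties share a rank).

    scores : array of shape (n_cases, n_algorithms), higher is better.
    """
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty_like(scores)
    for r, row in enumerate(scores):
        better = (row[None, :] > row[:, None]).sum(axis=1)
        equal = (row[None, :] == row[:, None]).sum(axis=1)
        ranks[r] = 1 + better + (equal - 1) / 2
    return ranks.mean(axis=0)

def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: boolean array of rejected hypotheses."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for step, idx in enumerate(np.argsort(p)):
        # Compare the (step+1)-th smallest p-value with alpha / (m - step).
        if p[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are too
    return reject
```

With five algorithms the average ranks always sum to 15, which matches the ranking row in the table.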
Results
• Furniture category
[Figure: recall (0% to 100%) by selection size (5 to 95) for FMPD, FMCD, FMBD, FMPD/CD/BD, and CF]
• Children's clothing category
[Figure: recall (0% to 100%) by selection size (5 to 95) for FMPD, FMCD, FMBD, FMPD/CD/BD, and CF]
Future Work: This study
• Perform a grid search to identify which data sources are the most important (at the data type level and the individual variable level)
• Create a benchmark hybrid algorithm that combines the results of separate systems, each built on one of the data sources
• Evaluate on other theoretical metrics (precision, F1, AUC, diversity, novelty, etc.)
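Two of the additional metrics listed above, precision and F1, can be computed from the same top-k hit count that recall uses. The sketch below assumes a ranked recommendation list and a set of relevant items per user; the function name and toy items are hypothetical.

```python
def precision_recall_f1_at_k(ranked_items, relevant_items, k):
    """Return (precision@k, recall@k, F1@k) for one user."""
    relevant = set(relevant_items)
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example with made-up items: one of the top-2 recommendations is relevant.
p, r, f1 = precision_recall_f1_at_k(["a", "b", "c", "d"], {"b", "d", "z"}, k=2)
```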
Future Work: PhD
• Implement the model at the company and perform real-life A/B tests
– Email system
– Webshop
• Evaluation of the implemented algorithm in terms of business metrics (click rates, conversion rates, turnover, loyalty, etc.)
• Investigate which (combination of) business metrics optimize(s) economic value of the RecSys in both short and long term
• Investigate the impact of a RecSys on economic performance of a company
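One way such an A/B test could be read for a rate metric like conversion is a two-proportion z-test. The slides do not name a test, so this choice, the function name, and the counts in the usage note are assumptions made purely for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between groups A and B.

    conv_a, conv_b : number of conversions in each group
    n_a, n_b       : number of users exposed in each group
    Returns (z, p_value); z > 0 means group B converts at a higher rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, `two_proportion_z_test(50, 1000, 80, 1000)` compares a hypothetical control group (50 conversions out of 1,000 users) with a hypothetical group that received recommendations (80 out of 1,000). Metrics such as turnover or loyalty are not simple proportions and would need a different test.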
Thank you for your Attention
Contact:
Stijn Geuens
IESEG School of Management
Rue de la Digue
F-59000 Lille
(0)3.20.545.892
[email protected]
fr.linkedin.com/pub/stijn-geuens/
stijn.geuens
Advantages and disadvantages of different systems
Collaborative Filtering
• Pros
– No metadata engineering needed
– Serendipity in results
– Adaptive
• Cons
– Scalability
– Cold start for new users and items
– Long tail problem
– Stability
Content-based
• Pros
– Comparison between items possible
– No metadata engineering needed
– Adaptive
• Cons
– Overspecialization
– Cold start for new users
– Collection of product information
Advantages and disadvantages of different systems (continued)
Knowledge-based
• Pros
– Deterministic
– No cold start
• Cons
– Knowledge engineering required
– Subjective
– Static
Demographic
• Pros
– No metadata engineering needed
– Serendipity in results
• Cons
– Long tail
– Cold start for new users
– Static