My PhD trajectory

Factorization Machines for Hybrid Recommendation Systems Based on Behavioral, Product, and Customer Data

Stijn Geuens

Agenda
• PhD Trajectory
• Goals
• Research Questions
• Progress
• Future Work

RecSys 2015


PhD Trajectory

[Figure: Venn diagram. Circles: Computer Science, Math & Statistics, Business Expertise. Overlaps: Machine Learning, Data Engineering, Business Analytics. Center: Data Science.]


Research Goals

Machine Learning

Data Engineering

Business Analytics


Research Questions

Machine Learning: What is the added value of combining different data sources?

• More data beats better models (Halevy, Norvig, Pereira, 2009)

• Rich database
– Explicit ratings
– Implicit ratings
– Customer data
– Product data
– Context data

• Different combination methods


Research Questions

Business Analytics: How can we evaluate recommender systems in online settings using business metrics?

• Collaboration with a company
• Which metric to optimize?
– Click rates
– Conversion
– Turnover
– Loyalty
– Etc.

• Does a RecSys affect these business performance metrics?


Current Study

Factorization Machines for Hybrid Recommendation Systems Based on Behavioral, Product, and Customer Data


Motivation
• Typologies of systems using different input data:
– Collaborative filtering, content-based, and hybrid (Adomavicius, Tuzhilin, 2005)
– Collaborative filtering, content-based, demographic, knowledge-based, hybrid (Burke, 2000; Bobadilla et al., 2013)
• Each system has its advantages and disadvantages
• Hybridization resolves these issues and leads to better performance
• More data trumps better models (Halevy, Norvig, Pereira, 2009)
• This study: hybridization by feature combination, merging different data sources (customer, product, and behavioral data) in a single state-of-the-art algorithm, factorization machines (FM)
• Combining all of these data sources in one algorithm has not been done before, especially not in factorization machines research
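Feature combination can be pictured as building one sparse input row per customer-product interaction, with a block of features from each data source. The sketch below is only an illustration under assumptions, not the study's actual pipeline: the function name `build_feature_row`, the `cust:` and `prod:` prefixes, and the toy attribute values are all hypothetical.

```python
def build_feature_row(user_id, item_id, customer_attrs, product_attrs):
    """Return one sparse feature row as a dict {feature_name: value}.

    Blocks (all names are illustrative assumptions):
      - behavioral block: one-hot indicators for the user and the item
      - customer block: customer attributes, prefixed with "cust:"
      - product block: product attributes, prefixed with "prod:"
    """
    row = {f"user={user_id}": 1.0, f"item={item_id}": 1.0}
    for name, value in customer_attrs.items():
        row[f"cust:{name}"] = float(value)
    for name, value in product_attrs.items():
        row[f"prod:{name}"] = float(value)
    return row


# Hypothetical toy interaction: customer u17 bought item i42.
row = build_feature_row(
    user_id="u17",
    item_id="i42",
    customer_attrs={"age_scaled": 0.35, "is_female": 1},
    product_attrs={"price_scaled": 0.8, "category=sofa": 1},
)
print(sorted(row))
```

Because every source ends up in the same real-valued vector, a single model can learn interactions both within and across the blocks.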


Factorization Machines (FM)
• Introduced by Rendle (2010)
• Based on Support Vector Machines (SVM) and factorization models, combining the advantages of both
• SVM: works with any real-valued feature vector, which allows integrating different data sources
• Factorization models: variable interactions are calculated from factorized parameters, which allows estimating interactions under huge sparsity, where SVMs fail
• General FM model equation of degree 2:
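The degree-2 model in Rendle (2010) is ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, where w0 is a global bias, w_i are linear weights, and v_i is a k-dimensional factor vector for feature i. A minimal sketch of the prediction step follows, assuming dense Python lists for readability; the function names are illustrative and training is not shown. The first function uses the linear-time reformulation of the pairwise term given in the same paper, the second is the direct double sum kept only as a check.

```python
def fm_predict(x, w0, w, V):
    """Degree-2 FM prediction for one dense feature vector.

    x : list of n feature values
    w0: global bias
    w : list of n linear weights
    V : n x k list of lists; row i is the factor vector v_i
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2) equals the
        # sum over pairs i < j of v_if * v_jf * x_i * x_j.
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model via the explicit sum over pairs i < j (for checking)."""
    n = len(x)
    out = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            out += dot * x[i] * x[j]
    return out
```

With sparse inputs, only the non-zero entries of x contribute to either sum, which is what keeps prediction cheap on one-hot encoded data.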


Algorithms
• 4 factorization machines
– 3 single data source FMs
  • Behavioral data (FMBD)
  • Customer data (FMCD)
  • Product data (FMPD)
– 1 hybrid FM based on the 3 distinct data sources (FMBD/CD/PD)
• 1 hybrid CF benchmark model used by the company
– Input: user-item matrix (M), where each element is defined as follows:


Data
• 2 distinct data sets:
– Furniture: 5,368 users and 2,601 items
– Children’s clothing: 5,999 users and 4,372 items


Results
• Evaluation: Recall@5 to Recall@100
• Friedman test with Holm’s procedure (Demšar, 2006):
– Dependent variable = recall
– Independent variable = algorithm
– Cases = combinations of selection size and product category

Algorithm   FMPD/CD/BD   FMBD   CF     FMCD   FMPD
Ranking     1            2.38   2.77   3.95   4.90

NS
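The two ingredients of this evaluation, recall at a given selection size and the average rank per algorithm across cases, can be sketched as below. This is a simplified illustration under assumptions: the function names are hypothetical, recommended items are assumed to be unique, ties are ignored, and the significance testing itself (Friedman statistic and Holm's correction) is not implemented.

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of relevant items that appear in the top-k recommendations.

    Assumes ranked_items contains no duplicates.
    """
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)


def average_ranks(scores_by_case):
    """Mean rank per algorithm over all cases (rank 1 = highest score).

    scores_by_case: list of dicts {algorithm_name: recall}, one dict per
    case. Ties are not handled; a full Friedman test would give tied
    algorithms their average rank.
    """
    totals = {}
    for case in scores_by_case:
        ordered = sorted(case, key=case.get, reverse=True)
        for rank, name in enumerate(ordered, start=1):
            totals[name] = totals.get(name, 0) + rank
    return {name: total / len(scores_by_case) for name, total in totals.items()}
```

An algorithm that wins in every case gets an average rank of exactly 1, which is how the hybrid FM's ranking in the table should be read.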


Results
• Furniture category
• Children’s Clothing category

[Figures: one plot per category showing recall (0% to 100%) against selection size (5 to 95) for FMPD, FMCD, FMBD, FMPD/CD/BD, and CF.]


Future Work: This study

• Perform a grid search to identify which data sources are the most important (at the data type level and at the individual variable level)

• Create a benchmark hybrid algorithm that combines the results of separate systems, each built on one of the data sources

• Evaluate on other theoretical metrics (precision, F1, AUC, diversity, novelty, etc.)
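One possible reading of the search over data sources is to evaluate every non-empty combination of sources and keep the best one. The sketch below assumes that reading; the function names are hypothetical, and `evaluate` stands in for whatever train-and-validate routine would return a metric such as recall for a given combination.

```python
from itertools import combinations


def source_subsets(sources):
    """Yield every non-empty combination of data sources."""
    for size in range(1, len(sources) + 1):
        for subset in combinations(sources, size):
            yield subset


def best_subset(sources, evaluate):
    """Return the (subset, score) pair with the highest score.

    evaluate: caller-supplied callable that takes a tuple of source names
    and returns a validation metric (higher is better).
    """
    scored = ((subset, evaluate(subset)) for subset in source_subsets(sources))
    return max(scored, key=lambda pair: pair[1])
```

With three data sources this means seven model fits; at the individual variable level the number of combinations grows exponentially, so a greedy or sampled search would likely be needed there.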


Future Work: PhD

• Implement the model at the company and perform real-life A/B tests
– Email system
– Webshop

• Evaluation of the implemented algorithm in terms of business metrics (click rates, conversion rates, turnover, loyalty, etc.)

• Investigate which (combination of) business metrics optimize(s) economic value of the RecSys in both short and long term

• Investigate the impact of a RecSys on economic performance of a company


Thank you for your Attention

Contact:
Stijn Geuens
IESEG School of Management
Rue de la Digue
F-59000 Lille
(0)3.20.545.892
fr.linkedin.com/pub/stijn-geuens/
stijn.geuens


Advantages and disadvantages of different systems

Collaborative Filtering
– Pros
  • No metadata engineering needed
  • Serendipity in results
  • Adaptive
– Cons
  • Scalability
  • Cold start for new users and items
  • Long tail problem
  • Stability

Content-based
– Pros
  • Comparison between items possible
  • No metadata engineering needed
  • Adaptive
– Cons
  • Overspecialization
  • Cold start for new users
  • Collection of product information


Advantages and disadvantages of different systems

Knowledge-based
– Pros
  • Deterministic
  • No cold start
– Cons
  • Knowledge engineering required
  • Subjective
  • Static

Demographic
– Pros
  • No metadata engineering needed
  • Serendipity in results
– Cons
  • Long tail
  • Cold start for new users
  • Static
