Tutorial bpocf



Page 1: Tutorial bpocf

Collaborative Filtering with Binary, Positive-only Data

Tutorial @ ECML PKDD, September 2015, Porto

Koen Verstrepen, Kanishka Bhaduri, Bart Goethals

Page 2: Tutorial bpocf

Agenda
•  Introduction
•  Algorithms
•  Netflix

Page 3: Tutorial bpocf

Agenda
•  Introduction
•  Algorithms
•  Netflix

Page 4: Tutorial bpocf

Binary, Positive-Only Data


Page 5: Tutorial bpocf

Collaborative Filtering


Page 6: Tutorial bpocf

Movies


Page 7: Tutorial bpocf

Music


Page 8: Tutorial bpocf

Social Networks


Page 9: Tutorial bpocf

Tagging / Annotation


Paris

New York

Porto

Statue of Liberty

Eiffel Tower
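
Tag assignments like the ones above are themselves binary, positive-only data: a tag has either been applied to an object, or it is simply absent, and absence never means a confirmed "no". A minimal sketch in Python of how such annotation data can be held (the landmark-to-city pairing is assumed from the examples on this slide; Porto is left untagged):

# Tagging / annotation as binary, positive-only data:
# an (object, tag) pair is either observed or missing, never an explicit "no".
annotations = {
    ("New York", "Statue of Liberty"),
    ("Paris", "Eiffel Tower"),
}

def has_tag(obj, tag):
    # Absence only means "not observed", not "does not apply".
    return (obj, tag) in annotations

print(has_tag("Paris", "Eiffel Tower"))   # True
print(has_tag("Porto", "Eiffel Tower"))   # False: missing, not necessarily false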

Page 10: Tutorial bpocf

Also Explicit Feedback


Page 11: Tutorial bpocf

Matrix Representation


(1 = known preference, . = unknown)
1 . 1 . 1
. 1 . . .
1 . . 1 .
. 1 . . 1


R

Page 12: Tutorial bpocf

Unknown = 0: no negative information


(every unknown cell filled in as 0)
1 0 1 0 1
0 1 0 0 0
1 0 0 1 0
0 1 0 0 1


R
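To make this representation concrete, here is a minimal sketch (Python with NumPy/SciPy, not part of the original slides) of how such a binary, positive-only matrix R can be built from a log of (user, item) interactions. The pairs below reproduce the 4 x 5 example above; every unobserved cell simply stays 0 and carries no negative information.

import numpy as np
from scipy.sparse import csr_matrix

# Illustrative interaction log: each pair means "user u showed a preference for item i".
interactions = [(0, 0), (0, 2), (0, 4),
                (1, 1),
                (2, 0), (2, 3),
                (3, 1), (3, 4)]

n_users, n_items = 4, 5
rows, cols = zip(*interactions)
data = np.ones(len(interactions), dtype=np.int8)

# Binary, positive-only matrix: 1 = known preference, 0 = unknown (not a dislike).
R = csr_matrix((data, (rows, cols)), shape=(n_users, n_items))
print(R.toarray())

A sparse representation is the natural choice here, because in realistic datasets almost all cells are unknown.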

Page 13: Tutorial bpocf

Different Data

•  Ratings (e.g. movies, music, …)
•  Graded relevance, Positive-Only (e.g. minutes watched, times clicked, times listened, money spent, visits/week, …)
•  Binary, Positive-Only (e.g. seen, bought, watched, clicked, …)

[Example matrices on the slide: a 4 x 5 ratings matrix with entries 1–5; the corresponding graded-relevance, positive-only matrix, which keeps only the positive entries (the 4s and 5s); and the corresponding binary, positive-only matrix, with an X for every known preference.]
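Rating data and graded-relevance data are often reduced to this binary, positive-only form before applying the algorithms discussed later. Below is a minimal sketch (Python; the matrix values and the "4 stars or more" threshold are illustrative assumptions, not prescribed by the tutorial).

import numpy as np

# Illustrative 4 x 5 ratings matrix, loosely based on the example above (0 = no rating).
ratings = np.array([[1, 0, 5, 0, 4],
                    [0, 3, 3, 0, 0],
                    [4, 0, 0, 2, 2],
                    [5, 5, 0, 0, 1]])

# One common reduction: keep only clearly positive feedback (here: ratings of 4 or more);
# everything else becomes "unknown", not "disliked".
R = (ratings >= 4).astype(np.int8)
print(R)

The same idea applies to graded-relevance signals such as minutes watched or times clicked: choose a threshold above which the signal counts as a known preference.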

Page 14: Tutorial bpocf

Sparse: 10 in 10 000
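One way to read this figure (an illustrative interpretation, not spelled out on the slide): if a typical user has on the order of 10 known preferences in a catalogue of 10 000 items, then only 0.1% of the cells of R are filled.

known_per_user, n_items = 10, 10_000
density = known_per_user / n_items
print(f"density = {density:.1%}")  # prints: density = 0.1%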

Page 15: Tutorial bpocf

Agenda
•  Introduction
•  Algorithms
   –  Elegant example
   –  Models
   –  Deviation functions
   –  Difference with rating-based algorithms
   –  Parameter inference

•  Netflix

Page 16: Tutorial bpocf

Agenda
•  Introduction
•  Algorithms
   –  Elegant example
   –  Models
   –  Deviation functions
   –  Difference with rating-based algorithms
   –  Parameter inference

•  Netflix

Page 17: Tutorial bpocf

pLSA: An elegant example

[Hofmann 2004]

Page 18: Tutorial bpocf

pLSA: probabilistic Latent Semantic Analysis
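For the pLSA slides that follow, the notation below is assumed here (a standard reading for this setting, not a definition taken from the slides):

u \in U \;\text{(users)}, \qquad i \in I \;\text{(items)}, \qquad R \in \{0,1\}^{|U| \times |I|}, \qquad d \in \{1,\dots,D\} \;\text{(latent interests)}.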


Page 19: Tutorial bpocf

pLSA: latent interests
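pLSA explains a user's preferences through D latent interests: the probability that user u prefers item i mixes the item distributions of the interests u cares about. In Hofmann's aspect-model formulation [Hofmann 2004]:

P(i \mid u) \;=\; \sum_{d=1}^{D} P(d \mid u)\, P(i \mid d)

Both factors are probability tables: P(d | u) captures how much user u cares about each latent interest, and P(i | d) captures which items each interest makes likely.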


Page 20: Tutorial bpocf

pLSA generative model

Notation: users u ∈ U, items i ∈ I, binary feedback matrix R, and latent dimensions d = 1, …, D; the generative model involves the probabilities p(u | i), p(d | u), and p(i | d).
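To make the generative model concrete, here is a minimal sketch of how a fitted pLSA model scores items for a user, p(i | u) = ∑_{d=1}^{D} p(d | u) · p(i | d), assuming the factors p(d | u) and p(i | d) have already been fit (e.g., with EM); the array names and toy sizes below are illustrative placeholders only.

import numpy as np

# Toy sizes; in practice these come from the data (|U| users, |I| items, D latent dimensions).
n_users, n_items, D = 4, 6, 2
rng = np.random.default_rng(0)

# Illustrative placeholders for factors that would normally be learned (e.g., with EM).
# Each row is normalized so it is a proper probability distribution.
p_d_given_u = rng.random((n_users, D))
p_d_given_u /= p_d_given_u.sum(axis=1, keepdims=True)    # p(d | u): rows sum to 1 over d
p_i_given_d = rng.random((D, n_items))
p_i_given_d /= p_i_given_d.sum(axis=1, keepdims=True)    # p(i | d): rows sum to 1 over i

# pLSA score for every user-item pair: p(i | u) = sum_d p(d | u) * p(i | d).
p_i_given_u = p_d_given_u @ p_i_given_d                  # shape (n_users, n_items)

# Top-N recommendation on binary, positive-only data: rank unseen items by p(i | u).
R = (rng.random((n_users, n_items)) < 0.3).astype(int)   # toy binary feedback matrix
scores = np.where(R == 1, -np.inf, p_i_given_u)          # never re-recommend known positives
top_n = np.argsort(-scores, axis=1)[:, :3]               # top-3 item indices per user
print(top_n)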

Page 21: Tutorial bpocf

pLSA probabilistic weights

The pLSA weights are proper probabilities: p(d | u) ≥ 0 and p(i | d) ≥ 0, with ∑_{d=1}^{D} p(d | u) = 1 and ∑_{i ∈ I} p(i | d) = 1.

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONx

UIRDd = 1

d = D...uip(u | i)p(d | u)

p(d | u) � 0

p(i | d) � 0

DPd=1

p(d | u) = 1

Pi2I p(i | d) = 1

REFERENCESF. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In RecSys.

273–280.Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In RecSys.

293–296.S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n

recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on

top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems.39–46.

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

1:30 K. Verstrepen et al.

— Convince the reader ranking is more important than RMSE or MSE.— data splits (leave-one-out, 5 fold, ...)— Pradel et al. :ranking with non-random missing ratings: influence of popularity and

positivity on evaluation metrics— Marlin et al. :Collaaborative prediction and ranking with non-random missing data— Marlin et al. :collaborative filtering and the missing at random assumption— Steck: Training and testing of recommender systems on data missing not at random— We should emphasise how choosing hyperparameters is often done in a way that

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONx

UIRDd = 1

d = D...uip(u | i)p(d | u)

p(d | u) � 0

p(i | d) � 0

DPd=1

p(d | u) = 1

Pi2I

p(i | d) = 1

REFERENCESF. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In RecSys.

273–280.Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In RecSys.

293–296.S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n

recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on

top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems.39–46.

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.
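As a quick illustration of these constraints, here is a minimal sketch (assuming NumPy and randomly initialized parameters; the array names and sizes are illustrative, not from the tutorial) that builds row-stochastic matrices for $p(d \mid u)$ and $p(i \mid d)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, D = 1000, 500, 20          # illustrative sizes for |U|, |I| and D

# p(d|u): one distribution over the D latent dimensions per user (rows sum to 1).
p_d_u = rng.random((n_users, D))
p_d_u /= p_d_u.sum(axis=1, keepdims=True)

# p(i|d): one distribution over the items per latent dimension (rows sum to 1).
p_i_d = rng.random((D, n_items))
p_i_d /= p_i_d.sum(axis=1, keepdims=True)

# Both simplex constraints from the slide hold by construction.
assert np.allclose(p_d_u.sum(axis=1), 1.0)
assert np.allclose(p_i_d.sum(axis=1), 1.0)
```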

Page 22: Tutorial bpocf

pLSA computing the like-probability

The like-probability of item $i$ for user $u$ is a mixture over the $D$ latent dimensions:

$$p(i \mid u) = \sum_{d=1}^{D} p(i \mid d) \cdot p(d \mid u).$$

The parameters are chosen to maximize the log-likelihood of the known preferences:

$$\max \sum_{R_{ui}=1} \log p(i \mid u).$$
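A minimal sketch of this computation, assuming the NumPy parameter matrices from the previous sketch (function and variable names are illustrative, not part of the tutorial): the full matrix of like-probabilities is a single matrix product, and recommending amounts to ranking a user's unseen items by $p(i \mid u)$.

```python
import numpy as np

def like_probabilities(p_d_u, p_i_d):
    """p_d_u: |U| x D matrix of p(d|u); p_i_d: D x |I| matrix of p(i|d).
    Returns the |U| x |I| matrix whose entry [u, i] equals p(i|u)."""
    return p_d_u @ p_i_d

def log_likelihood(p_i_u, R):
    """The training objective: sum of log p(i|u) over all known preferences R_ui = 1."""
    users, items = np.nonzero(R)
    return np.log(np.maximum(p_i_u[users, items], 1e-12)).sum()

def recommend_top_n(p_i_u, R, u, N=10):
    """Rank the items user u has not yet preferred by their like-probability."""
    scores = p_i_u[u].copy()
    scores[R[u] == 1] = -np.inf              # exclude items u already preferred
    return np.argsort(-scores)[:N]
```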

Page 23: Tutorial bpocf

pLSA computing the weights


(tempered) Expectation-Maximization (EM)
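A minimal sketch of how such an EM loop for the weights could look under the like-probability model above. The update equations follow the standard pLSA EM scheme; the tempering exponent `beta`, the function name, and all defaults are assumptions, not taken from the tutorial.

```python
import numpy as np

def plsa_tempered_em(R, D=20, n_iter=50, beta=1.0, seed=0):
    """R: |U| x |I| binary, positive-only matrix. Returns p(d|u) and p(i|d)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    p_d_u = rng.random((n_users, D)); p_d_u /= p_d_u.sum(axis=1, keepdims=True)
    p_i_d = rng.random((D, n_items)); p_i_d /= p_i_d.sum(axis=1, keepdims=True)
    users, items = np.nonzero(R)                      # only the known preferences
    for _ in range(n_iter):
        # E-step: posterior q(d|u,i) proportional to p(d|u) * p(i|d)^beta
        # for every observed (u, i); beta = 1 gives plain EM.
        q = p_d_u[users, :] * (p_i_d[:, items].T ** beta)
        q /= q.sum(axis=1, keepdims=True)
        # M-step: re-estimate p(d|u) from the posteriors of u's observed items.
        new_p_d_u = np.zeros_like(p_d_u)
        np.add.at(new_p_d_u, users, q)
        p_d_u = new_p_d_u / np.maximum(new_p_d_u.sum(axis=1, keepdims=True), 1e-12)
        # M-step: re-estimate p(i|d) from the posteriors of the users who preferred i.
        acc = np.zeros((n_items, D))
        np.add.at(acc, items, q)
        p_i_d = acc.T / np.maximum(acc.T.sum(axis=1, keepdims=True), 1e-12)
    return p_d_u, p_i_d
```

With `beta` slightly below 1 the E-step posteriors are smoothed, which is the usual motivation for tempering; `beta = 1` recovers plain EM.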

In matrix notation, the resulting score matrix factorizes into two factor matrices,

$$S_{ui} = S^{(1)}_{u*} \cdot S^{(2)}_{*i}, \qquad S = S^{(1)} S^{(2)},$$

and, more generally, into a sum of products of factor matrices,

$$S = \left(S^{(1,1)} \cdots S^{(1,F_1)}\right) + \cdots + \left(S^{(T,1)} \cdots S^{(T,F_T)}\right).$$

For pLSA, the training objective remains

$$\max \sum_{R_{ui}=1} \log p(i \mid u).$$
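A minimal sketch of this view (array and function names are assumptions): the pLSA like-probabilities are one instance of the two-factor form $S = S^{(1)} S^{(2)}$, and the more general composition is a sum of chained matrix products.

```python
import numpy as np
from functools import reduce

def plsa_as_factorization(p_d_u, p_i_d):
    """pLSA as S = S1 @ S2, with S1[u, d] = p(d|u) and S2[d, i] = p(i|d)."""
    S1, S2 = p_d_u, p_i_d
    return S1 @ S2                                    # S[u, i] plays the role of S_ui

def composite_scores(factor_chains):
    """factor_chains: a list of T lists of matrices; the t-th inner list
    [S^(t,1), ..., S^(t,F_t)] is multiplied left to right, and the T products are summed."""
    return sum(reduce(np.matmul, chain) for chain in factor_chains)
```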


Page 24: Tutorial bpocf

pLSA → General

Page 25: Tutorial bpocf

pLSA recap


[Graphical model: user $u$, latent dimension $d \in \{1, \dots, D\}$, item $i$; symbols $\mathcal{U}$, $\mathcal{I}$, $R$, $D$; parameters $p(d \mid u)$ and $p(i \mid d)$.]

$p(d \mid u) \geq 0, \qquad p(i \mid d) \geq 0$

$\sum_{d=1}^{D} p(d \mid u) = 1, \qquad \sum_{i \in \mathcal{I}} p(i \mid d) = 1$

$p(i \mid u) = \sum_{d=1}^{D} p(i \mid d) \cdot p(d \mid u)$

$\max \sum_{R_{ui}=1} \log p(i \mid u)$
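The earlier "(tempered) Expectation-Maximization" slide points at how these parameters are usually fit: alternate a posterior (E) step over the latent dimensions $d$ with a re-estimation (M) step of $p(d \mid u)$ and $p(i \mid d)$. Below is a minimal, hypothetical NumPy sketch of that EM loop on a binary, positive-only matrix $R$; the `beta` exponent is an assumed tempering knob (beta = 1 gives plain EM), and all names are illustrative rather than the tutorial's own code.

```python
import numpy as np

def plsa_em(R, D=20, n_iters=50, beta=1.0, seed=0):
    """Fit p(d|u) and p(i|d) by (tempered) EM on a binary matrix R (|U| x |I|).

    Sketch only: every R[u, i] == 1 entry is treated as an observed (u, i) pair,
    and beta tempers the E-step posteriors (beta = 1 is plain EM).
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape

    # Random, normalised initialisation of the two parameter tables.
    p_d_given_u = rng.random((n_users, D))
    p_d_given_u /= p_d_given_u.sum(axis=1, keepdims=True)
    p_i_given_d = rng.random((D, n_items))
    p_i_given_d /= p_i_given_d.sum(axis=1, keepdims=True)

    users, items = np.nonzero(R)  # all observed positive (u, i) pairs

    for _ in range(n_iters):
        # E-step: q(d | u, i) proportional to (p(i|d) p(d|u))^beta per observed pair.
        joint = (p_d_given_u[users] * p_i_given_d[:, items].T) ** beta  # (nnz, D)
        q = joint / joint.sum(axis=1, keepdims=True)

        # M-step: p(d|u) proportional to the sum of q(d|u,i) over u's observed items.
        user_counts = np.zeros((n_users, D))
        np.add.at(user_counts, users, q)
        p_d_given_u = user_counts / user_counts.sum(axis=1, keepdims=True).clip(min=1e-12)

        # M-step: p(i|d) proportional to the sum of q(d|u,i) over users, per topic d.
        item_counts = np.zeros((n_items, D))
        np.add.at(item_counts, items, q)
        p_i_given_d = (item_counts / item_counts.sum(axis=0, keepdims=True).clip(min=1e-12)).T

    return p_d_given_u @ p_i_given_d  # p(i|u) for every (u, i): the score matrix


# Toy usage on a random binary matrix.
R = (np.random.default_rng(1).random((30, 40)) < 0.1).astype(int)
S = plsa_em(R, D=5, n_iters=20)
print(S.shape)  # (30, 40); each row sums to ~1
```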


Page 26: Tutorial bpocf

pLSA recap

1:30 K. Verstrepen et al.

— Convince the reader ranking is more important than RMSE or MSE.— data splits (leave-one-out, 5 fold, ...)— Pradel et al. :ranking with non-random missing ratings: influence of popularity and

positivity on evaluation metrics— Marlin et al. :Collaaborative prediction and ranking with non-random missing data— Marlin et al. :collaborative filtering and the missing at random assumption— Steck: Training and testing of recommender systems on data missing not at random— We should emphasise how choosing hyperparameters is often done in a way that

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONU

IR

REFERENCESF. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In RecSys.

273–280.Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In RecSys.

293–296.S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n

recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on

top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems.39–46.

M. Deshpande and G. Karypis. 2004. Item-Based Top-N Recommendation Algorithms. TOIS 22, 1 (2004),143–177.

C. Desrosiers and G. Karypis. 2011. A Comprehensive Survey of Neighborhood-based RecommendationMethods. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor (Eds.).Springer, Boston, MA.

Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linearmodels via coordinate descent. Journal of statistical software 33, 1 (2010), 1.

E. Gaussier and C. Goutte. 2005. Relation between PLSA and NMF and implications. In SIGIR. 601–602.T. Hofmann. 1999. Probabilistic Latent Semantic Indexing. In SIGIR. 50–57.Thomas Hofmann. 2004. Latent Semantic Models for Collaborative Filtering. ACM Trans. Inf. Syst. 22, 1

(2004), 89–115.F. Hoppner. 2005. Association Rules. In The Data Mining and Knowledge Discovery Handbook, O. Mainmon

and L. Rokach (Eds.). Springer, New York, NY.Y. Hu, Y. Koren, and C. Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In ICDM.

263–272.D. Jannach, M. Zanker, A. Felfernig, and G. Frierich. 2011. Recommender Systems: An Introduction. Cam-

bridge University Press, New York, NY.

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

1:30 K. Verstrepen et al.

— Convince the reader ranking is more important than RMSE or MSE.— data splits (leave-one-out, 5 fold, ...)— Pradel et al. :ranking with non-random missing ratings: influence of popularity and

positivity on evaluation metrics— Marlin et al. :Collaaborative prediction and ranking with non-random missing data— Marlin et al. :collaborative filtering and the missing at random assumption— Steck: Training and testing of recommender systems on data missing not at random— We should emphasise how choosing hyperparameters is often done in a way that

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONU

IR

REFERENCESF. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In RecSys.

273–280.Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In RecSys.

293–296.S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n

recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on

top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems.39–46.

M. Deshpande and G. Karypis. 2004. Item-Based Top-N Recommendation Algorithms. TOIS 22, 1 (2004),143–177.

C. Desrosiers and G. Karypis. 2011. A Comprehensive Survey of Neighborhood-based RecommendationMethods. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor (Eds.).Springer, Boston, MA.

Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linearmodels via coordinate descent. Journal of statistical software 33, 1 (2010), 1.

E. Gaussier and C. Goutte. 2005. Relation between PLSA and NMF and implications. In SIGIR. 601–602.T. Hofmann. 1999. Probabilistic Latent Semantic Indexing. In SIGIR. 50–57.Thomas Hofmann. 2004. Latent Semantic Models for Collaborative Filtering. ACM Trans. Inf. Syst. 22, 1

(2004), 89–115.F. Hoppner. 2005. Association Rules. In The Data Mining and Knowledge Discovery Handbook, O. Mainmon

and L. Rokach (Eds.). Springer, New York, NY.Y. Hu, Y. Koren, and C. Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In ICDM.

263–272.D. Jannach, M. Zanker, A. Felfernig, and G. Frierich. 2011. Recommender Systems: An Introduction. Cam-

bridge University Press, New York, NY.

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

1:30 K. Verstrepen et al.

— Convince the reader ranking is more important than RMSE or MSE.— data splits (leave-one-out, 5 fold, ...)— Pradel et al. :ranking with non-random missing ratings: influence of popularity and

positivity on evaluation metrics— Marlin et al. :Collaaborative prediction and ranking with non-random missing data— Marlin et al. :collaborative filtering and the missing at random assumption— Steck: Training and testing of recommender systems on data missing not at random— We should emphasise how choosing hyperparameters is often done in a way that

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONx

UIRD

REFERENCESF. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In RecSys.

273–280.Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In RecSys.

293–296.S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n

recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on

top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems.39–46.

M. Deshpande and G. Karypis. 2004. Item-Based Top-N Recommendation Algorithms. TOIS 22, 1 (2004),143–177.

C. Desrosiers and G. Karypis. 2011. A Comprehensive Survey of Neighborhood-based RecommendationMethods. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor (Eds.).Springer, Boston, MA.

Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linearmodels via coordinate descent. Journal of statistical software 33, 1 (2010), 1.

E. Gaussier and C. Goutte. 2005. Relation between PLSA and NMF and implications. In SIGIR. 601–602.T. Hofmann. 1999. Probabilistic Latent Semantic Indexing. In SIGIR. 50–57.Thomas Hofmann. 2004. Latent Semantic Models for Collaborative Filtering. ACM Trans. Inf. Syst. 22, 1

(2004), 89–115.F. Hoppner. 2005. Association Rules. In The Data Mining and Knowledge Discovery Handbook, O. Mainmon

and L. Rokach (Eds.). Springer, New York, NY.Y. Hu, Y. Koren, and C. Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In ICDM.

263–272.

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

1:30 K. Verstrepen et al.

— Convince the reader ranking is more important than RMSE or MSE.— data splits (leave-one-out, 5 fold, ...)— Pradel et al. :ranking with non-random missing ratings: influence of popularity and

positivity on evaluation metrics— Marlin et al. :Collaaborative prediction and ranking with non-random missing data— Marlin et al. :collaborative filtering and the missing at random assumption— Steck: Training and testing of recommender systems on data missing not at random— We should emphasise how choosing hyperparameters is often done in a way that

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONx

UIRDd = 1

d = D

REFERENCESF. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In RecSys.

273–280.Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In RecSys.

293–296.S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n

recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on

top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems.39–46.

M. Deshpande and G. Karypis. 2004. Item-Based Top-N Recommendation Algorithms. TOIS 22, 1 (2004),143–177.

C. Desrosiers and G. Karypis. 2011. A Comprehensive Survey of Neighborhood-based RecommendationMethods. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor (Eds.).Springer, Boston, MA.

Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linearmodels via coordinate descent. Journal of statistical software 33, 1 (2010), 1.

E. Gaussier and C. Goutte. 2005. Relation between PLSA and NMF and implications. In SIGIR. 601–602.T. Hofmann. 1999. Probabilistic Latent Semantic Indexing. In SIGIR. 50–57.Thomas Hofmann. 2004. Latent Semantic Models for Collaborative Filtering. ACM Trans. Inf. Syst. 22, 1

(2004), 89–115.F. Hoppner. 2005. Association Rules. In The Data Mining and Knowledge Discovery Handbook, O. Mainmon

and L. Rokach (Eds.). Springer, New York, NY.

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

1:30 K. Verstrepen et al.

— Convince the reader ranking is more important than RMSE or MSE.— data splits (leave-one-out, 5 fold, ...)— Pradel et al. :ranking with non-random missing ratings: influence of popularity and

positivity on evaluation metrics— Marlin et al. :Collaaborative prediction and ranking with non-random missing data— Marlin et al. :collaborative filtering and the missing at random assumption— Steck: Training and testing of recommender systems on data missing not at random— We should emphasise how choosing hyperparameters is often done in a way that

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONx

UIRDd = 1

d = D

REFERENCESF. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In RecSys.

273–280.Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In RecSys.

293–296.S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n

recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on

top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems.39–46.

M. Deshpande and G. Karypis. 2004. Item-Based Top-N Recommendation Algorithms. TOIS 22, 1 (2004),143–177.

C. Desrosiers and G. Karypis. 2011. A Comprehensive Survey of Neighborhood-based RecommendationMethods. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor (Eds.).Springer, Boston, MA.

Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linearmodels via coordinate descent. Journal of statistical software 33, 1 (2010), 1.

E. Gaussier and C. Goutte. 2005. Relation between PLSA and NMF and implications. In SIGIR. 601–602.T. Hofmann. 1999. Probabilistic Latent Semantic Indexing. In SIGIR. 50–57.Thomas Hofmann. 2004. Latent Semantic Models for Collaborative Filtering. ACM Trans. Inf. Syst. 22, 1

(2004), 89–115.F. Hoppner. 2005. Association Rules. In The Data Mining and Knowledge Discovery Handbook, O. Mainmon

and L. Rokach (Eds.). Springer, New York, NY.

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

1:30 K. Verstrepen et al.

— Convince the reader ranking is more important than RMSE or MSE.— data splits (leave-one-out, 5 fold, ...)— Pradel et al. :ranking with non-random missing ratings: influence of popularity and

positivity on evaluation metrics— Marlin et al. :Collaaborative prediction and ranking with non-random missing data— Marlin et al. :collaborative filtering and the missing at random assumption— Steck: Training and testing of recommender systems on data missing not at random— We should emphasise how choosing hyperparameters is often done in a way that

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONx

UIRDd = 1

d = D...

REFERENCESF. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In RecSys.

273–280.Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In RecSys.

293–296.S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n

recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on

top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems.39–46.

M. Deshpande and G. Karypis. 2004. Item-Based Top-N Recommendation Algorithms. TOIS 22, 1 (2004),143–177.

C. Desrosiers and G. Karypis. 2011. A Comprehensive Survey of Neighborhood-based RecommendationMethods. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor (Eds.).Springer, Boston, MA.

Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linearmodels via coordinate descent. Journal of statistical software 33, 1 (2010), 1.

E. Gaussier and C. Goutte. 2005. Relation between PLSA and NMF and implications. In SIGIR. 601–602.T. Hofmann. 1999. Probabilistic Latent Semantic Indexing. In SIGIR. 50–57.Thomas Hofmann. 2004. Latent Semantic Models for Collaborative Filtering. ACM Trans. Inf. Syst. 22, 1

(2004), 89–115.

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

1:30 K. Verstrepen et al.

— Convince the reader ranking is more important than RMSE or MSE.— data splits (leave-one-out, 5 fold, ...)— Pradel et al. :ranking with non-random missing ratings: influence of popularity and

positivity on evaluation metrics— Marlin et al. :Collaaborative prediction and ranking with non-random missing data— Marlin et al. :collaborative filtering and the missing at random assumption— Steck: Training and testing of recommender systems on data missing not at random— We should emphasise how choosing hyperparameters is often done in a way that

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONx

UIRDd = 1

d = D...

REFERENCESF. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In RecSys.

273–280.Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In RecSys.

293–296.S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n

recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on

top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems.39–46.

M. Deshpande and G. Karypis. 2004. Item-Based Top-N Recommendation Algorithms. TOIS 22, 1 (2004),143–177.

C. Desrosiers and G. Karypis. 2011. A Comprehensive Survey of Neighborhood-based RecommendationMethods. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor (Eds.).Springer, Boston, MA.

Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linearmodels via coordinate descent. Journal of statistical software 33, 1 (2010), 1.

E. Gaussier and C. Goutte. 2005. Relation between PLSA and NMF and implications. In SIGIR. 601–602.T. Hofmann. 1999. Probabilistic Latent Semantic Indexing. In SIGIR. 50–57.Thomas Hofmann. 2004. Latent Semantic Models for Collaborative Filtering. ACM Trans. Inf. Syst. 22, 1

(2004), 89–115.

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

1:30 K. Verstrepen et al.

— Convince the reader ranking is more important than RMSE or MSE.— data splits (leave-one-out, 5 fold, ...)— Pradel et al. :ranking with non-random missing ratings: influence of popularity and

positivity on evaluation metrics— Marlin et al. :Collaaborative prediction and ranking with non-random missing data— Marlin et al. :collaborative filtering and the missing at random assumption— Steck: Training and testing of recommender systems on data missing not at random— We should emphasise how choosing hyperparameters is often done in a way that

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONx

UIRDd = 1

d = D...


\nabla D(S, R) = \nabla \sum_{u \in U} \sum_{i \in I} D_{ui}(S, R) = \sum_{u \in U} \sum_{i \in I} \nabla D_{ui}(S, R)

\nabla D(S, R) = \nabla \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui} = 1}} \sum_{j \in I} D_{uij}(S, R) = \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui} = 1}} \sum_{j \in I} \nabla D_{uij}(S, R)

= \int (\,\cdot\,) \cdot p(\,\cdot \mid \cdot\,) \cdot d(\,\cdot\,)

D(S, R) = D_{KL}\big(Q(S) \,\|\, p(S \mid R)\big)

\ldots

\max \text{ for every } (u, i)

\max \log p(S \mid R)

\max \log \prod_{u \in U} \prod_{i \in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui})

\log \prod_{u \in U} \prod_{i \in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui})

\sum_{u \in U} \sum_{i \in I} \alpha R_{ui} \log S_{ui} + \log(1 - S_{ui}) + \lambda \left( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \right)

2^{|u|} \text{ models/user}, \quad 2^{|u|}

S = S^{(1)} * S^{(2)} + S^{(3)} + S^{(4)} S^{(5)} S^{(6)}

p(d_1 \mid u), \; p(d_2 \mid u), \; \ldots, \; p(d_D \mid u)
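The weighted, regularized log-likelihood above (the alpha-weighted terms plus Frobenius-norm penalties on the two factor matrices) can be spelled out numerically. A minimal sketch follows (NumPy, hypothetical names); it assumes S is the element-wise sigmoid of the factorization S^(1) S^(2), which is one possible reading of the notation rather than the draft's definitive model, and it subtracts the penalty, the usual sign convention when the expression is maximized.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical toy setup: binary, positive-only matrix R and two factor matrices.
rng = np.random.default_rng(0)
n_users, n_items, D = 5, 8, 3
R = rng.integers(0, 2, size=(n_users, n_items)).astype(float)

S1 = 0.1 * rng.standard_normal((n_users, D))   # S^(1): user factors
S2 = 0.1 * rng.standard_normal((D, n_items))   # S^(2): item factors
alpha, lam = 10.0, 0.01

# S_ui in (0, 1), here modelled as the sigmoid of the factorization S^(1) S^(2).
S = sigmoid(S1 @ S2)

# alpha * R_ui * log S_ui + log(1 - S_ui), summed over all (u, i), with the
# Frobenius-norm penalty on both factor matrices subtracted.
objective = np.sum(alpha * R * np.log(S) + np.log(1.0 - S)) \
            - lam * (np.linalg.norm(S1) ** 2 + np.linalg.norm(S2) ** 2)
print(objective)
```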


p(i \mid d_1), \; p(i \mid d_2), \; \ldots, \; p(i \mid d_D)

REFERENCES
Fabio Aiolli. 2013. Efficient top-n recommendation for very large scale binary rated datasets. In Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 273–280.
Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 293–296.
Sarabjot Singh Anand and Bamshad Mobasher. 2007. Contextual Recommendation. Springer.
C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.
Evangelia Christakopoulou and George Karypis. 2014. HOSLIM: Higher-order sparse linear method for top-n recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.
Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, 39–46.
Mukund Deshpande and George Karypis. 2004. Item-based top-n recommendation algorithms. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 143–177.
Christian Desrosiers and George Karypis. 2011. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook. Springer, 107–144.
Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1 (2010), 1.
Eric Gaussier and Cyril Goutte. 2005. Relation between PLSA and NMF and implications. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 601–602.
Thomas Hofmann. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 50–57.
Thomas Hofmann. 2004. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 89–115.
Frank Hoppner. 2005. Association Rules. In The Data Mining and Knowledge Discovery Handbook, Oded Maimon and Lior Rokach (Eds.). Springer, New York, NY.
Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In Proceedings of the Eighth IEEE International Conference on Data Mining (ICDM '08). IEEE, 263–272.
Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. 2010. Recommender Systems: An Introduction. Cambridge University Press.
Santosh Kabbur and George Karypis. 2014. NLMF: Nonlinear matrix factorization methods for top-N recommender systems. In Proceedings of the 2014 IEEE International Conference on Data Mining Workshop (ICDMW). IEEE, 167–174.
Santosh Kabbur, Xia Ning, and George Karypis. 2013. FISM: Factored item similarity models for top-n recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 659–667.
Noam Koenigstein, Nir Nice, Ulrich Paquet, and Nir Schleyen. 2012. The Xbox recommender system. In Proceedings of the Sixth ACM Conference on Recommender Systems. ACM, 281–284.
Yehuda Koren and Robert Bell. 2011. Advances in collaborative filtering. In Recommender Systems Handbook. Springer, 145–186.
Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 8 (2009), 30–37.


Page 27: Tutorial bpocf

pLSA recap

U, I, R, D; latent dimensions d = 1, \ldots, D; user u, item i; parameters p(d \mid u) and p(i \mid d).

p(i \mid u) = \sum_{d=1}^{D} p(i \mid d) \cdot p(d \mid u)

\max \sum_{R_{ui} = 1} \log p(i \mid u)
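As a companion to the recap, a minimal EM sketch for fitting these parameters on binary, positive-only data (NumPy, hypothetical function and variable names; tempered variants and other refinements from the literature are left out): the E-step computes the responsibility of each latent dimension d for an observed (u, i) pair, and the M-step re-estimates p(d | u) and p(i | d) from those responsibilities.

```python
import numpy as np

def plsa_em(R, D=10, n_iter=50, seed=0, eps=1e-12):
    """One possible EM loop for pLSA on a binary, positive-only matrix R (|U| x |I|)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    p_d_given_u = rng.dirichlet(np.ones(D), size=n_users)   # (|U|, D)
    p_i_given_d = rng.dirichlet(np.ones(n_items), size=D)   # (D, |I|)

    for _ in range(n_iter):
        # E-step: responsibilities q(d | u, i), normalized over d.
        joint = p_d_given_u[:, :, None] * p_i_given_d[None, :, :]   # (|U|, D, |I|)
        q = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate the parameters from the observed pairs only (R_ui = 1).
        weighted = q * R[:, None, :]                                 # zero out unobserved pairs
        p_d_given_u = weighted.sum(axis=2)                           # (|U|, D)
        p_d_given_u /= p_d_given_u.sum(axis=1, keepdims=True) + eps
        p_i_given_d = weighted.sum(axis=0)                           # (D, |I|)
        p_i_given_d /= p_i_given_d.sum(axis=1, keepdims=True) + eps

    return p_d_given_u, p_i_given_d

# Usage on a toy matrix: scores for ranking are p(i | u) = p(d | u) @ p(i | d).
R = (np.random.default_rng(1).random((20, 30)) < 0.2).astype(float)
P_du, P_id = plsa_em(R, D=5)
scores = P_du @ P_id
```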


ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

Binary, Positive-Only Collaborative Filtering: A Theoretical and Experimental Comparison of the State Of The Art1:35

The gradient of the objective $\mathcal{D}(S,R)$ decomposes term-wise, either over user–item pairs or over (user, item, item) triples:

$\nabla \mathcal{D}(S,R) \;=\; \nabla \sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I}} \mathcal{D}_{ui}(S,R) \;=\; \sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I}} \nabla \mathcal{D}_{ui}(S,R)$

$\nabla \mathcal{D}(S,R) \;=\; \nabla \sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I},\, R_{ui}=1} \sum_{j \in \mathcal{I}} \mathcal{D}_{uij}(S,R) \;=\; \sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I},\, R_{ui}=1} \sum_{j \in \mathcal{I}} \nabla \mathcal{D}_{uij}(S,R)$

$=\; \int (\cdot)\; p(\cdot \mid \cdot)\; d(\cdot)$  [the integrand's symbols did not survive extraction]

$\mathcal{D}(S,R) \;=\; D_{\mathrm{KL}}\big(Q(S) \,\|\, p(S \mid R)\big)$

max for every $(u, i)$

$\max\; \log p(S \mid R)$

$\max\; \log \prod_{u \in \mathcal{U}} \prod_{i \in \mathcal{I}} S_{ui}^{\alpha R_{ui}} (1 - S_{ui})$

$\sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I}} \alpha R_{ui} \log S_{ui} + \log(1 - S_{ui}) \;+\; \lambda \big( \lVert S^{(1)} \rVert_F^2 + \lVert S^{(2)} \rVert_F^2 \big)$

$2^{|u|}$ models/user

$S = S^{(1)} \cdot S^{(2)} + S^{(3)} + S^{(4)} S^{(5)} S^{(6)}$

$p(d_1 \mid u),\; p(d_2 \mid u),\; \dots,\; p(d_D \mid u)$  [row labels of the user–dimension factor in the accompanying figure]


$p(i \mid d_1),\; p(i \mid d_2),\; \dots,\; p(i \mid d_D)$  [column labels of the dimension–item factor, from the same figure]
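To make the regularized log-likelihood above concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes the score matrix is factored as $S = \mathrm{sigmoid}(PQ)$ so that every $S_{ui} \in (0,1)$, treats the $\lambda$ term as an L2 penalty subtracted under maximization, and uses plain full-batch gradient ascent; the names P and Q, the data, $\alpha$, $\lambda$ and the learning rate are all illustrative choices.

```python
import numpy as np

# Sketch of the objective  sum_{u,i} [ alpha*R_ui*log(S_ui) + log(1 - S_ui) ]  with an
# L2 penalty on the two factors. Everything below (shapes, S = sigmoid(P @ Q), alpha,
# lam, lr) is an assumption made for illustration, not taken from the excerpt.

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 40, 5
alpha, lam, lr = 10.0, 0.01, 0.05

R = (rng.random((n_users, n_items)) < 0.05).astype(float)   # binary, positive-only matrix
P = 0.1 * rng.standard_normal((n_users, k))                  # plays the role of S^(1)
Q = 0.1 * rng.standard_normal((k, n_items))                  # plays the role of S^(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):
    S = sigmoid(P @ Q)
    dS = alpha * R / S - 1.0 / (1.0 - S)      # d/dS of alpha*R*log(S) + log(1 - S)
    dZ = dS * S * (1.0 - S)                   # chain rule through the sigmoid
    gradP = dZ @ Q.T - 2.0 * lam * P
    gradQ = P.T @ dZ - 2.0 * lam * Q
    P += lr * gradP                           # full-batch gradient ascent step
    Q += lr * gradQ

S = sigmoid(P @ Q)
objective = (alpha * R * np.log(S) + np.log(1.0 - S)).sum() - lam * ((P**2).sum() + (Q**2).sum())
print(f"objective after training: {objective:.2f}")
```

Sampling individual $(u, i)$ terms (or $(u, i, j)$ triples) instead of evaluating the full sums gives the stochastic variant suggested by the term-wise gradient decomposition above.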



Page 28: Tutorial bpocf

pLSA matrix factorization notation


9. SYMBOLS FOR PRESENTATIONx

UIRDd = 1

d = D...uip(u | i)p(d | u)

p(d | u) � 0

p(i | d) � 0

DPd=1

p(d | u) = 1

Pi2I

p(i | d) = 1

p(i|u) =

DX

d=1

p(i|d) · p(d|u)

max

PRui=1

log p(i | u)

|U| ⇥ |I||U| ⇥ DD ⇥ |I||U||I|D

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

1:30 K. Verstrepen et al.

— Convince the reader ranking is more important than RMSE or MSE.— data splits (leave-one-out, 5 fold, ...)— Pradel et al. :ranking with non-random missing ratings: influence of popularity and

positivity on evaluation metrics— Marlin et al. :Collaaborative prediction and ranking with non-random missing data— Marlin et al. :collaborative filtering and the missing at random assumption— Steck: Training and testing of recommender systems on data missing not at random— We should emphasise how choosing hyperparameters is often done in a way that

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONx

UIRDd = 1

d = D...uip(u | i)p(d | u)

p(d | u) � 0

p(i | d) � 0

DPd=1

p(d | u) = 1

Pi2I

p(i | d) = 1

p(i|u) =

DX

d=1

p(i|d) · p(d|u)

max

PRui=1

log p(i | u)

|U| ⇥ |I||U| ⇥ DD ⇥ |I||U||I|D

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

1:30 K. Verstrepen et al.

— Convince the reader ranking is more important than RMSE or MSE.— data splits (leave-one-out, 5 fold, ...)— Pradel et al. :ranking with non-random missing ratings: influence of popularity and

positivity on evaluation metrics— Marlin et al. :Collaaborative prediction and ranking with non-random missing data— Marlin et al. :collaborative filtering and the missing at random assumption— Steck: Training and testing of recommender systems on data missing not at random— We should emphasise how choosing hyperparameters is often done in a way that

causes leakage.

7.2. online— Who: Kanishka?— Convince the reader this is much better than offline, how to do it etc.

8. EXPERIMENTAL EVALUATION— Who: ?— THE offline comparison of OCCF algorithms. Many datasets, many algorithms, many

evaluation measures, multiple data split methods, sufficiently randomized.— also empirically evaluate the explanations extracted.

9. SYMBOLS FOR PRESENTATIONx

UIRDd = 1

d = D...uip(u | i)p(d | u)

p(d | u) � 0

p(i | d) � 0

DPd=1

p(d | u) = 1

Pi2I

p(i | d) = 1

p(i|u) =

DX

d=1

p(i|d) · p(d|u)

max

PRui=1

log p(i | u)

|U| ⇥ |I||U| ⇥ DD ⇥ |I||U||I|D

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

Page 29: Tutorial bpocf

pLSA matrix factorization notation

The pLSA model can be written in matrix factorization notation. Collect the probabilities p(d | u) in a |U| × D matrix S^{(1)} and the probabilities p(i | d) in a D × |I| matrix S^{(2)}; the |U| × |I| matrix S of scores p(i | u) is then their product:

\[
S_{ui} = S^{(1)}_{u*} \cdot S^{(2)}_{*i}, \qquad S = S^{(1)} S^{(2)}
\]
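To make the correspondence explicit, the short sketch below (made-up names and toy dimensions, assuming NumPy) stores p(d | u) and p(i | d) as row-stochastic matrices S1 and S2 and checks that their product has the stated |U| × |I| shape, that each row of the product is again a distribution over items, and that a single entry equals the row–column dot product.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, D = 4, 6, 2                    # |U|, |I|, D (toy values)

S1 = rng.dirichlet(np.ones(D), size=n_users)     # |U| x D, row u holds p(d|u)
S2 = rng.dirichlet(np.ones(n_items), size=D)     # D x |I|, row d holds p(i|d)

S = S1 @ S2                                      # |U| x |I|, S_ui = p(i|u)

assert S.shape == (n_users, n_items)
# Because both factors are row-stochastic, every row of S also sums to 1.
assert np.allclose(S.sum(axis=1), 1.0)

# A single score S_ui is the dot product of row u of S1 and column i of S2.
u, i = 2, 5
assert np.isclose(S[u, i], S1[u, :] @ S2[:, i])
```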


Page 30: Tutorial bpocf

Scores = Matrix Factorization

In general, a matrix factorization model computes the score of user u for item i as the dot product of row u of a |U| × D factor matrix S^{(1)} and column i of a D × |I| factor matrix S^{(2)}, so that the full |U| × |I| score matrix is

\[
S_{ui} = S^{(1)}_{u*} \cdot S^{(2)}_{*i}, \qquad S = S^{(1)} S^{(2)}
\]

Page 31: Tutorial bpocf

Deviation Function

$S_{ui} = S^{(1)}_{u*} \cdot S^{(2)}_{*i}, \qquad S = S^{(1)} S^{(2)}$

$S = \left(S^{(1,1)} \cdots S^{(1,F_1)}\right) + \cdots + \left(S^{(T,1)} \cdots S^{(T,F_T)}\right)$

$\max \sum_{R_{ui}=1} \log p(i \mid u) \;\Leftrightarrow\; \max \sum_{R_{ui}=1} \log S_{ui} \;\Leftrightarrow\; \min \, -\sum_{R_{ui}=1} \log S_{ui}$

$D(S,R) = -\sum_{R_{ui}=1} \log S_{ui}, \qquad \min \, D(S,R)$
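To make the deviation function concrete, here is a minimal sketch (not code from the tutorial) that evaluates $D(S,R) = -\sum_{R_{ui}=1} \log S_{ui}$ for a toy factorization $S = S^{(1)} S^{(2)}$; the variable names and the random toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary, positive-only feedback matrix R (|U| x |I|).
R = (rng.random((5, 8)) < 0.3).astype(float)

# A factorization model S = S1 @ S2 with a small number of latent dimensions.
D_dim = 3
S1 = rng.random((5, D_dim))            # user factors, |U| x D
S2 = rng.random((D_dim, 8))            # item factors, D x |I|
S1 /= S1.sum(axis=1, keepdims=True)    # row-normalize so every score lies in (0, 1]
S2 /= S2.sum(axis=1, keepdims=True)

S = S1 @ S2                            # user-item score matrix

def deviation(S, R):
    """D(S, R) = -sum over the observed (u, i) pairs of log S_ui."""
    return -np.sum(np.log(S[R == 1]))

print(deviation(S, R))
```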

Page 32: Tutorial bpocf

Summary: 2 Basic Building Blocks

Factorization Model

Deviation Function

Page 33: Tutorial bpocf

Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Parameter inference

•  Netflix

Page 34: Tutorial bpocf

Tour of The Models

Page 35: Tutorial bpocf

pLSA soft clustering interpretation

user-item scores

user-cluster affinity

item-cluster affinity

mixed clusters

[Hofmann 2004] [Hu et al. 2008]

[Pan et al. 2008] [Sindhwani et al. 2010]

[Yao et al. 2014] [Pan and Scholz 2009]

[Rendle et al. 2009] [Shi et al. 2012]

[Takàcs and Tikk 2012]

Page 36: Tutorial bpocf

pLSA soft clustering interpretation

pLSA model with $D$ soft (mixed) clusters, $d = 1, \ldots, D$:

$p(i \mid u) = \sum_{d=1}^{D} p(i \mid d) \cdot p(d \mid u)$

$p(d \mid u) \geq 0, \quad p(i \mid d) \geq 0, \quad \sum_{d=1}^{D} p(d \mid u) = 1, \quad \sum_{i \in I} p(i \mid d) = 1$

$\max \sum_{R_{ui}=1} \log p(i \mid u)$

The $|U| \times |I|$ score matrix is the product of a $|U| \times D$ user-cluster affinity matrix and a $D \times |I|$ item-cluster affinity matrix (the toy example on the slide uses $D = 4$; its example matrix entries are omitted here).

user-item scores

user-cluster affinity

item-cluster affinity
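A minimal sketch of how the pLSA score matrix on this slide is assembled from its two affinity matrices; the EM training step is omitted and all numbers are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
num_users, num_items, D = 6, 10, 4     # D latent (soft) clusters

# p(d|u): each user's distribution over clusters -> |U| x D, rows sum to 1.
p_d_given_u = rng.random((num_users, D))
p_d_given_u /= p_d_given_u.sum(axis=1, keepdims=True)

# p(i|d): each cluster's distribution over items -> D x |I|, rows sum to 1.
p_i_given_d = rng.random((D, num_items))
p_i_given_d /= p_i_given_d.sum(axis=1, keepdims=True)

# Score matrix: S_ui = p(i|u) = sum_d p(i|d) * p(d|u).
S = p_d_given_u @ p_i_given_d
assert np.allclose(S.sum(axis=1), 1.0)  # each row is a distribution over items
```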

Page 37: Tutorial bpocf

Hard Clustering

user-item scores

user-uCluster membership

item-iCluster membership

item probabilities

uCluster-iCluster similarity

[Hofmann 2004] [Hofmann 1999]

[Ungar and Foster 1998]
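One possible reading of the four building blocks on this slide, as a rough sketch only; the actual parameterizations in [Hofmann 2004; Ungar and Foster 1998] differ in their details, and every matrix here is a made-up placeholder.

```python
import numpy as np

rng = np.random.default_rng(2)
num_users, num_items, UC, IC = 6, 10, 3, 4   # hard user clusters / item clusters

# One-hot (hard) memberships: each user in one user cluster, each item in one item cluster.
user_cluster = np.eye(UC)[rng.integers(0, UC, num_users)]   # |U| x UC
item_cluster = np.eye(IC)[rng.integers(0, IC, num_items)]   # |I| x IC

# Affinity between user clusters and item clusters, and per-item probabilities.
cluster_affinity = rng.random((UC, IC))                     # UC x IC
item_prob = rng.random(num_items)                           # relative item popularity

# Combine the four building blocks: S_ui = affinity(cluster(u), cluster(i)) * p(i).
S = (user_cluster @ cluster_affinity @ item_cluster.T) * item_prob
```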

Page 38: Tutorial bpocf

Item Similarity dense

user-item scores original rating matrix item-item similarity

[Rendle et al. 2009] [Aiolli 2013]

Page 39: Tutorial bpocf

Item Similarity sparse

user-item scores item-item similarity original rating matrix

[Deshpande and Karypis 2004] [Sigurbjörnsson and Van Zwol 2008]

[Ning and Karypis 2011]
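A rough sketch of a sparse item-similarity scorer in the spirit of item-based kNN: cosine similarities between item columns, pruned to the $k$ strongest neighbours per item. Note that SLIM [Ning and Karypis 2011] learns the sparse similarity matrix instead of computing cosines; the cosine choice, $k$, and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
R = (rng.random((8, 12)) < 0.3).astype(float)   # toy binary rating matrix
k = 3                                           # neighbors kept per item

# Cosine similarity between the item columns of R.
norms = np.linalg.norm(R, axis=0) + 1e-12
sim = (R.T @ R) / np.outer(norms, norms)
np.fill_diagonal(sim, 0.0)

# Sparsify: keep only the k largest similarities in each row.
for i in range(sim.shape[0]):
    keep = np.argsort(sim[i])[-k:]
    mask = np.zeros_like(sim[i], dtype=bool)
    mask[keep] = True
    sim[i, ~mask] = 0.0

# user-item scores = original rating matrix times (sparse) item-item similarity
S = R @ sim
```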

Page 40: Tutorial bpocf

User Similarity sparse

user-item scores column normalized original rating matrix

(row normalized) user-user similarity

[Sarwar et al. 2000]
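A rough user-based counterpart, sketching the slide's recipe of a (row-normalized) user-user similarity applied to a column-normalized rating matrix; the cosine similarity and the normalization details are assumptions, not necessarily those of [Sarwar et al. 2000].

```python
import numpy as np

rng = np.random.default_rng(4)
R = (rng.random((8, 12)) < 0.3).astype(float)   # toy binary rating matrix

# Column-normalize R so popular items do not dominate (one possible choice).
R_norm = R / (R.sum(axis=0, keepdims=True) + 1e-12)

# Cosine similarity between users, zero self-similarity, then row-normalize.
norms = np.linalg.norm(R, axis=1, keepdims=True) + 1e-12
sim = (R @ R.T) / (norms * norms.T)
np.fill_diagonal(sim, 0.0)
sim /= sim.sum(axis=1, keepdims=True) + 1e-12

# user-item scores = (row-normalized) user-user similarity times column-normalized R
S = sim @ R_norm
```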

Page 41: Tutorial bpocf

User Similarity dense

user-item scores column normalized original rating matrix

(row normalized) user-user similarity

[Aiolli 2014] [Aiolli 2013]

Page 42: Tutorial bpocf

User+Item Similarity

[Verstrepen and Goethals 2014]

Page 43: Tutorial bpocf

Factored Item Similarity symmetrical

user-item scores original rating matrix Identical item profiles


item clusters Item-cluster affinity Similarity by dotproduct

[Weston et al. 2013b]
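A minimal sketch of the symmetric factored variant: the dense item-item similarity is replaced by dot products of one shared set of low-dimensional item profiles. Dimensions and data are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
num_users, num_items, D = 8, 12, 4
R = (rng.random((num_users, num_items)) < 0.3).astype(float)

W = rng.standard_normal((num_items, D))   # one item-profile matrix, used on both sides

# similarity by dot product of identical item profiles
item_sim = W @ W.T

# user-item scores = original rating matrix times factored item-item similarity
S = R @ item_sim                          # equivalently (R @ W) @ W.T, which is cheaper
```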

Page 44: Tutorial bpocf

Factored Item Similarity asymmetrical + bias

user-item scores

original rating matrix row normalized

Item profile if known preference

Item profile if candidate item biases user biases

[Kabbur et al. 2013]
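A rough FISM-style scoring sketch assembled from the slide's building blocks (two item-profile matrices, user and item biases, a normalized user history). It follows the general form of [Kabbur et al. 2013] but is only an illustration with random, unfitted parameters.

```python
import numpy as np

rng = np.random.default_rng(6)
num_users, num_items, D = 8, 12, 4
R = (rng.random((num_users, num_items)) < 0.3).astype(float)

P = rng.standard_normal((num_items, D))   # item profiles when acting as known preferences
Q = rng.standard_normal((num_items, D))   # item profiles when acting as candidate items
b_u = rng.standard_normal(num_users)      # user biases
b_i = rng.standard_normal(num_items)      # item biases
alpha = 0.5                               # normalization exponent

def score(u, i):
    history = np.flatnonzero(R[u])
    history = history[history != i]       # exclude the candidate item itself
    if history.size == 0:
        return b_u[u] + b_i[i]
    user_profile = P[history].sum(axis=0) / (history.size ** alpha)
    return b_u[u] + b_i[i] + user_profile @ Q[i]

S = np.array([[score(u, i) for i in range(num_items)] for u in range(num_users)])
```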

Page 45: Tutorial bpocf

Higher Order Item Similarity inner product

user-item scores extended rating matrix Itemset-item similarity

selected higher order itemsets [Christakopoulou and Karypis 2014]

[Deshpande and Karypis 2004] [Menezes et al. 2010] [van Leeuwen and Puspitaningrum 2012] [Lin et al. 2002]
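One way to read this slide, sketched with toy data: extend the rating matrix with columns for selected higher-order itemsets (a user "has" an itemset when she has all of its items) and take an inner product with an itemset-to-item similarity matrix. The itemset selection rule and the random similarities are assumptions.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)
num_users, num_items = 8, 6
R = (rng.random((num_users, num_items)) < 0.4).astype(float)

# Select some higher-order itemsets: here, all pairs owned by at least two users.
candidate_sets = [list(p) for p in combinations(range(num_items), 2)]
itemsets = [s for s in candidate_sets if R[:, s].all(axis=1).sum() >= 2]

# Extend R with one column per selected itemset (1 if the user has the whole itemset).
if itemsets:
    extra = np.column_stack([R[:, s].all(axis=1).astype(float) for s in itemsets])
    R_ext = np.hstack([R, extra])
else:
    R_ext = R

# Itemset-to-item similarity (random placeholder; in practice learned or rule-based).
sim = rng.random((R_ext.shape[1], num_items))

# user-item scores = extended rating matrix times itemset-item similarity (inner product)
S = R_ext @ sim
```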

Page 46: Tutorial bpocf

Higher Order Item Similarity max product

[figure: toy example of max-product scoring, where the user-item score is the maximum over the itemset contributions rather than their sum; example matrix entries omitted]

[Sarwar et al. 2001] [Mobasher et al. 2001]

Page 47: Tutorial bpocf

Higher Order User Similarity inner product

user-item scores user-userset similarity extended rating matrix

selected higher order usersets

[Lin et al. 2002]

Page 48: Tutorial bpocf

Best of few user models non-linearity by max

[Weston et al. 2013a]

max for every $(u, i)$

~ 3 models/user
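A minimal sketch of the "best of a few user models" idea: each user keeps $T$ latent vectors and the score for $(u, i)$ is the maximum of their dot products with the item vector. $T$, the dimensions, and the random factors are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
num_users, num_items, D, T = 6, 10, 4, 3     # T ~ 3 interest vectors per user

U = rng.standard_normal((num_users, T, D))   # T latent vectors per user
V = rng.standard_normal((num_items, D))      # one latent vector per item

# S_ui = max over the user's T models of the usual dot-product score.
S = np.einsum('utd,id->uti', U, V).max(axis=1)
```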

Page 49: Tutorial bpocf

Best of all user models efficient max out of

[Verstrepen and Goethals 2015]

$2^{|u|}$ models/user, with the max for every $(u, i)$ computed efficiently

Page 50: Tutorial bpocf

Combination item vectors can be shared

[Kabbur and Karypis 2014]


Page 51: Tutorial bpocf

Sigmoid link function for probabilistic frameworks

[Johnson 2014]
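A minimal scoring sketch for a sigmoid link in the style of logistic MF [Johnson 2014]: the dot product of user and item factors plus biases is squashed through a sigmoid so the score can be read as a probability. The parameters here are random, not fitted.

```python
import numpy as np

rng = np.random.default_rng(9)
num_users, num_items, D = 6, 10, 4

X = rng.standard_normal((num_users, D))   # user factors
Y = rng.standard_normal((num_items, D))   # item factors
b_u = rng.standard_normal(num_users)      # user biases
b_i = rng.standard_normal(num_items)      # item biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# S_ui = sigma(x_u . y_i + b_u + b_i): scores now live in (0, 1).
S = sigmoid(X @ Y.T + b_u[:, None] + b_i[None, :])
```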


Page 52: Tutorial bpocf

The gradient of a deviation function that decomposes into per-pair (or per-triple) terms is the sum of the per-term gradients:

\nabla D(S,R) = \nabla \sum_{u \in U} \sum_{i \in I} D_{ui}(S,R) = \sum_{u \in U} \sum_{i \in I} \nabla D_{ui}(S,R)

\nabla D(S,R) = \nabla \sum_{u \in U} \sum_{i \in I, R_{ui}=1} \sum_{j \in I} D_{uij}(S,R) = \sum_{u \in U} \sum_{i \in I, R_{ui}=1} \sum_{j \in I} \nabla D_{uij}(S,R)

Instead of a point estimate of the model parameters \theta, the score can also be computed as an expectation over their posterior:

S_{ui} = \int S_{ui}(\theta) \cdot p(\theta \mid R) \, d\theta


Pdf over parameters instead of point estimation

[Koenigstein et al. 2012] [Paquet and Koenigstein 2013]
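A toy illustration of what "pdf over parameters instead of point estimation" means in practice: if a Bayesian inference procedure (as in the cited Xbox work) produces samples of the factor matrices from their posterior, the recommendation scores are averaged over those samples instead of being computed from a single point estimate. This is only a minimal sketch, with random draws standing in for real posterior samples; the function name is mine.

    import numpy as np

    def expected_scores(factor_samples):
        """Average the score matrix S = S1 @ S2 over posterior draws.

        factor_samples: iterable of (S1, S2) pairs, one pair per draw,
        with S1 of shape (|U|, d) and S2 of shape (d, |I|)."""
        total, count = None, 0
        for S1, S2 in factor_samples:
            S = S1 @ S2                       # score matrix for this draw
            total = S if total is None else total + S
            count += 1
        return total / count                  # Monte-Carlo estimate of E[S | R]

    # Toy usage: random "posterior draws" stand in for real Bayesian inference.
    rng = np.random.default_rng(0)
    draws = [(rng.random((5, 2)), rng.random((2, 8))) for _ in range(100)]
    S_expected = expected_scores(draws)
    print(S_expected.shape)                   # (5, 8)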

Page 53: Tutorial bpocf

Summary: 2 Basic Building Blocks

Factorization Model

Deviation Function

Page 54: Tutorial bpocf

Summary: 2 Basic Building Blocks

Factorization Model

Deviation Function

a.k.a. What do we minimize in order to find the parameters in the factor matrices?

Page 55: Tutorial bpocf

Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Difference with rating-based algorithms – Parameter inference

•  Netflix

Page 56: Tutorial bpocf

Tour of Deviation Functions

Page 57: Tutorial bpocf

Local Minima depending on initialisation

S_{ui} = S^{(1)}_{u*} \cdot S^{(2)}_{*i}

S = S^{(1)} S^{(2)}

S = \left( S^{(1,1)} \cdots S^{(1,F_1)} \right) + \cdots + \left( S^{(T,1)} \cdots S^{(T,F_T)} \right)

\max \sum_{R_{ui}=1} \log p(i|u)

\max \sum_{R_{ui}=1} \log S_{ui}

\min -\sum_{R_{ui}=1} \log S_{ui}

D(S,R) = -\sum_{R_{ui}=1} \log S_{ui}

\min D(S,R)


\sum_{i \in I} \sum_{j \in I} \left( \mathrm{sim}(j,i) \cdot |KNN(j) \cap \{i\}| - S^{(2)}_{ji} \right)^2

\sum_{u \in U} \sum_{v \in U} \left( \mathrm{sim}(u,v) \cdot |KNN(u) \cap \{v\}| - S^{(2)}_{uv} \right)^2

S^{(2)}_{ji} = \mathrm{sim}(j,i) \cdot |KNN(j) \cap \{i\}| \quad \text{for all } i,j \in I

S^{(2)}_{uv} = \mathrm{sim}(u,v) \cdot |KNN(u) \cap \{v\}| \quad \text{for all } u,v \in U

S^{(3)}_{uv} = \mathrm{sim}(u,v) \cdot |KNN(u) \cap \{v\}|

\sum_{i \in I} \sum_{j \in I} \left( \mathrm{sim}(j,i) \cdot |KNN(j) \cap \{i\}| - S^{(2)}_{ji} \right)^2 + \sum_{u \in U} \sum_{v \in U} \left( \mathrm{sim}(u,v) \cdot |KNN(u) \cap \{v\}| - S^{(3)}_{uv} \right)^2

every row S^{(1)}_{u\cdot} and every column S^{(2)}_{\cdot i} the same unit vector

O(|U| \times |I|)

O(d^3(|U| + |I|) + d^2 |R|)

(S^{(1,1)}, \ldots, S^{(T,F)})

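To make the nearest-neighbour factorization above concrete, the following minimal sketch builds S^{(2)}_{ji} = sim(j,i) · |KNN(j) ∩ {i}| from a binary matrix R with cosine similarity, and scores users with S = R S^{(2)}. It is an illustration of the reconstructed formulas, not the implementation evaluated in the survey; the function name is mine.

    import numpy as np

    def item_knn_model(R, k):
        """S2[j, i] = cosine(j, i) if i is among the k most similar items to j
        (and i != j), else 0. R is a binary |U| x |I| array."""
        R = np.asarray(R, dtype=float)
        norms = np.linalg.norm(R, axis=0)
        norms[norms == 0] = 1.0                       # avoid division by zero
        sim = (R.T @ R) / np.outer(norms, norms)      # item-item cosine similarity
        np.fill_diagonal(sim, 0.0)                    # an item is not its own neighbour
        S2 = np.zeros_like(sim)
        for j in range(sim.shape[0]):
            knn = np.argsort(sim[j])[::-1][:k]        # k most similar items to j
            S2[j, knn] = sim[j, knn]
        return S2

    R = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 1, 1]])
    S2 = item_knn_model(R, k=2)
    S = R @ S2                                        # user-item scores S = S(1) S(2) with S(1) = R
    print(np.round(S, 2))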

Page 58: Tutorial bpocf

Max Likelihood: high scores for known preferences


[Figure: the factorization S = S^{(1)} S^{(2)}, shown for inner dimensions d = 2, 3, 4 and for D = |I|.]

S_{ui} = S^{(1)}_{u*} \cdot S^{(2)}_{*i}

S = S^{(1)} S^{(2)}

S = \left( S^{(1,1)} \cdots S^{(1,F_1)} \right) + \cdots + \left( S^{(T,1)} \cdots S^{(T,F_T)} \right)

\max \sum_{R_{ui}=1} \log p(i|u)

\max \sum_{R_{ui}=1} \log S_{ui}

\min -\sum_{R_{ui}=1} \log S_{ui}

D(S,R) = -\sum_{R_{ui}=1} \log S_{ui}

\min D(S,R)

S^{(1)}_{ud} \geq 0, \qquad S^{(2)}_{di} \geq 0, \qquad \sum_{d=1}^{D} S^{(1)}_{ud} = 1, \qquad \sum_{i \in I} S^{(2)}_{di} = 1




[Hofmann 2004] [Hofmann 1999]
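A minimal sketch of the maximum-likelihood deviation function above: with S^{(1)} row-stochastic over the latent dimensions and S^{(2)} row-stochastic over the items, every S_{ui} is a probability p(i|u) and the deviation is the negative log-likelihood of the known preferences. This only evaluates the deviation; it is not Hofmann's EM fitting procedure, and the function name is mine.

    import numpy as np

    def ml_deviation(R, S1, S2, eps=1e-12):
        """D(S, R) = - sum over (u, i) with R_ui = 1 of log S_ui, where S = S1 @ S2.

        S1 (|U| x D) has rows summing to 1, S2 (D x |I|) has rows summing to 1,
        so every S_ui is a valid probability p(i | u)."""
        S = S1 @ S2
        return -np.sum(np.log(S[R == 1] + eps))       # eps guards against log(0)

    rng = np.random.default_rng(1)
    U, I, D = 4, 6, 3
    S1 = rng.random((U, D)); S1 /= S1.sum(axis=1, keepdims=True)   # rows sum to 1
    S2 = rng.random((D, I)); S2 /= S2.sum(axis=1, keepdims=True)   # rows sum to 1
    R = (rng.random((U, I)) < 0.3).astype(int)
    print(ml_deviation(R, S1, S2))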

Page 59: Tutorial bpocf

Reconstruction

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda \|S^{(1)}\|_F + \lambda \|S^{(2)}\|_F - \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui}) \qquad (57)

(1 - 0)^2 = 1 = (1 - 2)^2 \qquad (58)

w(j) = \sum_{u \in U} R_{uj}

\sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \left( (R_{ui} - R_{uj}) - (S_{ui} - S_{uj}) \right)^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|^2_F

\sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|^2_F

\sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|^2_F + \|S^{(t,f)}\|_1


Page 60: Tutorial bpocf

Reconstruction

\sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|^2_F

'Ridge' regularization [Kabbur et al. 2013] [Kabbur and Karypis 2014]
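Evaluating the ridge-regularized reconstruction deviation highlighted here is straightforward. The sketch below does so for a plain two-factor model, a simplification of the general sum over all factor matrices S^{(t,f)}, and not the training code of the cited papers; the function name is mine.

    import numpy as np

    def ridge_deviation(R, S1, S2, lam):
        """sum_{u,i} (R_ui - S_ui)^2 + lam * (||S1||_F^2 + ||S2||_F^2), with S = S1 @ S2."""
        S = S1 @ S2
        squared_error = np.sum((R - S) ** 2)
        regularization = lam * (np.sum(S1 ** 2) + np.sum(S2 ** 2))
        return squared_error + regularization

    rng = np.random.default_rng(2)
    R = (rng.random((5, 7)) < 0.25).astype(float)
    S1, S2 = rng.random((5, 3)), rng.random((3, 7))
    print(ridge_deviation(R, S1, S2, lam=0.1))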

Page 61: Tutorial bpocf

Reconstruction

Elastic net regularization

\sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|^2_F + \|S^{(t,f)}\|_1

'Ridge' regularization

[Ning and Karypis 2011] [Christakopoulou and Karypis 2014]

[Kabbur et al. 2013] [Kabbur and Karypis 2014]
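For the elastic-net-regularized reconstruction of the SLIM family, each column of the item-item factor can be fitted as a separate regression of that item's column of R on the other columns. Below is a minimal sketch using scikit-learn's ElasticNet (which uses the coordinate descent of Friedman et al. [2010]); the alpha and l1_ratio values are placeholders, the non-negativity and zero-diagonal constraints are enforced by hand, and the helper name slim_like is hypothetical.

    import numpy as np
    from sklearn.linear_model import ElasticNet

    def slim_like(R, alpha=0.1, l1_ratio=0.5):
        """Fit S2 column by column: column i of R regressed on R with item i
        zeroed out, under an elastic net penalty and non-negative coefficients."""
        R = np.asarray(R, dtype=float)
        n_items = R.shape[1]
        S2 = np.zeros((n_items, n_items))
        for i in range(n_items):
            X = R.copy()
            X[:, i] = 0.0                                    # exclude the target item itself
            model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                               positive=True, fit_intercept=False, max_iter=1000)
            model.fit(X, R[:, i])
            S2[:, i] = model.coef_
            S2[i, i] = 0.0                                   # no self-similarity
        return S2

    R = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 1, 1],
                  [1, 1, 1, 0]], dtype=float)
    S2 = slim_like(R)
    S = R @ S2                                               # scores from the learned item-item factor
    print(np.round(S, 2))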

Page 62: Tutorial bpocf

Reconstruction between AMAU and AMAN

Ungar and Foster [Ungar and Foster 1998] proposed a similar hard clustering method, but remain vague about the details of their method.

4.1.2. Reconstruction Based Deviation Functions. Next, there is a group of algorithms inspired by SVD-based matrix factorization algorithms for rating prediction problems [Koren and Bell 2011]. They start from the 2-factor factorization that describes the aspect model (Eq. 3) but strip the parameters of all their statistical meaning. Instead, S is postulated to be an approximate, factorized reconstruction of R. A straightforward approach is to find S^{(1)} and S^{(2)} such that they minimize the squared reconstruction error between S and R. A deviation function that reflects this line of thought is

D(S,R) = \sum_{u \in U} \sum_{i \in I} R_{ui} (R_{ui} - S_{ui})^2.

This approach clearly makes the AMAU assumption. Making the AMAN assumption, on the other hand, all missing values are interpreted as an absence of preference and the deviation function becomes

D(S,R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2.

On the one hand, the AMAU assumption is too careful because the vast majority of the unknowns are negatives. On the other hand, the AMAN assumption is too incautious because we are actually searching for the preferences among the unknowns. Therefore both Hu et al. [Hu et al. 2008] and Pan et al. [Pan et al. 2008] simultaneously proposed a middle way between AMAU and AMAN:

D(S,R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2, \qquad (11)

in which W \in \mathbb{R}^{n \times m} assigns a weight to every value in R. The higher W_{ui}, the higher the confidence about R_{ui}. There is a high confidence about the ones being preferences and a lower confidence about the zeros being dislikes. To formalize this intuition, Hu et al. [Hu et al. 2008] give two potential definitions of W_{ui}:

W_{ui} = 1 + \beta R_{ui}, \qquad (12)
W_{ui} = 1 + \alpha \log\left( 1 + R_{ui} / \epsilon \right), \qquad (13)

with \alpha, \beta, \epsilon hyperparameters. From the above definitions, it is clear that this method is not limited to binary data, but works on positive-only data in general. We, however, are only interested in its usefulness for binary, positive-only data. Alternatively, Pan et al. [Pan et al. 2008] propose W_{ui} = 1 if R_{ui} = 1 and give three possibilities for the case when R_{ui} = 0:

W_{ui} = \delta, \qquad (14)
W_{ui} = \alpha \sum_{j \in I} R_{uj}, \qquad (15)
W_{ui} = \alpha (n - c(i)), \qquad (16)

with \delta \in [0,1] a uniform hyperparameter and \alpha the hyperparameter such that W_{ui} \leq 1 for all pairs (u,i) for which R_{ui} = 0. In the first case, all missing preferences get the same weight. In the second case, a missing preference is more negative if the user already has many preferences. In the third case, a missing preference is less negative if the item is popular (one could argue that this is counterintuitive).

AMAN

Page 63: Tutorial bpocf

Reconstruction between AMAU and AMAN


AMAU

AMAN

Page 64: Tutorial bpocf

Reconstruction between AMAU and AMAN


AMAU

AMAN

Middle Way
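The weighting schemes of Equations 12 to 16 are easy to compare side by side. A minimal numpy sketch follows; the hyperparameter values (beta, alpha, epsilon, delta) are arbitrary placeholders, the function names are mine, and the default alpha in weights_pan is just one way to keep the weights on the zeros at or below 1, as the text requires.

    import numpy as np

    def weights_hu(R, alpha=40.0, beta=40.0, eps=1e-8):
        """Hu et al. [2008]: W = 1 + beta * R, or W = 1 + alpha * log(1 + R / eps)."""
        w_linear = 1.0 + beta * R
        w_log = 1.0 + alpha * np.log1p(R / eps)
        return w_linear, w_log

    def weights_pan(R, delta=0.5, alpha=None):
        """Pan et al. [2008]: W = 1 on the ones; on the zeros either a uniform delta,
        alpha * (number of preferences of the user), or alpha * (n - popularity of the item)."""
        n_users = R.shape[0]
        if alpha is None:
            # one simple choice that keeps all weights on the zeros <= 1
            alpha = 1.0 / max(n_users, R.sum(axis=1).max())
        uniform = np.where(R == 1, 1.0, delta)
        by_user = np.where(R == 1, 1.0, alpha * R.sum(axis=1, keepdims=True))
        by_item = np.where(R == 1, 1.0, alpha * (n_users - R.sum(axis=0, keepdims=True)))
        return uniform, by_user, by_item

    R = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 1, 1]], dtype=float)
    print(weights_pan(R)[2])   # item-popularity-based weights on the zeros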

Page 65: Tutorial bpocf

Reconstruction: choosing W


Middle Way

W_{ui} = \begin{cases} 1 & \text{if } R_{ui} = 0 \\ \alpha & \text{if } R_{ui} = 1 \end{cases}

Page 66: Tutorial bpocf

Reconstruction: regularization

By stripping the matrix factorization of its statistical meaning, also the constraints in Equations 6 to 9 disappear. Simply minimizing Equation 11, however, results in factor matrices that are overfitted on the training data. Therefore both Hu et al. and Pan et al. propose to minimize a regularized version

D(S,R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2 + \lambda \left( \|S^{(1)}\|_F + \|S^{(2)}\|_F \right), \qquad (17)

with \lambda \in \mathbb{R}^{+} the regularization hyperparameter and \|\cdot\|_F the Frobenius norm. The large domain of \lambda can make it hard to find a good value. Additionally, Pan et al. also propose the alternate regularization:

D(S,R) = \sum_{u \in U} \sum_{i \in I} W_{ui} \left( (R_{ui} - S_{ui})^2 + \lambda \left( \|S^{(1)}_{u*}\|_F + \|S^{(2)}_{*i}\|_F \right) \right). \qquad (18)

Since the deviation function is defined over all user-item pairs, a direct optimization method such as stochastic gradient descent (SGD), which is frequently used for finding matrix factorizations in rating prediction problems, seems unfeasible in this case [Hu et al. 2008]. Therefore both Hu et al. and Pan et al. propose an alternating least squares (ALS) method for minimizing the deviation function. Additionally, Pan et al. propose an alternative bagging method for solving the regularized minimization problem that is more scalable.

Switch Sindhwani and Yao because Yao is a simpler form and can be solved with ALS. If I do this, I must be careful with chronology because Sindhwani was before Yao.

Sindhwani et al. propose a more complex weighting scheme [Sindhwani et al. 2010]. Whereas the previous methods computed the weights of the user-item pairs, W, before the optimization procedure, Sindhwani et al. consider these weights as model parameters and compute them simultaneously with all other parameters during the optimization procedure. Furthermore, they introduce a new set of parameters P that indicates for every missing value the probability that it is one. Their deviation function is defined as

D(S,R) = \sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda \|S^{(1)}\|_F + \lambda \|S^{(2)}\|_F - \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui}), \qquad (19)

with P_{ui} the probability that the missing value corresponding to the user-item pair (u,i) is actually a 1, W_{ui} the confidence of the value R_{ui}, and \alpha, \lambda, \gamma user-defined hyperparameters. Furthermore, Sindhwani et al. define the constraint

\frac{1}{|U||I| - |R|} \sum_{u \in U} \sum_{i \in I} P_{ui} = \gamma,

i.e., that the average probability that a missing value is actually one must be equal to the user-defined hyperparameter \gamma. Additionally, they simplify W as the one-dimensional matrix factorization

W_{ui} = V_u V_i.

The first term of the deviation function in Equation 19 gives the squared reconstruction error on the ones and the second term gives the squared reconstruction error on the zeros.

Squared reconstruction error term
Regularization term

Regularization hyperparameter

[Hu et al. 2008] [Pan et al. 2008]

[Pan and Scholz 2009]
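The ALS approach mentioned above has a closed-form half-step: with the item factors fixed, each user row of S^{(1)} minimizes a weighted, regularized least-squares problem. The sketch below is a dense toy version of that user half-step, regularized with the squared Frobenius norm (the form that yields the closed form); it is not the efficient implicit-feedback implementation of Hu et al., the function name is mine, and the item half-step is symmetric.

    import numpy as np

    def als_user_step(R, W, S2, lam):
        """For fixed item factors S2 (d x |I|), solve for each user row of S1:
        S1[u] = argmin_x sum_i W_ui (R_ui - x . S2[:, i])^2 + lam * ||x||^2."""
        d = S2.shape[0]
        S1 = np.zeros((R.shape[0], d))
        for u in range(R.shape[0]):
            Wu = np.diag(W[u])                         # per-item confidence of user u
            A = S2 @ Wu @ S2.T + lam * np.eye(d)       # d x d normal equations
            b = S2 @ Wu @ R[u]
            S1[u] = np.linalg.solve(A, b)
        return S1

    rng = np.random.default_rng(3)
    R = (rng.random((6, 9)) < 0.3).astype(float)
    W = 1.0 + 40.0 * R                                 # confidence weights, W = 1 + beta * R
    S2 = rng.random((3, 9))
    S1 = als_user_step(R, W, S2, lam=0.1)
    print(S1.shape)                                    # (6, 3)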

Page 67: Tutorial bpocf

Reconstruction: more complex weighting


Page 68: Tutorial bpocf

Reconstruction rewritten

D(S,R) = \sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda \|S^{(1)}\|_F + \lambda \|S^{(2)}\|_F - \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui}) \qquad (54)

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} (0 - S_{ui})^2 + \lambda \|S^{(1)}\|_F + \lambda \|S^{(2)}\|_F


Page 69: Tutorial bpocf

Reconstruction rewritten


Page 70: Tutorial bpocf

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} (0 - S_{ui})^2 + \lambda \|S^{(1)}\|_F + \lambda \|S^{(2)}\|_F


Reconstruction: guess unknown = 0
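A minimal sketch of how a weighted reconstruction objective of this kind is typically minimized with alternating least squares, in the spirit of Hu et al. [Hu et al. 2008] and Pan et al. [Pan et al. 2008]. The NumPy code, the confidence choice W = 1 + 10 R in the toy run, and all names are illustrative assumptions of this sketch, not a reference implementation.

import numpy as np

def wals_user_step(R, W, S2, lam):
    # One alternating-least-squares half-step over the users for the weighted
    # objective sum_{u,i} W_ui (R_ui - S_ui)^2 + lam ||S1||_F^2, with S = S1 @ S2.T.
    n_users, k = R.shape[0], S2.shape[1]
    S1 = np.zeros((n_users, k))
    for u in range(n_users):
        Wu = W[u]                                             # confidences of user u
        A = (S2 * Wu[:, None]).T @ S2 + lam * np.eye(k)       # S2^T diag(Wu) S2 + lam I
        b = (S2 * Wu[:, None]).T @ R[u]                       # S2^T diag(Wu) R_u
        S1[u] = np.linalg.solve(A, b)                         # weighted ridge solution
    return S1

# toy run: binary R, observed ones weighted more heavily than the zeros
R = np.array([[1., 0., 1.], [0., 1., 0.]])
W = 1.0 + 10.0 * R
S2 = np.random.default_rng(0).normal(size=(3, 2))
S1 = wals_user_step(R, W, S2, lam=0.1)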

Page 71: Tutorial bpocf

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( p (1 - S_{ui})^2 + (1 - p) (0 - S_{ui})^2 \right) + \lambda ||S^{(1)}||_F + \lambda ||S^{(2)}||_F


Reconstruction: unknown can also be 1

[Yao et al. 2014]

Page 72: Tutorial bpocf

Reconstruction: fewer assumptions, more parameters

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda ||S^{(1)}||_F + \lambda ||S^{(2)}||_F


Page 73: Tutorial bpocf

Reconstruction: more regularization

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda ||S^{(1)}||_F + \lambda ||S^{(2)}||_F - \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui})   (55)


Page 74: Tutorial bpocf

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda ||S^{(1)}||_F + \lambda ||S^{(2)}||_F - \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui})   (56)


Reconstruction: more (flexible) parameters

[Sindhwani et al. 2010]
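To make an objective of this form concrete, here is a small NumPy sketch that only evaluates Equation 56 for given factor matrices; it does not implement the authors' optimization procedure, and the function name, the epsilon safeguard, and the default constants are assumptions of the sketch.

import numpy as np

def deviation_imputed(R, W, P, S1, S2, lam=0.1, alpha=0.01):
    # Evaluate the deviation above for S = S1 @ S2.T: weighted square loss on the ones,
    # imputed square loss on the unknowns, Frobenius regularization, minus alpha times
    # the entropy of P on the unknowns.
    S = S1 @ S2.T
    eps = 1e-12                                              # numerical safeguard (assumed)
    H = -(P * np.log(P + eps) + (1.0 - P) * np.log(1.0 - P + eps))
    pos = R * W * (1.0 - S) ** 2
    neg = (1.0 - R) * W * (P * (1.0 - S) ** 2 + (1.0 - P) * (0.0 - S) ** 2)
    reg = lam * (np.linalg.norm(S1) + np.linalg.norm(S2))
    return pos.sum() + neg.sum() + reg - alpha * ((1.0 - R) * H).sum()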

Page 75: Tutorial bpocf

Reconstruction: conceptual flaw

dure [Hofmann 1999]. Ungar and Foster [Ungar and Foster 1998] proposed a similar hard clustering method, but remain vague about the details of their method.

4.1.2. Reconstruction Based Deviation Functions. Next, there is a group of algorithms inspired by SVD-based matrix factorization algorithms for rating prediction problems [Koren and Bell 2011]. They start from the 2-factor factorization that describes the aspect model (Eq. 3) but strip the parameters of all their statistical meaning. Instead, S is postulated to be an approximate, factorized reconstruction of R. A straightforward approach is to find S^{(1)} and S^{(2)} such that they minimize the squared reconstruction error between S and R. A deviation function that reflects this line of thought is

D(S, R) = \sum_{u \in U} \sum_{i \in I} R_{ui} (R_{ui} - S_{ui})^2.

This approach clearly makes the AMAU assumption. Making the AMAN assumption, on the other hand, all missing values are interpreted as an absence of preference and the deviation function becomes

D(S, R) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2.

On the one hand, the AMAU assumption is too careful because the vast majority of the unknowns are negatives. On the other hand, the AMAN assumption is too incautious because we are actually searching for the preferences among the unknowns. Therefore both Hu et al. [Hu et al. 2008] and Pan et al. [Pan et al. 2008] simultaneously proposed a middle way between AMAU and AMAN:

D(S, R) = \sum_{u \in U} \sum_{i \in I} W_{ui} (R_{ui} - S_{ui})^2,   (11)

in which W \in \mathbb{R}^{n \times m} assigns a weight to every value in R. The higher W_{ui}, the higher the confidence about R_{ui}. There is a high confidence about the ones being preferences and a lower confidence about the zeros being dislikes. To formalize this intuition, Hu et al. [Hu et al. 2008] give two potential definitions of W_{ui}:

W_{ui} = 1 + \beta R_{ui},   (12)
W_{ui} = 1 + \alpha \log(1 + R_{ui} / \epsilon),   (13)

with \alpha, \beta, \epsilon hyperparameters. From the above definitions, it is clear that this method is not limited to binary data, but works on positive-only data in general. We, however, are only interested in its usefulness for binary, positive-only data. Alternatively, Pan et al. [Pan et al. 2008] propose W_{ui} = 1 if R_{ui} = 1 and give three possibilities for the case when R_{ui} = 0:

W_{ui} = \delta,   (14)
W_{ui} = \alpha \sum_{j \in I} R_{uj},   (15)
W_{ui} = \alpha (n - c(i)),   (16)

with \delta \in [0, 1] a uniform hyperparameter and \alpha the hyperparameter such that W_{ui} \leq 1 for all pairs (u, i) for which R_{ui} = 0. In the first case, all missing preferences get the same weight. In the second case, a missing preference is more negative if the user already has many preferences. In the third case, a missing preference is less negative if the item is popular¹.

¹ One could argue that this is counterintuitive.
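As a rough illustration of the weighting schemes in Equations 12-16, the following NumPy sketch builds W from a binary R. The function names, default values, and the reading of n as the number of users are assumptions of this sketch, not part of the original text.

import numpy as np

def confidence_hu(R, alpha=40.0, beta=1.0, eps=1e-8):
    # Confidence-style weights in the spirit of Equations 12-13;
    # on binary R both variants simply upweight the observed ones.
    w_linear = 1.0 + beta * R
    w_log = 1.0 + alpha * np.log(1.0 + R / eps)
    return w_linear, w_log

def weights_pan(R, scheme="uniform", delta=0.5, alpha=None):
    # Weights in the spirit of Equations 14-16: W_ui = 1 where R_ui = 1,
    # and one of three choices where R_ui = 0.
    n_users = R.shape[0]
    zeros = (R == 0)
    if scheme == "uniform":                        # Eq. 14: a single weight delta
        return np.where(zeros, delta, 1.0)
    if scheme == "user-activity":                  # Eq. 15: scaled by the user's number of preferences
        act = R.sum(axis=1, keepdims=True)
        alpha = alpha or 1.0 / max(act.max(), 1.0)
        return np.where(zeros, alpha * act, 1.0)
    # Eq. 16: scaled by n - c(i), with c(i) the item's popularity (n read as |U| here)
    c = R.sum(axis=0, keepdims=True)
    alpha = alpha or 1.0 / max(n_users, 1)
    return np.where(zeros, alpha * (n_users - c), 1.0)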


(1 - 0)^2 = 1 = (1 - 2)^2   (57)
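A two-line check of the identity above: for an observed preference (R_ui = 1), the square loss cannot tell an overshoot apart from a complete miss.

for s_ui in (0.0, 2.0):
    print(s_ui, (1.0 - s_ui) ** 2)   # both predictions incur a square loss of 1.0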


Page 76: Tutorial bpocf

Log likelihood: similar idea

[C. Johnson 2014]

\nabla D(S, R) = \nabla \sum_{u \in U} \sum_{i \in I} D_{ui}(S, R) = \sum_{u \in U} \sum_{i \in I} \nabla D_{ui}(S, R)

\nabla D(S, R) = \nabla \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} \sum_{j \in I} D_{uij}(S, R) = \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} \sum_{j \in I} \nabla D_{uij}(S, R)

= \int \delta(\,\cdot\,) \cdot p(\,\cdot \mid \cdot\,) \, d(\,\cdot\,)

D(S, R) = D_{KL}(Q(S) \,\|\, p(S \mid R))

. . .

max for every (u, i)

\max \log p(S \mid R)

\max \log \prod_{u \in U} \prod_{i \in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui})

- \log \prod_{u \in U} \prod_{i \in I} S_{ui}^{\alpha R_{ui}} (1 - S_{ui})

- \sum_{u \in U} \sum_{i \in I} \left( \alpha R_{ui} \log S_{ui} + \log(1 - S_{ui}) \right) + \lambda \left( ||S^{(1)}||_F^2 + ||S^{(2)}||_F^2 \right)


Page 77: Tutorial bpocf

Log likelihood: similar idea

[C. Johnson 2014]


Zero-mean, spherical Gaussian priors
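A small NumPy sketch of the regularized negative log-likelihood above. Reading S_ui as the sigmoid of the factor product, as in logistic matrix factorization [C. Johnson 2014], is an assumption of this sketch, as are the function name and the default constants.

import numpy as np

def logistic_mf_deviation(R, S1, S2, alpha=1.0, lam=0.1):
    # - sum_{u,i} [ alpha R_ui log S_ui + log(1 - S_ui) ] + lam (||S1||_F^2 + ||S2||_F^2),
    # with S_ui = sigmoid(S1_u . S2_i) so that S_ui stays in (0, 1).
    S = 1.0 / (1.0 + np.exp(-(S1 @ S2.T)))
    eps = 1e-12                                   # numerical safeguard (assumed)
    ll = alpha * R * np.log(S + eps) + np.log(1.0 - S + eps)
    return -ll.sum() + lam * (np.sum(S1 ** 2) + np.sum(S2 ** 2))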

Page 78: Tutorial bpocf

Maximum Margin: not all preferences equally preferred

the zeros. The third term gives the regularization error and the fourth and final term, -\alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui}), gives the entropy of the probabilities P_{ui} and is introduced for smoothing the deviation function. If \alpha is big, the entropy term dominates and all P_{ui} are chosen to maximize the entropy, i.e. P_{ui} = 1/2 for every (u, i). As \alpha is reduced, the entropy of the optimal P decreases and P becomes less uniform. A conceptual inconsistency of Equation 19 is that although the recommendation score used is given by S_{ui} (= S^{(1)}_{u*} S^{(2)}_{*i}), also P_{ui} could be used. Hence, there exist two parameters for the same concept, which is ambiguous at least.

To compute a local minimum of the deviation function in Equation 19, Sindhwani et al. propose a custom non-convex optimization procedure. A serious limit on the scalability is that P is a dense matrix. Therefore they enforce sparseness on P by randomly choosing a small number of user-item pairs (u, i) for which P_{ui} can be bigger than zero. This random aspect weakens the conceptual argument behind the definition of Equation 19.

Yao et al. propose a reformulation of Equation 19 without the entropy smoothing term, i.e. with \alpha = 0, and with two further hyperparameters set equal [Yao et al. 2014]. Furthermore, two constraints on the parameters are different. Firstly, they choose a uniform weight for all missing feedback (Equation 14) instead of including W as a parameter in the optimization problem. Similarly, they also uniformly choose P_{ui} = p, with the global imputation value p a hyperparameter of the method. Not surprisingly, they also propose a different algorithm for minimizing Equation 19.

4.1.3. Maximum Margin Based Deviation Functions. Notice that R is a binary matrix and that for the above algorithms S is a real-valued matrix. Therefore, the interpretation of S as a pure reconstruction of R is fundamentally flawed. This fundamental flaw has important practical consequences: if R_{ui} = 1, the square loss is 1 for both S_{ui} = 0 and S_{ui} = 2. However, S_{ui} = 2 is a much better prediction than S_{ui} = 0. Put differently, the reconstruction based deviation functions (implicitly) assume that all preferences are equally strong, which is an important simplification of reality.

A deviation function that does not suffer from this flaw was proposed by Pan and Scholz [Pan and Scholz 2009], who applied the idea of Maximum Margin Matrix Factorization (MMMF) by Srebro et al. [Srebro et al. 2004] to binary, positive-only collaborative filtering. They construct the matrix \tilde{R} as

\tilde{R}_{ui} = 1 if R_{ui} = 1,
\tilde{R}_{ui} = -1 if R_{ui} = 0,

and define the deviation function as

D(S, \tilde{R}) = \sum_{u \in U} \sum_{i \in I} W_{ui} \, h(\tilde{R}_{ui} \cdot S_{ui}) + \lambda ||S||_\Sigma,   (20)

with ||\cdot||_\Sigma the trace norm, \lambda a regularization hyperparameter, h(\tilde{R}_{ui} \cdot S_{ui}) a smooth hinge loss given by Figure 3 [Rennie and Srebro 2005], and W given by one of the Equations 14-16.

The deviation function incorporates the confidence about the training data by means of W, and the missing knowledge about the degree of preference by means of the hinge loss h(\tilde{R}_{ui} \cdot S_{ui}). Since the degree of preference is considered unknown, a value \tilde{R}_{ui} \cdot S_{ui} \geq 1 is not penalized.

Minimizing Equation 20 can be done by means of the conjugate gradients method by Rennie and Srebro [Rennie and Srebro 2005]. Alternatively, Pan and Scholz [Pan


[Pan and Scholz 2009]
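The following sketch spells out the smooth hinge of Rennie and Srebro [Rennie and Srebro 2005] and a weighted deviation in the spirit of Equation 20. Replacing the trace-norm term by a simple Frobenius penalty is a simplification made only for this sketch; all names are illustrative.

import numpy as np

def smooth_hinge(z):
    # Smooth hinge h(z): 0.5 - z for z <= 0, (1 - z)^2 / 2 for 0 < z < 1, 0 for z >= 1.
    return np.where(z >= 1.0, 0.0,
                    np.where(z <= 0.0, 0.5 - z, 0.5 * (1.0 - z) ** 2))

def max_margin_deviation(R, W, S, lam=0.1):
    # Weighted smooth-hinge deviation in the spirit of Equation 20; the trace-norm
    # term is replaced here by a Frobenius penalty to keep the sketch self-contained.
    R_tilde = np.where(R == 1, 1.0, -1.0)         # map {1, 0} to {+1, -1}
    return (W * smooth_hinge(R_tilde * S)).sum() + lam * np.linalg.norm(S)

# an overshoot on an observed preference costs nothing, unlike under the square loss
print(smooth_hinge(np.array([0.0, 2.0])))         # -> [0.5, 0.0]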

Page 79: Tutorial bpocf

Maximum Margin: not all preferences equally preferred

1:8 K. Verstrepen et al.

the zeros. The third term gives the regularization error and the fourth and final term,�↵

Pu2U

Pi2I(1 � Rui)H (Pui), gives the entropy of the probabilities Pui and is in-

troduced for smoothing the deviation function. If ↵ is big, the entropy term dominatesand all Pui are found to minimize entropy, i.e. Pui = � for every (u, i). As ↵ is reduced,the entropy of the optimal P increases and P becomes less uniform. A conceptual in-consistency of Equation 19 is that although the recommendation score used is givenby Sui(= S(1)

u⇤ S(2)⇤i ), also Pui could be used. Hence, there exist two parameters for the

same concept, which is ambiguous at least.To compute a local minimum of of the deviation function in Equation 19, Sindhwani

et al. propose a custom non-convex optimization procedure. A serious limit on the scal-ability is that P is a dense matrix. Therefore they enforce sparseness on P by randomlychoosing a small number of user-item pairs (u, i) for which Pui can be bigger than zero.This random aspect weakens the conceptual argument behind the definition of Equa-tion 19.

Yao et al. propose a reformulation of Equation 19 without the entropy smoothingterm, i.e. with ↵ = 0, and with � = � [Yao et al. 2014]. Furthermore, two constraintson the parameters are different. Firstly, they choose a uniform weight for all missingfeedback (Equation 14) instead of including W as a parameter in the optimizationproblem. Similarly, they also uniformly choose Pui = p with the global imputationvalue p a hyperparameter of the method. Not surprisingly, they also propose a differentalgorithm for minimizing Equation 19.

4.1.3. Maximum Margin Based Deviation Functions. Notice that R is a binary matrix andthat for the above algorithms S is a real valued matrix. Therefore, the interpretationof S as the pure reconstruction of R is fundamentally flawed. This fundamental flawhas important practical consequences: If Rui = 1, the square loss is 1 for both Sui = 0

and Sui = 2. However, Sui = 2 is a much better prediction than Sui = 0. Put differently,the reconstruction based deviation functions (implicitly) assume that all preferencesare equally strong, which is an important simplification of reality.

A deviation function that does not suffer from this flaw was proposed by Pan andScholz [Pan and Scholz 2009], who applied the idea of Maximum Margin Matrix Fac-torization (MMMF) by Srebro et al. [Srebro et al. 2004] to binary, positive-only collab-orative filtering. They construct the matrix ˜R as

⇢˜Rui = 1 if Rui = 1

˜Rui = �1 if Rui = 0,

and define the deviation funtion as

D⇣S, ˜R

⌘=

X

u2U

X

i2IWuih

⇣˜Rui · Sui

⌘+ �||S||⌃, (20)

with ||.||⌃ the trace norm, � a regularization hyperparameter, h⇣

˜Rui · Sui

⌘a smooth

hinge loss given by Figure 3 [Rennie and Srebro 2005] and W given by one of theEquations 14-16.

The deviation function incorporates the confidence about the training data by meansof W and the missing knowledge about the degree of preference by means of the hingeloss h

⇣˜Rui · Sui

⌘. Since the degree of preference is considered unknown, a value ˜Rui �

1 is not penalized.Minimizing Equation 20 can be done by means of the conjugate gradients method

by Rennie and Srebro [Rennie and Srebro 2005]. Alternatively, Pan and Scholz [Pan

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

[Pan and Scholz 2009]

Page 80: Tutorial bpocf

Maximum Margin: not all preferences equally preferred

1:8 K. Verstrepen et al.

the zeros. The third term gives the regularization error and the fourth and final term,�↵

Pu2U

Pi2I(1 � Rui)H (Pui), gives the entropy of the probabilities Pui and is in-

troduced for smoothing the deviation function. If ↵ is big, the entropy term dominatesand all Pui are found to minimize entropy, i.e. Pui = � for every (u, i). As ↵ is reduced,the entropy of the optimal P increases and P becomes less uniform. A conceptual in-consistency of Equation 19 is that although the recommendation score used is givenby Sui(= S(1)

u⇤ S(2)⇤i ), also Pui could be used. Hence, there exist two parameters for the

same concept, which is ambiguous at least.To compute a local minimum of of the deviation function in Equation 19, Sindhwani

et al. propose a custom non-convex optimization procedure. A serious limit on the scal-ability is that P is a dense matrix. Therefore they enforce sparseness on P by randomlychoosing a small number of user-item pairs (u, i) for which Pui can be bigger than zero.This random aspect weakens the conceptual argument behind the definition of Equa-tion 19.

Yao et al. propose a reformulation of Equation 19 without the entropy smoothingterm, i.e. with ↵ = 0, and with � = � [Yao et al. 2014]. Furthermore, two constraintson the parameters are different. Firstly, they choose a uniform weight for all missingfeedback (Equation 14) instead of including W as a parameter in the optimizationproblem. Similarly, they also uniformly choose Pui = p with the global imputationvalue p a hyperparameter of the method. Not surprisingly, they also propose a differentalgorithm for minimizing Equation 19.

4.1.3. Maximum Margin Based Deviation Functions. Notice that R is a binary matrix andthat for the above algorithms S is a real valued matrix. Therefore, the interpretationof S as the pure reconstruction of R is fundamentally flawed. This fundamental flawhas important practical consequences: If Rui = 1, the square loss is 1 for both Sui = 0

and Sui = 2. However, Sui = 2 is a much better prediction than Sui = 0. Put differently,the reconstruction based deviation functions (implicitly) assume that all preferencesare equally strong, which is an important simplification of reality.

A deviation function that does not suffer from this flaw was proposed by Pan andScholz [Pan and Scholz 2009], who applied the idea of Maximum Margin Matrix Fac-torization (MMMF) by Srebro et al. [Srebro et al. 2004] to binary, positive-only collab-orative filtering. They construct the matrix ˜R as

⇢˜Rui = 1 if Rui = 1

˜Rui = �1 if Rui = 0,

and define the deviation funtion as

D⇣S, ˜R

⌘=

X

u2U

X

i2IWuih

⇣˜Rui · Sui

⌘+ �||S||⌃, (20)

with ||.||⌃ the trace norm, � a regularization hyperparameter, h⇣

˜Rui · Sui

⌘a smooth

hinge loss given by Figure 3 [Rennie and Srebro 2005] and W given by one of theEquations 14-16.

The deviation function incorporates the confidence about the training data by meansof W and the missing knowledge about the degree of preference by means of the hingeloss h

⇣˜Rui · Sui

⌘. Since the degree of preference is considered unknown, a value ˜Rui �

1 is not penalized.Minimizing Equation 20 can be done by means of the conjugate gradients method

by Rennie and Srebro [Rennie and Srebro 2005]. Alternatively, Pan and Scholz [Pan

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

[Pan and Scholz 2009]


Figure: the aspect model's item distributions p(i|d_1), p(i|d_2), ..., p(i|d_D), annotated with the two predictions S_ui = 0 and S_ui = 2 that the square loss cannot distinguish when R_ui = 1.

Page 81: Tutorial bpocf


Fig. 3. Shown are the loss function values h(z) (left) and the gradients dh(z)/dz (right) for the Hinge and Smooth Hinge. Note that the gradients are identical outside the region z \in (0, 1) [Rennie and Srebro 2005].

and Scholz 2009] propose a bagging method for better scalability. Remark that both methods find different solutions to the minimization problem.

Besides the hinge loss, also the exponential and the binomial negative log-likelihood loss functions exhibit a similar behavior [Bishop 2006]:

l_{exp}(\tilde{R}_{ui}, S_{ui}) = \exp(-\tilde{R}_{ui} S_{ui}),
l_{ll}(\tilde{R}_{ui}, S_{ui}) = \log(1 + \exp(-2 \tilde{R}_{ui} S_{ui})).

However, to the best of our knowledge they have not yet been used for one-class collaborative filtering.

4.1.4. Ranking Based Deviation Functions. The scores computed by recommender systems are often used to personally rank all items for every user. Therefore, Rendle et al. [Rendle et al. 2009] argued that it is natural to directly optimize the ranking. More specifically, they aim to maximize the area under the ROC curve (AUC), which is given by

AUC = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}>0} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj}),

with \delta(true) = 1 and \delta(false) = 0. If the AUC is higher, the pairwise rankings induced by the model S are more in line with the observed data R. However, because \delta(S_{ui} > S_{uj}) is non-differentiable, their deviation function is a differentiable approximation of the negative AUC from which constant factors have been removed and to which a regularization term has been added:

D(S, \tilde{R}) = \sum_{u \in U} \sum_{R_{ui}>0} \sum_{R_{uj}=0} \log \sigma(S_{uj} - S_{ui}) - \lambda_1 ||S^{(1)}||_F^2 - \lambda_2 ||S^{(2)}||_F^2,

with \sigma(\cdot) the sigmoid function and \lambda_1, \lambda_2 regularization constants, which are hyperparameters of the method. Notice that this deviation function considers all missing feedback equally negative, i.e. it corresponds to the AMAN assumption.

However, very often, only the N highest ranked items are shown to users. Therefore, Shi et al. [Shi et al. 2012] propose to maximize the mean reciprocal rank (MRR) instead of the AUC. The MRR is defined as

MRR = \frac{1}{|U|} \sum_{u \in U} r_>\!\left( \max_{R_{ui}=1} S_{ui} \;\middle|\; S_{u*} \right)^{-1},


AUC: directly optimize the ranking

[Rendle et al. 2009]
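A sketch of one stochastic gradient step on the smooth pairwise surrogate, following the usual BPR update of Rendle et al. [Rendle et al. 2009]: ascend log sigma(S_ui - S_uj) with an l2 penalty. The sampling scheme, learning rate, and sign conventions are the standard ones and are assumptions of this sketch rather than a transcription of the display above.

import numpy as np

def bpr_sgd_step(R, S1, S2, rng, lr=0.05, lam=0.01):
    # Sample a user u, a preferred item i (R_ui = 1) and an unobserved item j (R_uj = 0),
    # then take one ascent step on log sigmoid(S_ui - S_uj) minus lam times the l2 penalty.
    u = rng.integers(R.shape[0])
    pos, neg = np.flatnonzero(R[u] == 1), np.flatnonzero(R[u] == 0)
    if len(pos) == 0 or len(neg) == 0:
        return
    i, j = rng.choice(pos), rng.choice(neg)
    x = S1[u] @ (S2[i] - S2[j])
    g = 1.0 / (1.0 + np.exp(x))                   # derivative of log sigmoid(x)
    w = S1[u].copy()
    S1[u] += lr * (g * (S2[i] - S2[j]) - lam * w)
    S2[i] += lr * (g * w - lam * S2[i])
    S2[j] += lr * (-g * w - lam * S2[j])

# usage: repeatedly call bpr_sgd_step(R, S1, S2, np.random.default_rng(0)) over many samples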

Page 82: Tutorial bpocf

AUC: directly optimize the ranking

[Rendle et al. 2009]

\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda ||S^{(1)}||_F + \lambda ||S^{(2)}||_F - \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui})   (57)

(1 - 0)^2 = 1 = (1 - 2)^2   (58)

w(j) = \sum_{u \in U} R_{uj}

\sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \left( (R_{ui} - R_{uj}) - (S_{ui} - S_{uj}) \right)^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} ||S^{(t,f)}||_F^2

\sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} ||S^{(t,f)}||_F^2

\sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \left( \lambda_{tf} ||S^{(t,f)}||_F^2 + ||S^{(t,f)}||_1 \right)

\max \; \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \frac{S_{ui} - S_{uj}}{2}

\max \min_{\alpha_{u*}} \; \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui} \alpha_{uj} (S_{ui} - S_{uj})

\sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj})

\sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{uj} + 1 \geq S_{ui})

r_>(S_{uj} \mid \{S_{uk} \mid R_{uk} = 0\})   (59)

AUC = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj})


Page 83: Tutorial bpocf

AUC non-differentiable

[Rendle et al. 2009]


Page 84: Tutorial bpocf

AUC smooth approximation


[Rendle et al. 2009]


Page 85: Tutorial bpocf

Pairwise Ranking 2 similar to AUC


$$D\big(S, \tilde{R}\big) = \sum_{u \in \mathcal{U}} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \big( (R_{ui} - R_{uj}) - (S_{ui} - S_{uj}) \big)^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2$$


[Kabbur et al. 2013]

Page 86: Tutorial bpocf

Pairwise Ranking 3 no regularization, also 1 to 1


in which r_>(a | B) gives the rank of a among all numbers in B when ordered in descending order. Unfortunately, the non-smoothness of r_>() and max makes the direct optimization of MRR unfeasible. Hence, Shi et al. derive a smoothed version of MRR. Although this smoothed version is differentiable, it could still be practically intractable to optimize. Therefore, they propose to optimize a lower bound instead. After also adding regularization terms, their final deviation function is given by

$$D\big(S, \tilde{R}\big) = -\sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I}} R_{ui} \Big( \log \sigma(S_{ui}) + \sum_{j \in \mathcal{I}} \log\big(1 - R_{uj}\,\sigma(S_{uj} - S_{ui})\big) \Big) + \lambda \Big( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \Big), \quad (21)$$

with λ a regularization constant and σ() the sigmoid function. Notice that this deviation function de facto ignores all missing feedback, i.e. it corresponds to the AMAU assumption.
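A dense-loop sketch of the deviation function in Equation (21) for a factorized model S = S^(1) S^(2); the function name and the toy structure are ours, not the implementation of Shi et al.:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def smoothed_mrr_deviation(R, S1, S2, lam=0.01):
    # Equation (21): for every known preference (u, i), reward a high sigma(S_ui)
    # and penalize other known preferences j of user u whose score exceeds S_ui;
    # missing feedback (R_uj = 0) drops out of the inner sum.
    S = S1 @ S2
    dev = 0.0
    for u in range(R.shape[0]):
        for i in np.where(R[u] == 1)[0]:
            inner = np.log(sigmoid(S[u, i]))
            inner += np.sum(np.log(1.0 - R[u] * sigmoid(S[u] - S[u, i])))
            dev -= inner
    return dev + lam * (np.sum(S1 ** 2) + np.sum(S2 ** 2))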

Yet another ranking based deviation function was proposed by Takacs and Tikk [Takacs and Tikk 2012]:

$$D\big(S, \tilde{R}\big) = \sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I}} R_{ui} \sum_{j \in \mathcal{I}} w(j) \big( (S_{ui} - S_{uj}) - (R_{ui} - R_{uj}) \big)^2,$$

with w() a user-defined item weighting function. The simplest choice is w(j) = 1 for all j. An alternative proposed by Takacs and Tikk is $w(j) = \sum_{u \in \mathcal{U}} R_{uj}$. This deviation function has some resemblance to the one of Rendle et al. in Section 4.1.4. However, a squared loss is used instead of the log-loss of the sigmoid. Furthermore, this deviation function also minimizes the score-difference between all known preferences, which the deviation function in Section 4.1.4 does not do. Finally, it is remarkable that Takacs and Tikk explicitly do not add a regularization term, whereas most other authors find that the regularization term is important for their model's performance.
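A sketch of this deviation function with both weighting choices; the function name is ours and the loops are kept dense for readability:

import numpy as np

def takacs_tikk_deviation(R, S, w=None):
    # Ranking deviation of Takacs and Tikk: for every known preference (u, i) and
    # every item j, the squared difference between the score gap S_ui - S_uj and
    # the feedback gap R_ui - R_uj, weighted by w(j). No regularization is added.
    if w is None:
        w = np.ones(R.shape[1])  # the simplest choice, w(j) = 1 for all j
    dev = 0.0
    for u in range(R.shape[0]):
        for i in np.where(R[u] == 1)[0]:
            gaps = (S[u, i] - S[u]) - (R[u, i] - R[u])
            dev += np.sum(w * gaps ** 2)
    return dev

# Popularity-based alternative proposed by Takacs and Tikk: w(j) = sum_u R_uj
# takacs_tikk_deviation(R, S, w=R.sum(axis=0))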

4.1.5. Posterior Probability Deviation Functions. At this point, we have almost finished discussing the first group of algorithms, those for which all factor matrices are a priori unknown, and we hope it is becoming clear that the vast majority of algorithms fits nicely in our framework that models recommendation scores as matrix factorizations found by minimizing a deviation function. However, since we have chosen to fit our framework tightly around the majority of the algorithms, there are a few algorithms that do not fit completely within it. Fortunately, these outlier algorithms are rare and the framework allows us to show exactly how they differ from the majority of algorithms for BPOCF.

A first outlier algorithm, by Koeningstein et al. [Koeningstein et al. 2012], computes the eventual recommendation scores S as the expected value of the stochastic recommendation scores Ŝ:

$$S = E\big[\hat{S} \mid R\big], \quad (22)$$

which can also be written as

$$S \approx \int p\big(\hat{S} \mid R\big) \cdot \hat{S} \cdot d\hat{S},$$

with p(Ŝ | R) the posterior probability density function of the stochastic recommendation scores given the data. In the spirit of our framework, we define the deviation
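Equation (22) can be read as a Monte Carlo average over posterior draws of Ŝ. The sketch below only illustrates that reading; sample_posterior is a hypothetical stand-in for a sampler of p(Ŝ | R) and is not the inference procedure of Koeningstein et al.:

import numpy as np

def expected_scores(sample_posterior, R, n_samples=100):
    # Monte Carlo reading of Equation (22): average score matrices drawn from the
    # posterior p(S_hat | R). sample_posterior is a hypothetical user-supplied
    # callable that returns one draw of S_hat given the data R.
    draws = [sample_posterior(R) for _ in range(n_samples)]
    return np.mean(np.stack(draws), axis=0)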



[Takàcs and Tikk 2012]

Page 87: Tutorial bpocf


MRR focus on top of the ranking

[Shi et al. 2012]

Page 88: Tutorial bpocf


MRR non-differentiable

[Shi et al. 2012]

Page 89: Tutorial bpocf


MRR differentiable approximation, computationally feasible

[Shi et al. 2012]

Page 90: Tutorial bpocf


MRR known preferences score high

promote


[Shi et al. 2012]

Page 91: Tutorial bpocf


MRR push down other known preferences

in which r>(a | B) gives the rank of a among all numbers in B when ordered in de-scending order. Unfortunately, the non-smoothness of r>() and max makes the directoptimization of MRR unfeasible. Hence, Shi et al. derive a smoothed version of MRR.Although this smoothed version differentiable, it could still be practically intractableto optimize it. Therefore, they propose to optimize a lower bound instead. After alsoadding regularization terms, their final deviation function is given by

$$ D\bigl(S, \tilde{R}\bigr) = -\sum_{u \in U} \sum_{i \in I} R_{ui} \Bigl( \log \sigma(S_{ui}) + \sum_{j \in I} \log\bigl(1 - R_{uj}\,\sigma(S_{uj} - S_{ui})\bigr) \Bigr) + \lambda \bigl( \|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2 \bigr), \tag{21} $$

with λ a regularization constant and σ() the sigmoid function. Notice that this deviation function de facto ignores all missing feedback, i.e. it corresponds to the AMAU assumption.
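For reference, a small NumPy sketch (names assumed) that evaluates the plain MRR of a score matrix against binary feedback, following the definition above:

```python
import numpy as np

def mean_reciprocal_rank(S, R):
    """MRR of score matrix S (users x items) w.r.t. binary feedback R:
    reciprocal rank of the best-scored known preference of every user."""
    mrr = 0.0
    for u in range(R.shape[0]):
        known = np.flatnonzero(R[u] == 1)
        if known.size == 0:
            continue
        best = S[u, known].max()
        rank = 1 + np.sum(S[u] > best)   # rank of `best` among all of u's scores
        mrr += 1.0 / rank
    return mrr / R.shape[0]
```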

Yet another ranking based deviation function was proposed by Takacs and Tikk [Takacs and Tikk 2012]:

$$ D\bigl(S, \tilde{R}\bigr) = \sum_{u \in U} \sum_{i \in I} R_{ui} \sum_{j \in I} w(j) \bigl( (S_{ui} - S_{uj}) - (R_{ui} - R_{uj}) \bigr)^2, $$

with w() a user-defined item weighting function. The simplest choice is w(j) = 1 for all j. An alternative proposed by Takacs and Tikk is w(j) = Σ_{u∈U} R_uj. This deviation function has some resemblance with the ranking based deviation function of Rendle et al. above (Section 4.1.4). However, a squared loss is used instead of the log-loss of the sigmoid. Furthermore, this deviation function also minimizes the score-difference between all known preferences, which the deviation function of Rendle et al. does not do. Finally, it is remarkable that Takacs and Tikk explicitly do not add a regularization term, whereas most other authors find that the regularization term is important for their model's performance.
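A direct, dense evaluation of this squared ranking deviation could look like the following NumPy sketch (ours, with assumed names); `w` defaults to the constant weighting, with item popularity `R.sum(axis=0)` being the alternative mentioned above.

```python
import numpy as np

def takacs_tikk_deviation(S, R, w=None):
    """Squared ranking deviation above: for every known preference (u, i),
    compare the score difference to every item j with the feedback difference,
    weighted by the per-item weights w(j)."""
    n_users, n_items = R.shape
    if w is None:
        w = np.ones(n_items)
    dev = 0.0
    for u in range(n_users):
        for i in np.flatnonzero(R[u] == 1):
            diff = (S[u, i] - S[u]) - (R[u, i] - R[u])
            dev += np.sum(w * diff ** 2)
    return dev
```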

4.1.5. Posterior Probability Deviation Functions. At this point, we have almost finished discussing the first group of algorithms, those for which all factor matrices are a priori unknown, and we hope it is becoming clear that the vast majority of algorithms nicely fits in our framework that models recommendation scores as matrix factorizations found by minimizing a deviation function. However, since we have chosen to tightly fit our framework around the majority of the algorithms, there are a few algorithms that do not fit completely within it. Fortunately, these outlier algorithms are rare, and the framework allows us to show exactly how they differ from the majority of algorithms for BPOCF.

A first outlier algorithm, by Koeningstein et al. [Koeningstein et al. 2012], computes the eventual recommendation scores S as the expected value of the stochastic recommendation scores Ŝ:

$$ S = \mathbb{E}\bigl[\hat{S} \mid R\bigr], \tag{22} $$

which can also be written as

$$ S = \int p\bigl(\hat{S} \mid R\bigr) \cdot \hat{S} \cdot d\hat{S}, $$

with p(Ŝ | R) the posterior probability density function of the stochastic recommendation scores given the data. In the spirit of our framework, we define the deviation
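One generic way to approximate such an expectation is Monte Carlo averaging over draws from an approximate posterior. The sketch below is only an illustration; `sample_posterior` is an assumed placeholder for such a sampler, not part of the referenced method.

```python
import numpy as np

def expected_scores(sample_posterior, n_samples=100):
    """Monte Carlo estimate of S = E[S_hat | R]: average score matrices drawn
    from an (approximate) posterior sampler `sample_posterior()`."""
    acc = None
    for _ in range(n_samples):
        draw = sample_posterior()
        acc = draw.copy() if acc is None else acc + draw
    return acc / n_samples
```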


promote

scatter

[Shi et al. 2012]

Page 92: Tutorial bpocf

Fast Maximum Margin Matrix Factorization [Rennie and Srebro 2005]

The thresholds θ_r can be learned from the data. Furthermore, a different set of thresholds can be learned for each user, allowing users to "use ratings differently" and alleviating the need to normalize the data. The problem can then be written as:

$$ \text{minimize} \quad \|X\|_\Sigma + C \sum_{ij \in S} \sum_{r=1}^{R-1} h\bigl(T^r_{ij}(\theta_{ir} - X_{ij})\bigr) \tag{4} $$

where the variables optimized over are the matrix X and the thresholds θ. In other work, we find that such a formulation is highly effective for rating prediction (Rennie & Srebro, 2005).

Although the problem was formulated here as a single optimization problem with a combined objective, ‖X‖_Σ + C · error, it should really be viewed as a dual-objective problem of balancing between low trace-norm and low error. Considering the entire set of attainable (‖X‖_Σ, error) pairs, the true object of interest is the exterior "front" of this set, i.e. the set of matrices X for which it is not possible to reduce one of the two objectives without increasing the other. This "front" can be found by varying the value of C from zero (hard-margin) to infinity (no norm regularization).

All optimization problems discussed in this section can be written as semi-definite programs (Srebro et al., 2005).

3. Optimization Methods

We describe here a local search heuristic for the problem (4). Instead of searching over X, we search over pairs of matrices (U, V), as well as sets of thresholds θ, and attempt to minimize the objective:

$$ J(U, V, \theta) := \tfrac{1}{2}\bigl(\|U\|_{Fro}^2 + \|V\|_{Fro}^2\bigr) + C \sum_{r=1}^{R-1} \sum_{ij \in S} h\bigl(T^r_{ij}(\theta_{ir} - U_i V_j^\top)\bigr). \tag{5} $$

For any U, V we have ‖UV^⊤‖_Σ ≤ ½(‖U‖²_Fro + ‖V‖²_Fro), and so J(U, V, θ) upper bounds the minimization objective of (4), where X = UV^⊤. Furthermore, for any X, and in particular the X minimizing (4), some factorization X = UV^⊤ achieves ‖X‖_Σ = ½(‖U‖²_Fro + ‖V‖²_Fro). The minimization problem (4) is therefore equivalent to:

$$ \text{minimize} \quad J(U, V, \theta). \tag{6} $$

The advantage of considering (6) instead of (4) is that ‖X‖_Σ is a complicated non-differentiable function for which it is not easy to find the subdifferential. Finding good descent directions for (4) is not easy.

Figure 1. Shown are the loss function values (left) and gradients (right) for the Hinge and Smooth Hinge. Note that the gradients are identical outside the region z ∈ (0, 1).

On the other hand, the objective J(U, V, θ) is fairly simple. Ignoring for the moment the non-differentiability of h(z) = (1 − z)_+ at one, the gradient of J(U, V, θ) is easy to compute. The partial derivative with respect to each element of U is:

$$ \frac{\partial J}{\partial U_{ia}} = U_{ia} - C \sum_{r=1}^{R-1} \sum_{j \mid ij \in S} T^r_{ij}\, h'\bigl(T^r_{ij}(\theta_{ir} - U_i V_j^\top)\bigr) V_{ja} \tag{7} $$

The partial derivative with respect to V_ja is analogous. The partial derivative with respect to θ_ir is

$$ \frac{\partial J}{\partial \theta_{ir}} = C \sum_{j \mid ij \in S} T^r_{ij}\, h'\bigl(T^r_{ij}(\theta_{ir} - U_i V_j^\top)\bigr). \tag{8} $$

With the gradient in hand, we can turn to gradient descent methods for locally optimizing J(U, V, θ). The disadvantage of considering (6) instead of (4) is that although the minimization objective in (4) is a convex function of X, θ, the objective J(U, V, θ) is not a convex function of U, V. This is potentially bothersome, and might inhibit convergence to the global minimum.
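As an illustration of Equation (7), the following NumPy sketch (ours; shapes and names are assumptions) accumulates the gradient of J with respect to one row of U, using the Smooth Hinge gradient from earlier:

```python
import numpy as np

def grad_U_row(U, V, theta, T, i, js, C):
    """Gradient of J w.r.t. row U[i] (Eq. 7), summed over the observed columns
    `js` of row i. T[r, j] holds the +/-1 targets T^r_{ij}; U is (n x d),
    V is (d x m) and theta[i, r] are the per-row thresholds."""
    def smooth_hinge_grad(z):
        return np.where(z >= 1.0, 0.0, np.where(z <= 0.0, -1.0, z - 1.0))

    grad = U[i].copy()                      # derivative of the Frobenius term
    for r in range(theta.shape[1]):
        z = T[r, js] * (theta[i, r] - U[i] @ V[:, js])
        grad -= C * (V[:, js] * (T[r, js] * smooth_hinge_grad(z))).sum(axis=1)
    return grad
```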


MRR corresponds to AMAU assumption


promote

scatter AMAU

[Shi et al. 2012]

Page 93: Tutorial bpocf

$$ \sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \Bigl( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \Bigr) + \lambda \|S^{(1)}\|_F + \lambda \|S^{(2)}\|_F - \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui}) \tag{57} $$

$$ (1 - 0)^2 = 1 = (1 - 2)^2 \tag{58} $$

$$ w(j) = \sum_{u \in U} R_{uj} $$

$$ \sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \bigl( (R_{ui} - R_{uj}) - (S_{ui} - S_{uj}) \bigr)^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2 $$

$$ \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2 $$

$$ \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \Bigl( \lambda_{tf} \|S^{(t,f)}\|_F^2 + \|S^{(t,f)}\|_1 \Bigr) $$

$$ \max \; \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \frac{S_{ui} - S_{uj}}{2} $$

$$ \max \; \min_{\alpha_{u*}} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui}\, \alpha_{uj}\, (S_{ui} - S_{uj}) $$

$$ \sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj}) $$

$$ \sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \frac{\delta(S_{uj} + 1 - S_{ui})}{r_>(S_{uj} \mid \{S_{uk} \mid R_{uk} = 0\})} \tag{59} $$

$$ \mathrm{AUC} = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj}) $$


kth-Order Statistic basis = AUC

[Weston et al. 2013]

Page 94: Tutorial bpocf

kth-Order Statistic strip normalization


[Weston et al. 2013]

Page 95: Tutorial bpocf

kth-Order Statistic focus on highly ranked negatives


[Weston et al. 2013]

Page 96: Tutorial bpocf

kth-Order Statistic weight known preferences by rank


[Weston et al. 2013]

Page 97: Tutorial bpocf

kth-Order Statistic non-differentiable


[Weston et al. 2013]

Page 98: Tutorial bpocf

kth-Order Statistic hinge loss & sampling approximations


Because this function is non-differentiable, Weston et al. propose the differentiable approximation

$$ D\bigl(S, \tilde{R}\bigr) = \sum_{u \in U} \sum_{R_{ui}=1} w\!\left(\frac{r_>(S_{ui} \mid \{S_{ui} \mid R_{ui}=1\})}{|u|}\right) \sum_{R_{uj}=0} \frac{\max(0,\, 1 + S_{uj} - S_{ui})}{N^{-1}\, |\{ j \in I \mid R_{uj} = 0 \}|}, \tag{31} $$

in which they replaced the indicator function by the hinge-loss and approximated the rank with N^{-1} |{j ∈ I | R_uj = 0}|, in which N is the number of items k that were randomly sampled until S_uk + 1 > S_ui² (a sketch of this sampling-based rank approximation is given below). Furthermore, Weston et al. use the simple weighting function

$$ w\!\left(\frac{r_>(S_{ui} \mid \{S_{ui} \mid R_{ui}=1\})}{|u|}\right) = \begin{cases} 1 & \text{if } r_>(S_{ui} \mid S \subseteq \{S_{ui} \mid R_{ui}=1\},\ |S| = K) = k, \\ 0 & \text{otherwise}, \end{cases} $$

i.e. from the set S of K randomly sampled known preferences, ordered by their predicted score, only the item at rank k is selected to contribute to the training error. When k is set low, the top of the ranking will be optimized at the cost of a worse mean rank. When k is set higher, the mean rank will be optimized at the cost of e.g. recall at 1 or MRR. The regularization is not done by adding a regularization term but by forcing the norm of the factor matrices to be below a maximum. Alternatively, Weston et al. also propose a simplified version

$$ D\bigl(S, \tilde{R}\bigr) = \sum_{u \in U} \sum_{R_{ui}=1} w\!\left(\frac{r_>(S_{ui} \mid \{S_{ui} \mid R_{ui}=1\})}{|u|}\right) \sum_{R_{uj}=0} \max(0,\, 1 + S_{uj} - S_{ui}). \tag{32} $$

To the best of our knowledge, nobody proposed a reconstruction based deviation function for this model yet.
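The sampling-based rank approximation referenced above can be illustrated with the following NumPy sketch (ours; the names and the return convention are assumptions): missing items are drawn until one violates the margin, and |{j | R_uj = 0}| / N then approximates the rank.

```python
import numpy as np

def sampled_rank_margin(S_u, R_u, i, rng=np.random):
    """Draw missing items until one violates the margin S_uk + 1 > S_ui;
    approximate the rank of i by |missing| / N (N = number of draws) and
    return it together with the hinge loss of the violating item."""
    missing = np.flatnonzero(R_u == 0)
    for n_draws in range(1, len(missing) + 1):
        k = rng.choice(missing)
        if S_u[k] + 1.0 > S_u[i]:
            return len(missing) / n_draws, max(0.0, 1.0 + S_u[k] - S_u[i])
    return 1.0, 0.0   # no violating item found: margin satisfied for all draws
```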

4.4. Group 4: Three Factor Matrices, Two Factor Matrices A Priori Unknown, Bias Terms

Kabbur et al. [Kabbur et al. 2013] propose FISM (factored item similarity matrix factorization), a fine-tuned version of the 3-factor matrix factorization:

$$ S_{ui} = U_u + I_i + \bigl(|u|^{-\alpha} R\bigr) S^{(2)} S^{(3)}, \tag{33} $$

with U the user-bias vector, I the item-bias vector and α a hyperparameter between 0 and 1. Besides the 3-factor matrix factorization, also the introduction of the user- and item-biases for bpo collaborative filtering sets this model apart. Notice that for computing top-N recommendations for a user u, the user-bias U_u is not important.

However, when trained, the above model results in trivial solutions for S^{(2)} and S^{(3)} that correspond to an item being similar to itself and dissimilar to all other items. This can be more easily understood by rewriting Equation 33 as

$$ S_{ui} = U_u + I_i + |u|^{-\alpha} \sum_{R_{uj}=1} S^{(2)}_{j*} S^{(3)}_{*i}. $$

Now, to avoid these trivial solutions for S^{(2)} and S^{(3)}, Kabbur et al. further enhance the model to:

$$ S_{ui} = U_u + I_i + (|u| - R_{ui})^{-\alpha} \sum_{R_{uj}=1} \bigl(1 - \delta(j = i)\bigr) \cdot S^{(2)}_{j*} S^{(3)}_{*i}, $$

²Weston et al. [Weston et al. 2011] provide a justification for this approximation.
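A minimal sketch (ours; array names, shapes and the guard against an empty sum are assumptions) of the enhanced FISM score above, with item i excluded from its own aggregation:

```python
import numpy as np

def fism_score(u, i, R, U_bias, I_bias, S2, S3, alpha):
    """S_ui = U_u + I_i + (|u| - R_ui)^(-alpha) * sum_{j != i, R_uj = 1} S2[j,:] . S3[:,i].
    S2 has one row per item, S3 one column per item."""
    known = np.flatnonzero(R[u] == 1)
    others = known[known != i]
    norm = max(len(known) - R[u, i], 1) ** (-alpha)   # guard against an empty sum base
    agg = sum(S2[j] @ S3[:, i] for j in others)
    return U_bias[u] + I_bias[i] + norm * agg
```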


pairs that are difficult to rank correctly. Therefore, Aiolli proposes to replace the uniform weighting with a weighting scheme that minimizes the total margin. Specifically, he proposes to solve, for every user u, the joint optimization problem

$$ S^{(1)}_{u*} = \arg\max_{S^{(1)}_{u*}} \min_{\alpha_{u*}} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui}\,\alpha_{uj}\,(S_{ui} - S_{uj}), $$

where for every user u it holds that Σ_{R_ui=1} α_ui = 1 and Σ_{R_ui=0} α_ui = 1. To avoid overfitting of α, he adds two regularization terms:

$$ S^{(1)}_{u*} = \arg\max_{S^{(1)}_{u*}} \min_{\alpha_{u*}} \left( \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui}\,\alpha_{uj}\,(S_{ui} - S_{uj}) + \lambda_p \sum_{R_{ui}=1} \alpha_{ui}^2 + \lambda_n \sum_{R_{ui}=0} \alpha_{ui}^2 \right), $$

with λ_p, λ_n regularization hyperparameters. S^{(1)} is regularized by means of the row-normalization constraint. Solving the above maximization for every user is equivalent to minimizing the deviation function

$$ D\bigl(S, \tilde{R}\bigr) = \sum_{u \in U} \left( \max_{\alpha_{u*}} \left( \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui}\,\alpha_{uj}\,(S_{uj} - S_{ui}) - \lambda_p \sum_{R_{ui}=1} \alpha_{ui}^2 - \lambda_n \sum_{R_{ui}=0} \alpha_{ui}^2 \right) \right). \tag{29} $$

Notice that this approach corresponds to the AMAN assumption.

4.3. Group 3: Three Factor Matrices, One Factor Matrix A Priori Unknown

There are also algorithms which model S with 3 factor matrices:

$$ S = S^{(1)} S^{(2)} S^{(3)}. $$

To the best of our knowledge, they all follow the special case

$$ S = R\, S^{(2)} S^{(3)}. $$

In this case, the users are represented by |I|-dimensional binary vectors, the items are represented by f-dimensional real vectors, and the similarity between two items i and j is computed by the inner product S^{(2)}_{i*} S^{(3)}_{*j}, which means that S^{(2)} S^{(3)} represents the item-similarity matrix. Weston et al. [Weston et al. 2013] adopt a version of this model with a symmetric item-similarity matrix, which is imposed by setting S^{(3)} = S^{(2)T}.

On the one hand, the deviation functions in Equation 21 and Section 4.1.4 try to minimize the mean rank of the known preferences. On the other hand, the deviation function in Equation 22 tries to push one known preference as high as possible to the top of the item-ranking. Weston et al. [Weston et al. 2013] propose to minimize a trade-off between the above two extremes:

$$ \sum_{u \in U} \sum_{R_{ui}=1} w\!\left(\frac{r_>(S_{ui} \mid \{S_{ui} \mid R_{ui}=1\})}{|u|}\right) \sum_{R_{uj}=0} \frac{\delta(S_{uj} + 1 - S_{ui})}{r_>(S_{uj} \mid \{S_{uk} \mid R_{uk}=0\})}, \tag{30} $$

with w() a function that weights the importance of the known preference as a function of its predicted rank among all known preferences. This weighting function is user-defined and determines the trade-off between the two extremes, i.e. minimizing the mean rank of the known preferences and minimizing the maximal rank of the known preferences. Because this function is non-differentiable, Weston et al. propose the differentiable approximation in Equation 31.
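The trade-off deviation of Equation 30 can be evaluated directly for a small dense problem; the sketch below is ours and reads δ(S_uj + 1 − S_ui) as the indicator that the margin is violated, which is an assumption about the notation.

```python
import numpy as np

def weston_tradeoff_deviation(S, R, w):
    """Eq. (30): every known preference i of user u is weighted by w() of its
    relative rank among u's known preferences; every margin-violating missing
    item j contributes 1 / (its rank among the missing items)."""
    dev = 0.0
    for u in range(R.shape[0]):
        pos, neg = np.flatnonzero(R[u] == 1), np.flatnonzero(R[u] == 0)
        if len(pos) == 0 or len(neg) == 0:
            continue
        neg_rank = 1 + np.argsort(np.argsort(-S[u, neg]))   # 1-based, descending
        pos_rank = 1 + np.argsort(np.argsort(-S[u, pos]))
        for p, i in enumerate(pos):
            violating = S[u, neg] + 1.0 > S[u, i]
            dev += w(pos_rank[p] / len(pos)) * np.sum(violating / neg_rank)
    return dev
```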



[Slide graphic: of the K sampled known preferences at positions 1 … k … K, only the one at rank k is selected (0 0 0 1 0 0); missing items 1 … N are sampled until the first margin violation (false, false, false, true).]

[Weston et al. 2013]

Page 99: Tutorial bpocf

KL-divergence approximation of posterior pdf

$$ \nabla D(S, R) = \nabla \sum_{u \in U} \sum_{i \in I} D_{ui}(S, R) = \sum_{u \in U} \sum_{i \in I} \nabla D_{ui}(S, R) $$

$$ \nabla D(S, R) = \nabla \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} \sum_{j \in I} D_{uij}(S, R) = \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} \sum_{j \in I} \nabla D_{uij}(S, R) $$

$$ = \int (\;)\cdot p(\;\mid\;)\cdot d(\;) $$

$$ D(S, R) = D_{KL}\bigl(Q(S) \,\|\, p(S \mid R)\bigr) $$
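The first two identities say that the deviation, and hence its gradient, decomposes into per-entry or per-triple terms, which is what makes stochastic gradient descent applicable. A minimal sketch of one such step (ours; `grad_dev_ui` is an assumed callable returning the gradient of a single term):

```python
import numpy as np

def sgd_step(params, grad_dev_ui, pairs, lr=0.01, rng=np.random):
    """One stochastic gradient step: since D(S, R) = sum_{u,i} D_ui(S, R),
    the gradient of a uniformly sampled term is an unbiased (scaled)
    estimate of the full gradient."""
    u, i = pairs[rng.randint(len(pairs))]
    return params - lr * grad_dev_ui(params, u, i)
```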


Approximation of

[Koeningstein et al. 2012] [Paquet and Koeningstein 2013]

Page 100: Tutorial bpocf

Local Minima converge to local minimum

$$ S_{ui} = S^{(1)}_{u*} \cdot S^{(2)}_{*i} $$

$$ S = S^{(1)} S^{(2)} $$

$$ S = \bigl(S^{(1,1)} \cdots S^{(1,F_1)}\bigr) + \cdots + \bigl(S^{(T,1)} \cdots S^{(T,F_T)}\bigr) $$

$$ \max \sum_{R_{ui}=1} \log p(i \mid u) $$

$$ \max \sum_{R_{ui}=1} \log S_{ui} $$

$$ \min \; -\sum_{R_{ui}=1} \log S_{ui} $$

$$ D(S, R) = -\sum_{R_{ui}=1} \log S_{ui} $$

$$ \min \; D(S, R) $$
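The last two lines are the negative log-likelihood deviation minimized by the probabilistic (pLSA-style) models; a one-line NumPy sketch (ours, assuming the entries S_ui are the probabilities p(i | u)):

```python
import numpy as np

def loglik_deviation(S, R, eps=1e-12):
    """D(S, R) = - sum over observed (u, i) of log S_ui."""
    return -np.sum(np.log(S[R == 1] + eps))
```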


$$ \sum_{i \in I} \sum_{j \in I} \Bigl( \mathrm{sim}(j, i) \cdot |KNN(j) \cap \{i\}| - S^{(2)}_{ji} \Bigr)^2 $$

$$ \sum_{u \in U} \sum_{v \in U} \Bigl( \mathrm{sim}(u, v) \cdot |KNN(u) \cap \{v\}| - S^{(2)}_{uv} \Bigr)^2 $$

$$ S^{(2)}_{ji} = \mathrm{sim}(j, i) \cdot |KNN(j) \cap \{i\}| \qquad \text{for all } i, j \in I $$

$$ S^{(2)}_{uv} = \mathrm{sim}(u, v) \cdot |KNN(u) \cap \{v\}| \qquad \text{for all } u, v \in U $$

$$ S^{(3)}_{uv} = \mathrm{sim}(u, v) \cdot |KNN(u) \cap \{v\}| $$

$$ \sum_{i \in I} \sum_{j \in I} \Bigl( \mathrm{sim}(j, i) \cdot |KNN(j) \cap \{i\}| - S^{(2)}_{ji} \Bigr)^2 + \sum_{u \in U} \sum_{v \in U} \Bigl( \mathrm{sim}(u, v) \cdot |KNN(u) \cap \{v\}| - S^{(3)}_{uv} \Bigr)^2 $$

every row S^{(1)}_{u·} and every column S^{(2)}_{·i} the same unit vector

$$ O(|U| \times |I|) \qquad O\bigl(d^3(|U| + |I|) + d^2 |R|\bigr) $$

$$ \bigl(S^{(1,1)}, \ldots, S^{(T,F)}\bigr) $$
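A sketch of filling the pruned item-similarity factor S^{(2)}_{ji} = sim(j, i) · |KNN(j) ∩ {i}|; cosine similarity on the binary item columns of R is an assumed choice for sim(), and the function name is ours.

```python
import numpy as np

def knn_item_similarity(R, k, sim=None):
    """S2[j, i] = sim(j, i) if i is among the k most similar items to j, else 0."""
    if sim is None:
        norms = np.linalg.norm(R, axis=0) + 1e-12
        sim = (R.T @ R) / np.outer(norms, norms)   # item-item cosine similarity
    S2 = np.zeros_like(sim)
    for j in range(sim.shape[0]):
        s = sim[j].copy()
        s[j] = -np.inf                              # exclude the item itself
        top = np.argpartition(-s, k)[:k]            # indices of the k nearest items
        S2[j, top] = sim[j, top]
    return S2
```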


Page 101: Tutorial bpocf

Convex unique minimum


Convex Optimization Algorithm


Page 102: Tutorial bpocf

Max-Min-Margin AUC as average margin

Binary, Positive-Only Collaborative Filtering: A Theoretical and Experimental Comparison of the State Of The Art1:9Fast Maximum Margin Matrix Factorization

The thresholds �r can be learned from the data. Further-more, a different set of thresholds can be learned for eachuser, allowing users to “use ratings differently” and allevi-ates the need to normalize the data. The problem can thenbe written as:

minimize �X�⌃ + CX

ij2S

R�1X

r=1

h(T rij(�ir � Xij)) (4)

where the variables optimized over are the matrix X andthe thresholds �. In other work, we find that such a for-mulation is highly effective for rating prediction (Rennie &Srebro, 2005).

Although the problem was formulated here as a single op-timization problem with a combined objective, �X�⌃ +

C · error, it should really be viewed as a dual-objectiveproblem of balancing between low trace-norm and low er-ror. Considering the entire set of attainable (�X�⌃ , error)pairs, the true object of interest is the exterior “front” ofthis set, i.e. the set of matricesX for which it is not possi-ble to reduce one of the two objectives without increasingthe other. This “front” can be found by varying the valueof C from zero (hard-margin) to infinity (no norm regular-ization).

All optimization problems discussed in this section can bewritten as semi-definite programs (Srebro et al., 2005).

3. Optimization MethodsWe describe here a local search heursitic for the problem(4). Instead of searching over X , we search over pairs ofmatrices (U, V ), as well as sets of thresholds �, and attemptto minimize the objective:

J(U, V, �).=

1

2

(�U�2Fro + �V �2

Fro)

+ C

R�1X

r=1

X

ij2S

h⇣T r

ij(�ir � UiV�j )

⌘. (5)

For any U, V we have �UV �⌃ 12 (�U�2

Fro + �V �2Fro) and

so J(U, V, �) upper bounds the minimization objective of(4), where X = UV �. Furthermore, for anyX , and in par-ticular theX minimizing (4), some factorizationX = UV �

achieves �X�⌃ =

12 (�U�2

Fro + �V �2Fro). The minimization

problem (4) is therefore equivalent to:

minimize J(U, V, �). (6)

The advantage of considering (6) instead of (4) is that�X�⌃ is a complicated non-differentiable function forwhich it is not easy to find the subdifrential. Finding gooddescent directions for (4) is not easy. On the other hand, the

0

0.5

1

1.5

2

-0.5 0 0.5 1 1.5

Loss

z

HingeSmooth Hinge

-1.5

-1

-0.5

0

0.5

-0.5 0 0.5 1 1.5

Der

ivat

ive

of L

oss

z

HingeSmooth Hinge

Figure 1. Shown are the loss function values (left) and gradients(right) for the Hinge and Smooth Hinge. Note that the gradientsare identical outside the region z � (0, 1).

objective J(U, V, �) is fairly simple. Ignoring for the mo-ment the non-differentiability of h(z) = (1 � z)+ at one,the gradient of J(U, V, �) is easy to compute. The partialderivative with respect to each element of U is:

@J

@Uia= Uia � C

R�1X

r=1

X

j|ij2S

Tij(k)h�⇣T r

ij(�ir � UiV�j )

⌘Vja

(7)

The partial derivative with respect to Vja is analogous. Thepartial derivative with respect to �ik is

@J

@�ir= C

X

j|ij2S

T rijh

�⇣T r

ij(�ir � UiV�j )

⌘. (8)

With the gradient in-hand, we can turn to gradient descentmethods for localy optimizing J(U, V, �). The disadvan-tage of considering (6) instead of (4) is that although theminimization objective in (4) is a convex function of X, �,the objective J(U, V, �) is not a convex function of U, V .This is potentially bothersome, and might inhibit conver-gence to the global minimum.

3.1. Smooth Hinge

In the previous discussion, we ignored the non-differentiability of the Hinge loss function h(z) at z = 1.In order to give us a smooth optimization surface, we usean alternative to the Hinge loss, which we refer to as theSmooth Hinge. Figure 1 shows the Hinge and SmoothHinge loss functions. The Smooth Hinge shares manyproperties with the Hinge, but is much easier to optimizedirectly via gradient descent methods. Like the Hinge, theSmooth Hinge is not sensitive to outliers, and does notcontinuously “reward” the model for increasing the outputvalue for an example. This contrasts with other smooth lossfunctions, such as the truncated quadratic (which is sensi-tive to outliers) and the Logistic (which “rewards” largeoutput values). We use the Smooth Hinge and the corre-sponding objective for our experiments in Section 4.

Fig. 3. Shown are the loss function values $h(z)$ (left) and the gradients $dh(z)/dz$ (right) for the Hinge and Smooth Hinge. Note that the gradients are identical outside the region $z \in (0, 1)$ [Rennie and Srebro 2005].

Pan and Scholz [Pan and Scholz 2009] propose a bagging method for better scalability. Remark that both methods find different solutions to the minimization problem.

Besides the hinge loss, also the exponential and the binomial negative log-likelihood loss functions exhibit a similar behavior [Bishop 2006]:

$$l_{\exp}(\tilde{R}_{ui}, S_{ui}) = \exp(-\tilde{R}_{ui} S_{ui}),$$
$$l_{ll}(\tilde{R}_{ui}, S_{ui}) = \log\left(1 + \exp(-2 \tilde{R}_{ui} S_{ui})\right).$$

However, to the best of our knowledge, they have not yet been used for one-class collaborative filtering.
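Both losses are element-wise and straightforward to evaluate; a minimal NumPy illustration (not code from the survey):

```python
import numpy as np

def l_exp(R_tilde, S):
    # exponential loss: exp(-R~_ui * S_ui), element-wise
    return np.exp(-R_tilde * S)

def l_ll(R_tilde, S):
    # binomial negative log-likelihood: log(1 + exp(-2 * R~_ui * S_ui))
    return np.log1p(np.exp(-2.0 * R_tilde * S))
```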

4.1.4. Ranking Based Deviation Functions. The scores computed by recommender systems are often used to personally rank all items for every user. Therefore, Rendle et al. [Rendle et al. 2009] argued that it is natural to directly optimize the ranking. More specifically, they aim to maximize the area under the ROC curve (AUC), which is given by:

$$\mathrm{AUC} = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui} > 0} \sum_{R_{uj} = 0} \delta(S_{ui} > S_{uj}),$$

with $\delta(\text{true}) = 1$ and $\delta(\text{false}) = 0$. If the AUC is higher, the pairwise rankings induced by the model S are more in line with the observed data R. However, because $\delta(S_{ui} > S_{uj})$ is non-differentiable, their deviation function is a differentiable approximation of the negative AUC from which constant factors have been removed and to which a regularization term has been added:

$$D\!\left(S, \tilde{R}\right) = \sum_{u \in U} \sum_{R_{ui} > 0} \sum_{R_{uj} = 0} \log \sigma(S_{uj} - S_{ui}) + \lambda_1 \|S^{(1)}\|_F^2 + \lambda_2 \|S^{(2)}\|_F^2,$$

with $\sigma(\cdot)$ the sigmoid function and $\lambda_1, \lambda_2$ regularization constants, which are hyperparameters of the method. Notice that this deviation function considers all missing feedback equally negative, i.e. it corresponds to the AMAN assumption.

However, very often only the N highest ranked items are shown to users. Therefore, Shi et al. [Shi et al. 2012] propose to optimize the mean reciprocal rank (MRR) instead of the AUC. The MRR is defined as

$$\mathrm{MRR} = \frac{1}{|U|} \sum_{u \in U} r_>\!\left(\max_{R_{ui}=1} S_{ui} \;\Big|\; S_{u*}\right)^{-1},$$

with $r_>(a \mid S_{u*})$ the rank of $a$ when the scores $S_{u*}$ are sorted in descending order.
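Both ranking measures can be computed exactly on small data by following the definitions above; the toy matrices in this sketch are purely illustrative.

```python
import numpy as np

def auc(R, S):
    # average over users of the fraction of (preferred, absent) pairs ranked correctly
    vals = []
    for u in range(R.shape[0]):
        pos, neg = S[u, R[u] == 1], S[u, R[u] == 0]
        if len(pos) and len(neg):
            vals.append(np.mean(pos[:, None] > neg[None, :]))
    return float(np.mean(vals))

def mrr(R, S):
    # reciprocal rank of the best-scored known preference within the full ranking S_u*
    vals = []
    for u in range(R.shape[0]):
        if R[u].sum():
            best = S[u, R[u] == 1].max()
            vals.append(1.0 / (1 + np.sum(S[u] > best)))   # r_>(best | S_u*)
    return float(np.mean(vals))

R = np.array([[1, 0, 1, 0, 0],
              [0, 1, 0, 0, 1]])
S = np.random.rand(2, 5)
print(auc(R, S), mrr(R, S))
```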


[Aiolli 2014]

Page 103: Tutorial bpocf

Max-Min-Margin AUC as average margin


[Aiolli 2014]

Page 104: Tutorial bpocf

4.2.1. Reconstruction Based Deviation Functions.

— Bell and Koren ord-rec? Is it only for ratings? Check.

Ning and Karypis [Ning and Karypis 2011] choose $S^{(1)} = R$ and propose a standard reconstruction based deviation function:

$$D\!\left(S, \tilde{R}\right) = \sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \lambda_F \|S^{(2)}\|_F^2 + \lambda_1 \|S^{(2)}\|_1, \quad (28)$$

with $\|\cdot\|_1$ the entry-wise $l_1$-norm, $\|\cdot\|_F$ the Frobenius norm, and $\lambda_1, \lambda_F$ the corresponding regularization constants, which are hyperparameters of the method. The role of the $l_1$-norm is to introduce sparsity. The role of the Frobenius norm is to prevent overfitting. Their combined use is called elastic net regularization, which is known to implicitly group correlated items. Furthermore, Ning and Karypis impose the constraints

$$S^{(2)} \ge 0, \qquad \mathrm{diag}(S^{(2)}) = 0.$$

The first constraint expresses non-negativity of the item-similarities. The second constraint avoids trivial solutions to the minimization of the deviation function in which every item would recommend itself. Notice that $S^{(2)}$ is not required to be symmetric. The sparsity induced by the $l_1$-norm regularization lowers the memory required for $S^{(2)}$ and speeds up the dot-product computation $S_{ui} = R_{u*} \cdot S^{(2)}_{*i}$. Further performance gains can be achieved by applying feature selection techniques. Ning and Karypis selected features by imposing $S^{(2)}_{ij} = 0$ if $i$ is not among the 100 items most similar to $j$, as measured by $\cos(i, j)$. Their experiments show that this way of working significantly reduced runtimes, while only slightly reducing accuracy.
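One common way to fit such an elastic-net-regularized, non-negative, zero-diagonal item-similarity matrix is to solve one regression per item column; the sketch below uses scikit-learn's ElasticNet as a stand-in solver, with illustrative hyperparameter values. It is a sketch of the general recipe, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def fit_slim(R, l1_reg=0.001, l2_reg=0.0001):
    """Fit an item-item weight matrix W (playing the role of S^(2)) with
    R ~= R W, W >= 0 and diag(W) = 0, one elastic-net regression per item."""
    n_items = R.shape[1]
    W = np.zeros((n_items, n_items))
    alpha = l1_reg + l2_reg                       # sklearn's combined penalty strength
    model = ElasticNet(alpha=alpha, l1_ratio=l1_reg / alpha,
                       positive=True, fit_intercept=False)
    for j in range(n_items):
        X = R.copy()
        X[:, j] = 0.0                             # zeroing column j enforces W_jj = 0
        model.fit(X, R[:, j])
        W[:, j] = model.coef_
    return W

R = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1]], dtype=float)
W = fit_slim(R)
S = R @ W                                         # S_ui = R_u* . S^(2)_*i
```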

4.2.2. Ranking Based Deviation Functions. Also when $S^{(1)} = R$, it is possible to use ranking-based deviation functions. Rendle et al. propose to use exactly the same deviation function as in Equation 21 to optimize the AUC [Rendle et al. 2009]. The only difference is that for computing S, $RS^{(2)}$ is used instead of $S^{(1)}S^{(2)}$, i.e. only the second factor matrix is unknown. Because $S^{(2)}$ can be interpreted as an item-similarity matrix, they call this method BPR-kNN.

Aiolli [Aiolli 2014], on the other hand, chooses the user-based alternative with $S^{(2)} = \bar{R}$, with $\bar{R}$ the column-normalized version of R and $S^{(1)}$ row-normalized. Consequently, it holds that $-1 \le S_{ui} \le 1$, since $S_{ui} = S^{(1)}_{u*} \bar{R}_{*i}$ with $\|S^{(1)}_{u*}\| \le 1$ and $\|\bar{R}_{*i}\| \le 1$. For every individual user $u \in U$, he starts from $\mathrm{AUC}_u$, the AUC for u:

$$\mathrm{AUC}_u = \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta(S_{ui} > S_{uj}).$$

Next, he proposes a lower bound on $\mathrm{AUC}_u$:

$$\mathrm{AUC}_u \ge \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \frac{S_{ui} - S_{uj}}{2},$$

and interprets it as a weighted sum of margins $\frac{S_{ui} - S_{uj}}{2}$ between any known preference and any absent feedback, in which every margin gets the same weight $\frac{1}{|u| \cdot (|I| - |u|)}$. Hence, maximizing this lower bound on the AUC corresponds to maximizing the sum of margins between any known preference and any absent feedback, in which every margin has the same weight. A problem with maximizing this sum is that very high margins on pairs that are easily ranked correctly can hide poor (negative) margins on pairs that are difficult to rank correctly.
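Because the scores are bounded in $[-1, 1]$, the average margin can never exceed $\mathrm{AUC}_u$; this is easy to check numerically (the random scores below are only an illustration).

```python
import numpy as np

def auc_u(S_u, pos, neg):
    return np.mean(S_u[pos][:, None] > S_u[neg][None, :])

def average_margin(S_u, pos, neg):
    # uniform-weight lower bound: mean of (S_ui - S_uj) / 2 over all pairs
    return np.mean((S_u[pos][:, None] - S_u[neg][None, :]) / 2.0)

rng = np.random.default_rng(0)
S_u = rng.uniform(-1.0, 1.0, size=20)   # scores bounded in [-1, 1]
pos = np.arange(5)                       # items with R_ui = 1
neg = np.arange(5, 20)                   # items with R_uj = 0
assert average_margin(S_u, pos, neg) <= auc_u(S_u, pos, neg)
```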

Max-Min-Margin AUC as average margin


[Aiolli 2014]

Page 105: Tutorial bpocf

Max-Min-Margin AUC as average margin


[Aiolli 2014]

Page 106: Tutorial bpocf

Max-Min-Margin average → min total

$$\sum_{u \in U} \sum_{i \in I} R_{ui} W_{ui} (1 - S_{ui})^2 + \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) W_{ui} \left( P_{ui} (1 - S_{ui})^2 + (1 - P_{ui}) (0 - S_{ui})^2 \right) + \lambda \|S^{(1)}\|_F + \lambda \|S^{(2)}\|_F - \alpha \sum_{u \in U} \sum_{i \in I} (1 - R_{ui}) H(P_{ui}) \quad (57)$$

$$(1 - 0)^2 = 1 = (1 - 2)^2 \quad (58)$$

$$w(j) = \sum_{u \in U} R_{uj}$$

$$\sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \left( (R_{ui} - R_{uj}) - (S_{ui} - S_{uj}) \right)^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2$$

$$\sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2$$

$$\sum_{u \in U} \sum_{i \in I} (R_{ui} - S_{ui})^2 + \sum_{t=1}^{T} \sum_{f=1}^{F} \lambda_{tf} \|S^{(t,f)}\|_F^2 + \|S^{(t,f)}\|_1$$

$$\max \; \frac{1}{|u| \cdot (|I| - |u|)} \sum_{R_{ui}=1} \sum_{R_{uj}=0} \frac{S_{ui} - S_{uj}}{2}$$

$$\max \; \min_{\alpha_{u*}} \; \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui} \alpha_{uj} (S_{ui} - S_{uj})$$
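Before any regularization on α is added (see the following slides), the inner minimization puts all weight on the hardest pair, so the max-min objective reduces to the single worst margin, $\min_i S_{ui} - \max_j S_{uj}$; the sketch below contrasts that with the uniform average margin of the previous slides (the toy scores are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
S_pos = rng.uniform(-1, 1, size=6)     # scores of one user's known preferences
S_neg = rng.uniform(-1, 1, size=30)    # scores of items without feedback

# uniform weights: the average margin maximized on the previous slides
average_margin = np.mean((S_pos[:, None] - S_neg[None, :]) / 2.0)

# adversarial weights without regularization: all mass on the hardest pair
worst_margin = S_pos.min() - S_neg.max()

print(average_margin, worst_margin)
```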

[Aiolli 2014]

Page 107: Tutorial bpocf

Max-Min-Margin average → min total


[Aiolli 2014]

Page 108: Tutorial bpocf

Max-Min-Margin add regularization


Therefore, Aiolli proposes to replace the uniform weighting with a weighting scheme that minimizes the total margin. Specifically, he proposes to solve, for every user u, the joint optimization problem

$$S^{(1)}_{u*} = \arg\max_{S^{(1)}_{u*}} \; \min_{\alpha_{u*}} \; \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui} \alpha_{uj} (S_{ui} - S_{uj}),$$

where for every user u it holds that $\sum_{R_{ui}=1} \alpha_{ui} = 1$ and $\sum_{R_{ui}=0} \alpha_{ui} = 1$. To avoid overfitting of $\alpha$, he adds two regularization terms:

$$S^{(1)}_{u*} = \arg\max_{S^{(1)}_{u*}} \; \min_{\alpha_{u*}} \left( \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui} \alpha_{uj} (S_{ui} - S_{uj}) + \lambda_p \sum_{R_{ui}=1} \alpha_{ui}^2 + \lambda_n \sum_{R_{ui}=0} \alpha_{ui}^2 \right),$$

with $\lambda_p, \lambda_n$ regularization hyperparameters. $S^{(1)}$ is regularized by means of the row-normalization constraint. Solving the above maximization for every user is equivalent to minimizing the deviation function

$$D\!\left(S, \tilde{R}\right) = \sum_{u \in U} \max_{\alpha_{u*}} \left( \sum_{R_{ui}=1} \sum_{R_{uj}=0} \alpha_{ui} \alpha_{uj} (S_{uj} - S_{ui}) - \lambda_p \sum_{R_{ui}=1} \alpha_{ui}^2 - \lambda_n \sum_{R_{ui}=0} \alpha_{ui}^2 \right). \quad (29)$$

Notice that this approach corresponds to the AMAN assumption.
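For a fixed score vector of one user, the regularized inner minimization over α is a quadratic problem over two probability simplices and can be approximated with projected gradient descent; the sketch below is one such approximation (the Euclidean simplex projection and the step size are standard choices assumed here, not prescribed by the survey).

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto {x : x >= 0, sum(x) = 1}
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1.0), 0.0)

def inner_min(S_pos, S_neg, lam_p, lam_n, steps=500, lr=0.05):
    """Approximate min over (alpha_p, alpha_n) on two probability simplices of
       sum_ij a_p[i] a_n[j] (S_pos[i] - S_neg[j]) + lam_p ||a_p||^2 + lam_n ||a_n||^2."""
    M = S_pos[:, None] - S_neg[None, :]          # margin matrix S_ui - S_uj
    a_p = np.full(len(S_pos), 1.0 / len(S_pos))  # start from the uniform weighting
    a_n = np.full(len(S_neg), 1.0 / len(S_neg))
    for _ in range(steps):
        g_p = M @ a_n + 2.0 * lam_p * a_p        # gradient w.r.t. weights on positives
        g_n = M.T @ a_p + 2.0 * lam_n * a_n      # gradient w.r.t. weights on negatives
        a_p = project_simplex(a_p - lr * g_p)
        a_n = project_simplex(a_n - lr * g_n)
    return a_p @ M @ a_n + lam_p * a_p @ a_p + lam_n * a_n @ a_n
```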

4.3. Group 3: Three Factor Matrices, One Factor Matrix A Priori Unknown

There are also algorithms which model S with three factor matrices:

$$S = S^{(1)} S^{(2)} S^{(3)}.$$

To the best of our knowledge, they all follow the special case

$$S = R S^{(2)} S^{(3)}.$$

In this case, the users are represented by $|I|$-dimensional binary vectors, the items are represented by $f$-dimensional real vectors, and the similarity between two items $i$ and $j$ is computed by the inner product $S^{(2)}_{i*} S^{(3)}_{*j}$, which means that $S^{(2)} S^{(3)}$ represents the item-similarity matrix. Weston et al. [Weston et al. 2013] adopt a version of this model with a symmetric item-similarity matrix, which is imposed by setting $S^{(3)} = S^{(2)T}$.

On the one hand, the deviation functions in Equation 21 and Section 4.1.4 try to minimize the mean rank of the known preferences. On the other hand, the deviation function in Equation 22 tries to push one known preference as high as possible to the top of the item-ranking. Weston et al. [Weston et al. 2013] propose to minimize a trade-off between these two extremes:

$$\sum_{u \in U} \sum_{R_{ui}=1} w\!\left( \frac{r_>\!\big(S_{ui} \mid \{S_{ui} \mid R_{ui}=1\}\big)}{|u|} \right) \sum_{R_{uj}=0} \frac{\delta(S_{uj} + 1 > S_{ui})}{r_>\!\big(S_{uj} \mid \{S_{uk} \mid R_{uk}=0\}\big)}, \quad (30)$$

with $w(\cdot)$ a function that weights the importance of a known preference as a function of its predicted rank among all known preferences. This weighting function is user-defined and determines the trade-off between the two extremes, i.e. minimizing the mean rank of the known preferences and minimizing the maximal rank of the known preferences. Because this function is non-differentiable, Weston et al. propose a differentiable approximation.
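On a small dense score matrix, the trade-off loss (30) can be evaluated directly by following the definition; in the sketch below, the identity function stands in for the user-defined weighting $w(\cdot)$, and the data are illustrative.

```python
import numpy as np

def rank_weighted_loss(R, S, w=lambda x: x):
    """Direct evaluation of Eq. (30); w(.) stands in for the user-defined weighting."""
    total = 0.0
    for u in range(R.shape[0]):
        pos = np.where(R[u] == 1)[0]
        neg = np.where(R[u] == 0)[0]
        if len(pos) == 0 or len(neg) == 0:
            continue
        S_pos, S_neg = S[u, pos], S[u, neg]
        # r_>(S_uj | negatives): descending rank of each negative among the negatives
        neg_rank = 1 + np.sum(S_neg[:, None] < S_neg[None, :], axis=1)
        for s_ui in S_pos:
            rank_pos = 1 + np.sum(S_pos > s_ui)        # r_>(S_ui | known preferences)
            violating = (S_neg + 1.0 > s_ui)           # delta(S_uj + 1 > S_ui)
            total += w(rank_pos / len(pos)) * np.sum(violating / neg_rank)
    return total
```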

[Aiolli 2014]

Page 109: Tutorial bpocf

Convex unique minimum

$$S_{ui} = S^{(1)}_{u*} \cdot S^{(2)}_{*i}$$

$$S = S^{(1)} S^{(2)}$$

$$S = \left( S^{(1,1)} \cdots S^{(1,F_1)} \right) + \cdots + \left( S^{(T,1)} \cdots S^{(T,F_T)} \right)$$

$$\max \sum_{R_{ui}=1} \log p(i \mid u)$$

$$\max \sum_{R_{ui}=1} \log S_{ui}$$

$$\min \; -\sum_{R_{ui}=1} \log S_{ui}$$

$$D(S, R) = -\sum_{R_{ui}=1} \log S_{ui}$$

$$\min \; D(S, R)$$
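A minimal sketch of evaluating this deviation function for a factorized $S = S^{(1)} S^{(2)}$ whose entries lie in $(0, 1)$; the way the toy factors are normalized below is an illustrative assumption.

```python
import numpy as np

def neg_log_likelihood(R, S):
    # D(S, R) = - sum over known preferences of log S_ui
    return -np.sum(np.log(S[R == 1]))

rng = np.random.default_rng(2)
S1 = rng.random((4, 3)); S1 /= S1.sum(axis=1, keepdims=True)   # rows sum to 1
S2 = rng.random((3, 5)); S2 /= S2.sum(axis=0, keepdims=True)   # columns sum to 1
S = S1 @ S2                                                    # entries lie in (0, 1)
R = (rng.random((4, 5)) < 0.3).astype(int)
print(neg_log_likelihood(R, S))
```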

Analytically computable

$$\sum_{i \in I} \sum_{j \in I} \left( \mathrm{sim}(j, i) \cdot |KNN(j) \cap \{i\}| - S^{(2)}_{ji} \right)^2$$

$$\sum_{u \in U} \sum_{v \in U} \left( \mathrm{sim}(u, v) \cdot |KNN(u) \cap \{v\}| - S^{(2)}_{uv} \right)^2$$

$$S^{(2)}_{ji} = \mathrm{sim}(j, i) \cdot |KNN(j) \cap \{i\}| \quad \text{for all } i, j \in I$$

$$S^{(2)}_{uv} = \mathrm{sim}(u, v) \cdot |KNN(u) \cap \{v\}| \quad \text{for all } u, v \in U$$

$$S^{(3)}_{uv} = \mathrm{sim}(u, v) \cdot |KNN(u) \cap \{v\}|$$

$$\sum_{i \in I} \sum_{j \in I} \left( \mathrm{sim}(j, i) \cdot |KNN(j) \cap \{i\}| - S^{(2)}_{ji} \right)^2 + \sum_{u \in U} \sum_{v \in U} \left( \mathrm{sim}(u, v) \cdot |KNN(u) \cap \{v\}| - S^{(3)}_{uv} \right)^2$$

every row $S^{(1)}_{u\cdot}$ and every column $S^{(2)}_{\cdot i}$ the same unit vector

$$O(|U| \times |I|)$$

$$O\!\left( d^3 (|U| + |I|) + d^2 |R| \right)$$

$$\left( S^{(1,1)}, \ldots, S^{(T,F)} \right)$$
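The item-based variant can be computed directly from the binary matrix: pairwise item similarities, pruned to the k nearest neighbours of each item. Cosine similarity and k = 2 are example choices here; the choice of sim(·, ·) is discussed on the following slides.

```python
import numpy as np

def knn_item_model(R, k=2):
    """S2[j, i] = sim(j, i) * |KNN(j) ∩ {i}|, with cosine similarity as example sim."""
    norms = np.linalg.norm(R, axis=0) + 1e-12
    sim = (R.T @ R) / np.outer(norms, norms)       # cosine similarity between item columns
    np.fill_diagonal(sim, 0.0)                     # an item is not its own neighbour
    S2 = np.zeros_like(sim)
    for j in range(sim.shape[0]):
        knn = np.argsort(sim[j])[::-1][:k]         # the k items most similar to j
        S2[j, knn] = sim[j, knn]                   # keep sim(j, i) only if i in KNN(j)
    return S2

R = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], dtype=float)
S = R @ knn_item_model(R)                          # scores S_ui = (R S^(2))_ui
```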

Page 110: Tutorial bpocf

Nearest Neighbors user- or item-similarity

[Aiolli 2013] [Deshpande and Karypis 2004] [Sigurbjörnsson and Van Zwol 2008]

[Sarwar et al. 2001] [Mobasher et al. 2001] [Lin et al. 2002]

[Sarwar et al. 2000] [Menezes et al. 2010] [van Leeuwen and Puspitaningrum 2012]

Page 111: Tutorial bpocf

Nearest Neighbors similarity measures


[Aiolli 2013] [Deshpande and Karypis 2004] [Sigurbjörnsson and Van Zwol 2008]

[Sarwar et al. 2001] [Mobasher et al. 2001] [Lin et al. 2002]

[Sarwar et al. 2000] [Menezes et al. 2010] [van Leeuwen and Puspitaningrum 2012]

$$O(|R|)$$

$$O(|R| \times |I|)$$

$$-\eta \cdot \nabla D(S, R)$$

$$\nabla D(S, R) = \nabla \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} D_{ui}(S, R) = \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} \nabla D_{ui}(S, R)$$
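This decomposition of the gradient over the known preferences is what justifies stochastic gradient descent: sample one known preference (u, i) and update with $\nabla D_{ui}$ alone. A generic sketch for a two-factor model, with a squared per-observation deviation chosen purely as an example:

```python
import numpy as np

def sgd(R, d=8, lr=0.05, reg=0.01, epochs=20, seed=0):
    """Stochastic gradient descent over the known preferences, S = S^(1) S^(2)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    S1 = 0.1 * rng.standard_normal((n_users, d))
    S2 = 0.1 * rng.standard_normal((d, n_items))
    known = np.argwhere(R == 1)
    for _ in range(epochs):
        rng.shuffle(known)                            # visit known preferences in random order
        for u, i in known:
            # example per-observation deviation D_ui = (1 - S_ui)^2 plus L2 regularization
            err = 1.0 - S1[u] @ S2[:, i]
            grad_u = -2.0 * err * S2[:, i] + reg * S1[u]
            grad_i = -2.0 * err * S1[u] + reg * S2[:, i]
            S1[u] -= lr * grad_u                      # S <- S - eta * grad D_ui
            S2[:, i] -= lr * grad_i
    return S1, S2
```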

Page 112: Tutorial bpocf

Nearest Neighbors similarity measures

1:34 K. Verstrepen et al.

X

i2I

X

j2I

⇣sim(j, i) · |KNN (j) \ {i}| � S(2)

ji

⌘2

X

u2U

X

v2U

⇣sim(v, u) · |KNN (v) \ {u}| � S(2)

vu

⌘2

REFERENCESF. Aiolli. 2013. Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets. In RecSys.

273–280.Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In RecSys.

293–296.S.S. Anand and B. Mobasher. 2006. Contextual Recommendation. In WebMine. 142–160.C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.Evangelia Christakopoulou and George Karypis. 2014. Hoslim: Higher-order sparse linear method for top-n

recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on

top-n recommendation tasks. In Proceedings of the fourth ACM conference on Recommender systems.39–46.

M. Deshpande and G. Karypis. 2004. Item-Based Top-N Recommendation Algorithms. TOIS 22, 1 (2004),143–177.

C. Desrosiers and G. Karypis. 2011. A Comprehensive Survey of Neighborhood-based RecommendationMethods. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor (Eds.).Springer, Boston, MA.

Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linearmodels via coordinate descent. Journal of statistical software 33, 1 (2010), 1.

E. Gaussier and C. Goutte. 2005. Relation between PLSA and NMF and implications. In SIGIR. 601–602.T. Hofmann. 1999. Probabilistic Latent Semantic Indexing. In SIGIR. 50–57.Thomas Hofmann. 2004. Latent Semantic Models for Collaborative Filtering. ACM Trans. Inf. Syst. 22, 1

(2004), 89–115.F. Hoppner. 2005. Association Rules. In The Data Mining and Knowledge Discovery Handbook, O. Mainmon

and L. Rokach (Eds.). Springer, New York, NY.Y. Hu, Y. Koren, and C. Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In ICDM.

263–272.D. Jannach, M. Zanker, A. Felfernig, and G. Frierich. 2011. Recommender Systems: An Introduction. Cam-

bridge University Press, New York, NY.S. Kabbur, X. Ning, and G. Karypis. 2013. FISM: Factored Item Similarity Models for top-N Recommender

Systems. In KDD. 659–667.N. Koeningstein, N. Nice, U. Paquet, and N. Schleyen. 2012. The Xbox Recommender System. In RecSys.

281–284.Y. Koren and R. Bell. 2011. Advances in Collaborative Filtering. In Recommender Systems Handbook,

F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor (Eds.). Springer, Boston, MA.Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender

systems. Computer 8 (2009), 30–37.W. Lin, S.a. Alvarez, and C. Ruiz. Efficient adaptive-support association rule mining for recommender sys-

tems. Data Min. Knowl. Discov. 6, 1 (????), 83–105.H. Ma. 2013. An Experimental Study on Implicit Social Recommendation. In SIGIR. 73–82.G.V. Menezes, J.M. Almeida, F.B. Belem, M.A. Goncalves, A. Lacerda, E. Silva de Moura, G.L. Pappa, A.

Veloso, and N. Ziviani. 2010. Demand Drive Tag Recommendation. In ECML/PKDD. 402–417.B. Mobasher, H. Dai, T. Luo, and M. Nakagawa. 2001. Effective Personalization Based on Association Rule

Discovery from Web Usage Data. In WIDM. 9–15.X. Ning and G. Karypis. 2011. Slim: Sparse Linear Methods for Top-N recommender systems. In ICDM.

497–506.R. Pan and M. Scholz. 2009. Mind the Gaps: Weighting the Unknown in Large-scale One-class Collaborative

Filtering. In KDD. 667–676.

ACM Computing Surveys, Vol. 1, No. 1, Article 1, Publication date: January 2015.

1:34 K. Verstrepen et al.

X

i2I

X

j2I

⇣sim(j, i) · |KNN (j) \ {i}| � S(2)

ji

⌘2

X

u2U

X

v2U

⇣sim(v, u) · |KNN (v) \ {u}| � S(2)

vu

⌘2

S(2)ji = sim(j, i) · |KNN (j) \ {i}|


[Aiolli 2013] [Deshpande and Karypis 2004] [Sigurbjörnsson and Van Zwol 2008]

[Sarwar et al. 2001] [Mobasher et al. 2001] [Lin et al. 2002]

[Sarwar et al. 2000] [Menezes et al. 2010] [van Leeuwen and Puspitaningrum 2012]


\[
\sum_{u \in U} \sum_{R_{ui}=1} \sum_{R_{uj}=0}
\frac{\delta\big(S_{uj} + 1 - S_{ui}\big)}
     {r_{>}\big(S_{uj} \mid \{S_{uk} \mid R_{uk}=0\}\big)}
\tag{60}
\]

\[
AUC = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|u| \cdot (|I| - |u|)}
\sum_{R_{ui}=1} \sum_{R_{uj}=0} \delta\big(S_{ui} > S_{uj}\big)
\]

\[
S^{(2)}_{ji} = \mathrm{sim}(j,i)\cdot|KNN(j)\cap\{i\}|,\qquad
S^{(2)}_{uv} = \mathrm{sim}(u,v)\cdot|KNN(u)\cap\{v\}|,\qquad
S^{(3)}_{uv} = \mathrm{sim}(u,v)\cdot|KNN(u)\cap\{v\}|
\]
for all \(i,j \in I\) and all \(u,v \in U\),
\[
\sum_{i \in I}\sum_{j \in I}\Big(\mathrm{sim}(j,i)\cdot|KNN(j)\cap\{i\}| - S^{(2)}_{ji}\Big)^2
\;+\; \sum_{u \in U}\sum_{v \in U}\Big(\mathrm{sim}(u,v)\cdot|KNN(u)\cap\{v\}| - S^{(3)}_{uv}\Big)^2
\]

every row \(S^{(1)}_{u\cdot}\) and every column \(S^{(2)}_{\cdot i}\) the same unit vector

\[
O(|U| \times |I|), \qquad O(|R|), \qquad O(|R| \times |I|), \qquad O\big(d^3(|U|+|I|) + d^2|R|\big)
\]

\[
(S^{(1,1)}, \ldots, S^{(T,F)})
\]

\[
-\eta \cdot \nabla D(S,R)
\]

\[
\nabla D(S,R) = \nabla \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} D_{ui}(S,R)
= \sum_{u \in U} \sum_{\substack{i \in I \\ R_{ui}=1}} \nabla D_{ui}(S,R)
\]
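The AUC formula above can be evaluated directly from a score matrix S. The sketch below is a naive illustration rather than an optimized implementation; it assumes dense numpy arrays and skips users for which the AUC is undefined.

```python
import numpy as np

def auc(S, R):
    """AUC of score matrix S w.r.t. the binary feedback matrix R, per the formula above."""
    n_users, _ = R.shape
    total = 0.0
    for u in range(n_users):
        pos = np.where(R[u] == 1)[0]          # known preferences of u
        neg = np.where(R[u] == 0)[0]          # items without feedback
        if len(pos) == 0 or len(neg) == 0:
            continue                          # AUC undefined for this user; skip
        # count (i, j) pairs with R_ui = 1, R_uj = 0 and S_ui > S_uj
        correct = (S[u, pos][:, None] > S[u, neg][None, :]).sum()
        total += correct / (len(pos) * len(neg))
    return total / n_users

R = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0]], dtype=float)
S = np.array([[0.9, 0.4, 0.5, 0.1],
              [0.2, 0.8, 0.7, 0.3]])
print(auc(S, R))   # 1.0 would mean every known preference outranks every unknown item
```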


Page 113: Tutorial bpocf

Nearest Neighbors unified


[Verstrepen and Goethals 2014]


Page 114: Tutorial bpocf

Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Difference with rating-based algorithms – Parameter inference

•  Netflix

Page 115: Tutorial bpocf

Netflix Prize rating data

Page 116: Tutorial bpocf

n-star rating scale n=5

Page 117: Tutorial bpocf

n-star rating scale n=10

Page 118: Tutorial bpocf

n-star rating scale n=1

Page 119: Tutorial bpocf

No negative feedback

?

Page 120: Tutorial bpocf

Pearson Correlation not applicable

4.7. Discussion
A comment on model complexity, the tricks it requires to solve (regularization, smoothing), the quality of the solution (an arbitrary local minimum) and the closeness to the intuitive optimization objective.

5. USABILITY OF RATING BASED ALGORITHMS
Interest in collaborative filtering on binary, positive-only data only recently increased. The majority of the existing collaborative filtering research assumes rating data. In this case, the feedback of user u about item i, i.e. Rui, is an integer between Bl and Bh, with Bl and Bh the most negative and most positive feedback, respectively. The most typical example of rating data was provided in the context of the Netflix Prize, with Bl = 1 and Bh = 5.

Technically, our case of binary, positive-only data is just a special case of rating data with Bl = Bh = 1. However, collaborative filtering algorithms for rating data are in general built on the implicit assumption that Bl < Bh, i.e. that both positive and negative feedback is available. Since this negative feedback is not available in our problem setting, it is not surprising that, in general, algorithms for rating data generate poor or even nonsensical results [Hu et al. 2008; Pan et al. 2008].

k-NN algorithms for rating data, for example, often use the Pearson correlation coefficient as a similarity measure. The Pearson correlation coefficient between users u and v is given by

\[
pcc(u,v) = \frac{\sum_{R_{uj}, R_{vj} > 0} (R_{uj} - \overline{R_u})(R_{vj} - \overline{R_v})}
{\sqrt{\sum_{R_{uj}, R_{vj} > 0} (R_{uj} - \overline{R_u})^2}\;\sqrt{\sum_{R_{uj}, R_{vj} > 0} (R_{vj} - \overline{R_v})^2}},
\]

with \(\overline{R_u}\) and \(\overline{R_v}\) the average rating of u and v respectively. In our setting, with binary, positive-only data, however, Ruj and Rvj are by definition always one. Consequently, \(\overline{R_u}\) and \(\overline{R_v}\) are also always one. Therefore, the Pearson correlation is always zero or undefined (zero divided by zero), making it a useless similarity measure for binary, positive-only data. Even if we hacked it by omitting the mean-centering terms, \(-\overline{R_u}\) and \(-\overline{R_v}\), it would still be useless, since it would always be equal to either one or zero.

Furthermore, when computing the score of user u for item i, user(item)-based k-NN algorithms for rating data typically find the k users (items) that are most similar to u (i) and that have rated i (have been rated by u) [Desrosiers and Karypis 2011; Jannach et al. 2011]. On bpo data, this approach results in the nonsensical result that Sui = 1 for every (u, i)-pair.

Also the matrix factorization methods for rating data are in general not applicable to bpo data. Take for example a basic loss function for matrix factorization on rating data:

\[
\min_{S^{(1)}, S^{(2)}} \sum_{R_{ui} > 0} \Big( R_{ui} - S^{(1)}_{u\cdot} S^{(2)}_{\cdot i} \Big)^2
+ \lambda \Big( \|S^{(1)}_{u\cdot}\|_F^2 + \|S^{(2)}_{\cdot i}\|_F^2 \Big),
\]

which for bpo data simplifies to

\[
\min_{S^{(1)}, S^{(2)}} \sum_{R_{ui} > 0} \Big( 1 - S^{(1)}_{u\cdot} S^{(2)}_{\cdot i} \Big)^2
+ \lambda \Big( \|S^{(1)}_{u\cdot}\|_F^2 + \|S^{(2)}_{\cdot i}\|_F^2 \Big).
\]

The squared error term of this loss function is minimized when the rows and columns of S(1) and S(2) respectively are all the same unit vector. This is obviously a nonsensical solution.
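A quick numerical illustration of the Pearson argument above (a toy sketch, not from the tutorial itself): on binary, positive-only data every co-rated entry equals 1, so the mean-centered terms vanish and the correlation degenerates to 0/0.

```python
import numpy as np

def pearson(u_ratings, v_ratings):
    """Pearson correlation over the items co-rated by both users."""
    mask = (u_ratings > 0) & (v_ratings > 0)
    ru, rv = u_ratings[mask], v_ratings[mask]
    ru_c, rv_c = ru - ru.mean(), rv - rv.mean()
    denom = np.sqrt((ru_c ** 2).sum()) * np.sqrt((rv_c ** 2).sum())
    return (ru_c * rv_c).sum() / denom   # becomes 0 / 0 on binary, positive-only data

# rating data: Pearson is informative
u = np.array([5, 3, 0, 1], dtype=float)
v = np.array([4, 2, 5, 1], dtype=float)
print(pearson(u, v))          # some value in [-1, 1]

# binary, positive-only data: every co-rated entry is 1 -> nan (0 / 0)
u_bpo = np.array([1, 1, 0, 1], dtype=float)
v_bpo = np.array([1, 1, 1, 1], dtype=float)
print(pearson(u_bpo, v_bpo))  # nan: the similarity measure is useless here
```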


Page 121: Tutorial bpocf

Pearson Correlation not applicable


Page 122: Tutorial bpocf

Pearson Correlation not applicable


Page 123: Tutorial bpocf

Different Neighborhood trivial solutions


?
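As noted in the survey excerpt kept under Page 120, selecting neighbours the rating-data way (the k most similar users that have rated i) and averaging their ratings gives Sui = 1 for essentially every pair on binary, positive-only data. A minimal sketch of that degenerate behaviour, with toy data and cosine similarity as assumptions:

```python
import numpy as np

def user_knn_rating_style(R, u, i, k=2):
    """Rating-style user-based k-NN score: average rating of the k users most
    similar to u among those who have rated item i."""
    norms = np.linalg.norm(R, axis=1) + 1e-12
    sims = (R @ R[u]) / (norms * norms[u])      # cosine similarity to user u
    raters = [v for v in range(R.shape[0]) if v != u and R[v, i] > 0]
    if not raters:
        return 0.0
    neighbours = sorted(raters, key=lambda v: sims[v], reverse=True)[:k]
    return np.mean([R[v, i] for v in neighbours])   # every R[v, i] is 1 -> always 1.0

R = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1],
              [1, 1, 1, 0]], dtype=float)

scores = [[user_knn_rating_style(R, u, i) for i in range(4)] for u in range(4)]
print(np.array(scores))   # 1.0 wherever at least one other user consumed the item
```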

Page 124: Tutorial bpocf


Matrix Factorization # trivial solutions = inf
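These slides revisit the squared-error argument from the excerpt above: the data term of the rating-style factorization loss is driven to zero by making every user row and every item column the same unit vector, and any rotation of that solution works just as well, so there are infinitely many trivial optima. A small numerical check of this, with toy dimensions chosen for the illustration:

```python
import numpy as np

np.random.seed(0)
n_users, n_items, d = 4, 5, 3
R = (np.random.rand(n_users, n_items) < 0.4).astype(float)  # toy bpo matrix

def squared_error(S1, S2, R):
    """Data term of the rating-style MF loss, summed over observed entries only."""
    S = S1 @ S2
    return ((1.0 - S[R == 1]) ** 2).sum()

# trivial solution: every row of S1 and every column of S2 equal to the same unit vector
e = np.zeros(d)
e[0] = 1.0
S1 = np.tile(e, (n_users, 1))            # all user factors identical
S2 = np.tile(e[:, None], (1, n_items))   # all item factors identical
print(squared_error(S1, S2, R))          # 0.0: perfect fit, but S is all ones -> useless ranking

# any orthogonal rotation Q gives another zero-error solution -> infinitely many of them
Q, _ = np.linalg.qr(np.random.randn(d, d))
print(squared_error(S1 @ Q, Q.T @ S2, R))  # ~0.0 as well
```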

Page 125: Tutorial bpocf


Matrix Factorization # trivial solutions = inf

Page 126: Tutorial bpocf


Matrix Factorization # trivial solutions = inf


Page 127: Tutorial bpocf


Matrix Factorization # trivial solutions = inf



Page 128: Tutorial bpocf

Agenda •  Introduction •  Algorithms – Elegant example – Models – Deviation functions – Difference with rating-based algorithms – Parameter inference

•  Netflix

Page 129: Tutorial bpocf

SGD mostly prohibitive


\[
O(|U| \times |I|)
\]
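Tying the pieces together under stated assumptions: the update rule \(-\eta \cdot \nabla D(S,R)\) shown earlier, applied to a deviation whose sum runs over all (u, i) pairs, touches on the order of |U| × |I| terms per pass, which is what makes plain (stochastic) gradient descent expensive here. The sketch below runs a few full-gradient steps on a squared-error deviation with a uniform weight on the unobserved entries; it is a WMF-flavoured stand-in for illustration, not the tutorial's exact objective.

```python
import numpy as np

np.random.seed(1)
n_users, n_items, d, eta, alpha = 4, 5, 3, 0.05, 0.1
R = (np.random.rand(n_users, n_items) < 0.4).astype(float)
W = np.where(R == 1, 1.0, alpha)           # uniform confidence for unobserved pairs

S1 = 0.1 * np.random.randn(n_users, d)
S2 = 0.1 * np.random.randn(d, n_items)

def deviation(S1, S2):
    E = R - S1 @ S2
    return (W * E ** 2).sum()              # sum over all |U| x |I| terms

for step in range(200):
    E = W * (R - S1 @ S2)                  # |U| x |I| residual matrix at every step
    grad_S1 = -2 * E @ S2.T                # gradient w.r.t. the user factors
    grad_S2 = -2 * S1.T @ E                # gradient w.r.t. the item factors
    S1 -= eta * grad_S1                    # the  -eta * grad D(S, R)  update
    S2 -= eta * grad_S2

print(round(deviation(S1, S2), 4))         # the deviation decreases over the steps
```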

\[
S_{ui} = S^{(1)}_{u\cdot} \cdot S^{(2)}_{\cdot i}, \qquad
S = S^{(1)} S^{(2)}, \qquad
S = \big( S^{(1,1)} \cdots S^{(1,F_1)} \big) + \cdots + \big( S^{(T,1)} \cdots S^{(T,F_T)} \big)
\]

\[
\max \sum_{R_{ui}=1} \log p(i \mid u)
\qquad
\max \sum_{R_{ui}=1} \log S_{ui}
\qquad
\min \; -\!\!\sum_{R_{ui}=1} \log S_{ui}
\]

\[
D(S,R) = -\sum_{R_{ui}=1} \log S_{ui}, \qquad \min D(S,R)
\]
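A minimal illustration of this deviation function on toy numbers: it evaluates D(S,R) = −Σ_{Rui=1} log Sui for a factored score matrix. Mapping the raw scores to per-user probabilities with a row-wise softmax is an assumption made only so that Sui behaves like p(i | u).

```python
import numpy as np

def softmax_rows(X):
    """Turn raw scores into per-user probabilities p(i | u) (rows sum to 1)."""
    Z = np.exp(X - X.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

def deviation(S, R):
    """D(S, R) = - sum over known preferences of log S_ui."""
    return -np.log(S[R == 1]).sum()

np.random.seed(2)
n_users, n_items, d = 3, 4, 2
R = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]], dtype=float)

S1 = np.random.randn(n_users, d)          # user factors
S2 = np.random.randn(d, n_items)          # item factors
S = softmax_rows(S1 @ S2)                 # S = S(1) S(2), mapped to probabilities
print(round(deviation(S, R), 4))          # lower is better; minimized during training
```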



[Figure: gradient descent path — start, along the way, finish]

Page 130: Tutorial bpocf


(survey p. 1:34)

$$\sum_{i \in I}\sum_{j \in I}\Big(\mathrm{sim}(j,i)\cdot|KNN(j)\cap\{i\}| - S^{(2)}_{ji}\Big)^2 \qquad \sum_{u \in U}\sum_{v \in U}\Big(\mathrm{sim}(u,v)\cdot|KNN(u)\cap\{v\}| - S^{(2)}_{uv}\Big)^2$$

$$S^{(2)}_{ji} = \mathrm{sim}(j,i)\cdot|KNN(j)\cap\{i\}| \;\text{ for all } i,j \in I, \qquad S^{(2)}_{uv} = \mathrm{sim}(u,v)\cdot|KNN(u)\cap\{v\}|, \quad S^{(3)}_{uv} = \mathrm{sim}(u,v)\cdot|KNN(u)\cap\{v\}| \;\text{ for all } u,v \in U$$

$$\sum_{i \in I}\sum_{j \in I}\Big(\mathrm{sim}(j,i)\cdot|KNN(j)\cap\{i\}| - S^{(2)}_{ji}\Big)^2 + \sum_{u \in U}\sum_{v \in U}\Big(\mathrm{sim}(u,v)\cdot|KNN(u)\cap\{v\}| - S^{(3)}_{uv}\Big)^2$$

every row $S^{(1)}_{u\cdot}$ and every column $S^{(2)}_{\cdot i}$ the same unit vector

$$O(|U| \times |I|) \qquad O(|R|) \qquad O(|R| \times |I|) \qquad O\big(d^3(|U|+|I|) + d^2|R|\big)$$

$$\big(S^{(1,1)}, \ldots, S^{(T,F)}\big) \qquad -\eta \cdot \nabla\mathcal{D}(S,R)$$

$$\nabla\mathcal{D}(S,R) = \nabla \sum_{u \in U}\sum_{\substack{i \in I\\ R_{ui}=1}} \mathcal{D}_{ui}(S,R) = \sum_{u \in U}\sum_{\substack{i \in I\\ R_{ui}=1}} \nabla\mathcal{D}_{ui}(S,R)$$

$$\nabla\mathcal{D}(S,R) = \nabla \sum_{u \in U}\sum_{i \in I} \mathcal{D}_{ui}(S,R) = \sum_{u \in U}\sum_{i \in I} \nabla\mathcal{D}_{ui}(S,R)$$

$$\nabla\mathcal{D}(S,R) = \nabla \sum_{u \in U}\sum_{\substack{i \in I\\ R_{ui}=1}}\sum_{j \in I} \mathcal{D}_{uij}(S,R) = \sum_{u \in U}\sum_{\substack{i \in I\\ R_{ui}=1}}\sum_{j \in I} \nabla\mathcal{D}_{uij}(S,R)$$

x1000  

SGD mostly prohibitive
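The decomposition of the gradient into per-term gradients is what SGD exploits: it samples one term D_ui and takes a step -eta * grad D_ui, so the work per epoch grows with the number of terms (|R| versus |U| x |I| versus |R| x |I|). A minimal sketch, assuming a squared-error per-term deviation and the factorization S = S^(1)S^(2) purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d, eta = 1000, 2000, 10, 0.01

R = (rng.random((n_users, n_items)) < 0.01).astype(float)    # sparse bpo data
S1 = 0.1 * rng.standard_normal((n_users, d))                 # user factors
S2 = 0.1 * rng.standard_normal((d, n_items))                 # item factors

def sgd_step(u, i):
    """One step on an assumed per-term deviation D_ui = (R_ui - S1[u] @ S2[:, i])^2."""
    err = R[u, i] - S1[u] @ S2[:, i]
    grad_u = -2.0 * err * S2[:, i]
    grad_i = -2.0 * err * S1[u]
    S1[u] -= eta * grad_u
    S2[:, i] -= eta * grad_i

# Deviation functions that sum over the known preferences only: |R| terms per epoch.
known = np.argwhere(R == 1)
for u, i in known[rng.permutation(len(known))]:
    sgd_step(u, i)

# Deviation functions that sum over all user-item pairs would need |U| x |I| updates
# per epoch instead -- here about a hundred times more.
print(len(known), n_users * n_items)

With the 1% density used here, summing over all user-item pairs means roughly a hundred times more updates per epoch than summing over the known preferences only, which is why plain SGD is mostly prohibitive for bpo data.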


[Shi et al. 2012]

Page 131: Tutorial bpocf

ALS if possible


(survey p. 1:27)

…than the global minimum. A possible mitigation is to rerun the minimization procedure with different initializations and choose the result that gives the best local minimum.

From a numerical optimization point of view, most reconstruction-based algorithms for bpo data pose a bigger computational challenge than reconstruction-based algorithms for rating data. The reason is that most deviation functions for rating data only sum over the known ratings and discard the unknown ratings, whereas most deviation functions for bpo data sum over all possible user-item pairs, which are easily 100 times more numerous.

For rating data, stochastic gradient descent (SGD) is generally the numerical optimization algorithm of choice. A (local) minimum of $\mathcal{D}(S,R)$ is found when $\nabla\mathcal{D}(S,R) = 0$, which is the same as $\sum_{(u,i):\,R_{ui}>0} \nabla\mathcal{D}_{ui}(S,R) = 0$. SGD randomly samples training ratings $R_{ui} > 0$ and for each of them updates the parameters $S^{(1)}_{u\cdot}$ and $S^{(2)}_{\cdot i}$ in the direction of the parameters for which $\nabla\mathcal{D}_{ui}(S,R) = 0$. For example, a parameter $S^{(1)}_{xy}$ is updated according to the rule

$$S^{(1)}_{xy} \leftarrow S^{(1)}_{xy} - \eta\,\frac{\partial \mathcal{D}_{ui}(S,R)}{\partial S^{(1)}_{xy}},$$

in which $\eta$ is the learning rate. A lower learning rate is more stable, but also slower. Ratings are sampled with replacement, and every rating is typically used multiple times on average, until a convergence criterion of choice is reached. However, when the summation over the known ratings, $\sum_{(u,i):\,R_{ui}>0}$, is replaced by a summation over all user-item pairs, $\sum_{(u,i)}$, and every rating needs to be considered multiple times (on average), SGD needs to perform approximately 100 times as many updates per iteration, which makes the algorithm less attractive for bpo data.

Therefore, algorithms for bpo data typically use a variant of the alternating least squares (ALS) method if the deviation function allows it [Koren et al. 2009; Hu et al. 2008]. In this respect, deviation functions 17 and 18 are appealing because they can be minimized with a variant of ALS. Take for example the deviation function from equation 17:

$$\mathcal{D}(S,R) = \sum_{u \in U}\sum_{i \in I} W_{ui}\,(R_{ui} - S_{ui})^2 + \lambda\big(\|S^{(1)}\|_F + \|S^{(2)}\|_F\big) = \sum_{u \in U}\sum_{i \in I} W_{ui}\,\big(R_{ui} - S^{(1)}_{u\cdot} S^{(2)}_{\cdot i}\big)^2 + \lambda\big(\|S^{(1)}\|_F + \|S^{(2)}\|_F\big).$$

Like most deviation functions, this one is non-convex in the parameters contained in $S^{(1)}$ and $S^{(2)}$ and therefore has multiple local optima. However, if one temporarily fixes the parameters in $S^{(1)}$, it becomes convex in $S^{(2)}$, and we can analytically find updated values for $S^{(2)}$ that minimize this convex function and are therefore guaranteed to reduce $\mathcal{D}(S,R)$. Subsequently, one can temporarily fix the parameters in $S^{(2)}$ and in the same way compute updated values for $S^{(1)}$ that are also guaranteed to reduce $\mathcal{D}(S,R)$. One can keep alternating between fixing $S^{(1)}$ and $S^{(2)}$ until a convergence criterion of choice is met. Hu et al. [Hu et al. 2008], Pan et al. [Pan et al. 2008] and Pan and Scholz [Pan and Scholz 2009] give detailed descriptions of possible ALS procedures. The description by Hu et al. contains optimizations for the case in which missing preferences are uniformly weighted. Pan and Scholz [Pan and Scholz 2009] describe optimizations that apply to a wider range of optimization schemes. These optimizations outperform their earlier work-around by means of a bagging method [Pan …]


fix – solve, solve – fix, fix – solve, solve – fix, fix – solve, solve – fix, …

[Hu et al. 2008] [Pan et al. 2008]

[Pan and Scholz 2009] [Pilászy et al. 2010]

[Zhou et al. 2008] [Yao et al. 2014]

[Takàcs and Tikk 2012]
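A minimal dense sketch of the alternation described above, assuming Hu et al.-style confidence weights W = 1 + alpha * R and squared-Frobenius regularization (the form for which the per-row solve has a closed form); the sizes, alpha and lambda are arbitrary, and the scaling optimizations from the cited papers are deliberately omitted:

import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d, lam, alpha = 50, 80, 8, 0.1, 40.0

R = (rng.random((n_users, n_items)) < 0.05).astype(float)    # bpo feedback
W = 1.0 + alpha * R                   # assumed confidence weights: heavier on observed pairs
S1 = 0.1 * rng.standard_normal((n_users, d))                 # user factors, rows S1[u]
S2 = 0.1 * rng.standard_normal((d, n_items))                 # item factors, columns S2[:, i]

def solve_rows(F, Rmat, Wmat):
    """For every row r, solve min_x sum_c Wmat[r,c] * (Rmat[r,c] - x @ F[:,c])^2 + lam * ||x||^2."""
    X = np.empty((Rmat.shape[0], F.shape[0]))
    for r in range(Rmat.shape[0]):
        A = (F * Wmat[r]) @ F.T + lam * np.eye(F.shape[0])   # weighted normal equations
        b = (F * Wmat[r]) @ Rmat[r]
        X[r] = np.linalg.solve(A, b)
    return X

for _ in range(10):                        # fix - solve, solve - fix, ...
    S1 = solve_rows(S2, R, W)              # items fixed, user rows updated
    S2 = solve_rows(S1.T, R.T, W.T).T      # users fixed, item columns updated

# Deviation with squared-Frobenius regularization, the form the closed-form row solve assumes.
deviation = np.sum(W * (R - S1 @ S2) ** 2) + lam * (np.sum(S1 ** 2) + np.sum(S2 ** 2))
print(deviation)

Each half-step solves a ridge-regression problem per user (or per item) with the other factor matrix fixed, so the deviation cannot increase from one alternation to the next — the "fix – solve, solve – fix" pattern above.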

Page 132: Tutorial bpocf

SGD with Sampling if necessary

•  uniform pdf
•  uniform pdf + bagging
•  pdf ~ popularity
•  pdf ~ gradient size
•  discard samples until a large gradient is encountered

(a minimal sampling sketch follows the citations below)

[Rendle et al. 2009]

[Pan and Scholz 2009]

[Rendle and Freudenthaler 2014]

[Rendle and Freudenthaler 2014]

[Weston et al. 2013]
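A minimal sketch of SGD with negative sampling in the style of BPR [Rendle et al. 2009], with two of the listed proposal distributions (uniform and popularity-proportional) for the negative item; the pairwise logistic term, the factor sizes and the learning rate are assumptions made for the example:

import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d, eta, lam = 500, 1000, 10, 0.05, 0.01

R = (rng.random((n_users, n_items)) < 0.02).astype(float)    # bpo feedback
S1 = 0.1 * rng.standard_normal((n_users, d))                 # user factors
S2 = 0.1 * rng.standard_normal((n_items, d))                 # item factors

known = np.argwhere(R == 1)
popularity = R.sum(axis=0) + 1.0
pop_pdf = popularity / popularity.sum()                      # pdf ~ popularity

def sample_negative(u, pdf=None):
    """Draw an item without known preference for user u, from a uniform or a given pdf."""
    while True:
        j = rng.choice(n_items, p=pdf)
        if R[u, j] == 0:
            return j

for _ in range(10_000):
    u, i = known[rng.integers(len(known))]    # a known preference, sampled uniformly
    j = sample_negative(u, pdf=pop_pdf)       # negative item; pass pdf=None for uniform sampling
    x = S1[u] @ (S2[i] - S2[j])               # pairwise score difference
    g = 1.0 / (1.0 + np.exp(x))               # gradient weight of the logistic pairwise term
    wu = S1[u].copy()
    S1[u] += eta * (g * (S2[i] - S2[j]) - lam * wu)
    S2[i] += eta * (g * wu - lam * S2[i])
    S2[j] += eta * (-g * wu - lam * S2[j])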

Page 133: Tutorial bpocf

Others

•  expectation maximization
•  cyclic coordinate descent
•  quadratic programming
•  direct computation
•  variational inference

[Hofmann 2004, Hofmann 1999]

[Ning and Karypis 2012] [Christakopoulou and Karypis 2014]

[Aiolli 2014]

[Aiolli 2013] [Deshpande and Karypis 2004]

[Sigurbjörnsson and Van Zwol 2008] [Sarwar et al. 2001]

[Mobasher et al. 2001] [Lin et al. 2002]

[Sarwar et al. 2000] [Menezes et al. 2010]

[van Leeuwen and Puspitaningrum 2012] [Verstrepen and Goethals 2014] [Verstrepen and Goethals 2015]

[Koenigstein et al. 2012]

[Paquet and Koenigstein 2013]

Page 134: Tutorial bpocf

Agenda •  Introduction •  Algorithms •  Netflix

Page 135: Tutorial bpocf

References


(survey p. 1:35)

$$\int \delta(\,\cdot\,) \cdot p(\,\cdot \mid \cdot\,)\; d(\,\cdot\,) \qquad \mathcal{D}(S,R) = D_{KL}\big(Q(S)\,\big\|\,p(S \mid R)\big) \qquad \ldots \qquad \max \text{ for every } (u,i)$$

$$\max\; \log p(S \mid R) \qquad \max\; \log \prod_{u \in U}\prod_{i \in I} S_{ui}^{\,\alpha R_{ui}}\,(1-S_{ui})$$

$$-\log \prod_{u \in U}\prod_{i \in I} S_{ui}^{\,\alpha R_{ui}}\,(1-S_{ui}) \;=\; -\sum_{u \in U}\sum_{i \in I}\Big(\alpha R_{ui}\log S_{ui} + \log(1-S_{ui})\Big)$$

$$-\sum_{u \in U}\sum_{i \in I}\Big(\alpha R_{ui}\log S_{ui} + \log(1-S_{ui})\Big) + \lambda\big(\|S^{(1)}\|_F^2 + \|S^{(2)}\|_F^2\big)$$
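A minimal sketch that evaluates the last deviation function above, assuming S_ui is obtained by squashing a factor product through a sigmoid so that it lies in (0, 1); alpha, lambda and the toy data are arbitrary choices for the example:

import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d, alpha, lam = 30, 40, 5, 20.0, 0.1

R = (rng.random((n_users, n_items)) < 0.1).astype(float)     # toy bpo data
S1 = 0.1 * rng.standard_normal((n_users, d))
S2 = 0.1 * rng.standard_normal((d, n_items))

# Assumed parameterization: squash the factor product so every S_ui lies in (0, 1).
S = 1.0 / (1.0 + np.exp(-(S1 @ S2)))

deviation = (-np.sum(alpha * R * np.log(S) + np.log(1.0 - S))
             + lam * (np.sum(S1 ** 2) + np.sum(S2 ** 2)))
print(deviation)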

Fabio Aiolli. 2013. Efficient top-N recommendation for very large scale binary rated datasets. In Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 273–280.
Fabio Aiolli. 2014. Convex AUC optimization for top-N recommendation with implicit feedback. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 293–296.
Sarabjot Singh Anand and Bamshad Mobasher. 2007. Contextual Recommendation. Springer.
C.M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY.
Evangelia Christakopoulou and George Karypis. 2014. HOSLIM: Higher-order sparse linear method for top-N recommender systems. In Advances in Knowledge Discovery and Data Mining. Springer, 38–49.
Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-N recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, 39–46.
Mukund Deshpande and George Karypis. 2004. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 143–177.
Christian Desrosiers and George Karypis. 2011. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook. Springer, 107–144.
Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1 (2010), 1.
Eric Gaussier and Cyril Goutte. 2005. Relation between PLSA and NMF and implications. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 601–602.


Page 136: Tutorial bpocf

References

Thomas Hofmann. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 50–57.
Thomas Hofmann. 2004. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 89–115.
Frank Hoppner. 2005. Association Rules. In The Data Mining and Knowledge Discovery Handbook, Oded Mainmon and Lior Rokach (Eds.). Springer, New York, NY.
Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE, 263–272.
Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. 2010. Recommender Systems: An Introduction. Cambridge University Press.
Santosh Kabbur and George Karypis. 2014. NLMF: Nonlinear matrix factorization methods for top-N recommender systems. In Data Mining Workshop (ICDMW), 2014 IEEE International Conference on. IEEE, 167–174.
Santosh Kabbur, Xia Ning, and George Karypis. 2013. FISM: Factored item similarity models for top-N recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 659–667.
Noam Koenigstein, Nir Nice, Ulrich Paquet, and Nir Schleyen. 2012. The Xbox recommender system. In Proceedings of the Sixth ACM Conference on Recommender Systems. ACM, 281–284.
Yehuda Koren and Robert Bell. 2011. Advances in collaborative filtering. In Recommender Systems Handbook. Springer, 145–186.
Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 8 (2009), 30–37.
Weiyang Lin, Sergio A. Alvarez, and Carolina Ruiz. 2002. Efficient adaptive-support association rule mining for recommender systems. Data Mining and Knowledge Discovery 6, 1 (2002), 83–105.
Hao Ma. 2013. An experimental study on implicit social recommendation. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 73–82.
Guilherme Vale Menezes, Jussara M. Almeida, Fabiano Belem, Marcos Andre Goncalves, Anisio Lacerda, Edleno Silva De Moura, Gisele L. Pappa, Adriano Veloso, and Nivio Ziviani. 2010. Demand-driven tag recommendation. In Machine Learning and Knowledge Discovery in Databases. Springer, 402–417.
Bamshad Mobasher, Honghua Dai, Tao Luo, and Miki Nakagawa. 2001. Effective personalization based on association rule discovery from web usage data. In Proceedings of the 3rd International Workshop on Web Information and Data Management. ACM, 9–15.
Xia Ning and George Karypis. 2011. SLIM: Sparse linear methods for top-N recommender systems. In Data Mining (ICDM), 2011 IEEE 11th International Conference on. IEEE, 497–506.
Rong Pan and Martin Scholz. 2009. Mind the gaps: Weighting the unknown in large-scale one-class collaborative filtering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 667–676.
Rong Pan, Yunhong Zhou, Bin Cao, Nathan N. Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-class collaborative filtering. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE, 502–511.
Ulrich Paquet and Noam Koenigstein. 2013. One-class collaborative filtering with random graphs. In Proceedings of the 22nd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 999–1008.
Istvan Pilaszy, David Zibriczky, and Domonkos Tikk. 2010. Fast ALS-based matrix factorization for explicit and implicit feedback datasets. In Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, 71–78.
Steffen Rendle and Christoph Freudenthaler. 2014. Improving pairwise learning for item recommendation from implicit feedback. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. ACM, 273–282.
Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 452–461.
Jasson D. M. Rennie and Nathan Srebro. 2005. Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the 22nd International Conference on Machine Learning. ACM, 713–719.
Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2000. Analysis of recommendation algorithms for e-commerce. In Proceedings of the 2nd ACM Conference on Electronic Commerce. ACM, 158–167.

Page 137: Tutorial bpocf

References

Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web. ACM, 285–295.
Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Nuria Oliver, and Alan Hanjalic. 2012. CLiMF: Learning to maximize reciprocal rank with collaborative less-is-more filtering. In Proceedings of the Sixth ACM Conference on Recommender Systems. ACM, 139–146.
Yue Shi, Martha Larson, and Alan Hanjalic. 2014. Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. ACM Computing Surveys (CSUR) 47, 1 (2014), 3.
Borkur Sigurbjornsson and Roelof Van Zwol. 2008. Flickr tag recommendation based on collective knowledge. In Proceedings of the 17th International Conference on World Wide Web. ACM, 327–336.
Vikas Sindhwani, Serhat S. Bucak, Jianying Hu, and Aleksandra Mojsilovic. 2010. One-class matrix completion with low-density factorizations. In Data Mining (ICDM), 2010 IEEE 10th International Conference on. IEEE, 1055–1060.
Nathan Srebro, Jason Rennie, and Tommi S. Jaakkola. 2004. Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems. 1329–1336.
Gabor Takacs and Domonkos Tikk. 2012. Alternating least squares for personalized ranking. In Proceedings of the Sixth ACM Conference on Recommender Systems. ACM, 83–90.
Lyle H. Ungar and Dean P. Foster. 1998. Clustering methods for collaborative filtering. In AAAI Workshop on Recommendation Systems, Vol. 1. 114–129.
Matthijs van Leeuwen and Diyah Puspitaningrum. 2012. Improving tag recommendation using few associations. In Advances in Intelligent Data Analysis XI. Springer, 184–194.
Koen Verstrepen and Bart Goethals. 2014. Unifying nearest neighbors collaborative filtering. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 177–184.
Koen Verstrepen and Bart Goethals. 2015. Top-N recommendation for shared accounts. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM.
Jason Weston, Samy Bengio, and Nicolas Usunier. 2011. WSABIE: Scaling up to large vocabulary image annotation. In IJCAI, Vol. 11. 2764–2770.
Jason Weston, Ron J. Weiss, and Hector Yee. 2013a. Nonlinear latent factorization by embedding multiple user interests. In Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 65–68.
Jason Weston, Hector Yee, and Ron J. Weiss. 2013b. Learning to rank recommendations with the k-order statistic loss. In Proceedings of the 7th ACM Conference on Recommender Systems. ACM, 245–248.
Yuan Yao, Hanghang Tong, Guo Yan, Feng Xu, Xiang Zhang, Boleslaw K. Szymanski, and Jian Lu. 2014. Dual-regularized one-class collaborative filtering. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. ACM, 759–768.
Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. 2008. Large-scale parallel collaborative filtering for the Netflix prize. In Algorithmic Aspects in Information and Management. Springer, 337–348.
