Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf ·...
Transcript of Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf ·...
![Page 1: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/1.jpg)
Data mining in advertising:air loyalty programs and
coupon purchase prediction
Dmitry Efimov
December 15, 2015
![Page 2: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/2.jpg)
Outline
Predicting value customers in Air Loyalty programs
Coupon purchase prediction
Data preprocessing
Features
Models for prediction
![Page 3: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/3.jpg)
About me
I PhD in Mathematics from Moscow State UniversityI The main specialization: machine learning and data miningI TOP-10 in the global ranking on kaggle.com
![Page 4: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/4.jpg)
Predicting value customers in Air Loyalty programs
I Data are provided by Etihad Air Company
I The problem is to predict passengers that change tierstatus during the next few weeks
I The suggested methodology is patented by Dmitry Efimovand Jose Berengueres
![Page 5: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/5.jpg)
![Page 6: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/6.jpg)
![Page 7: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/7.jpg)
Example of an application that enhances the value ofmiles program:
![Page 8: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/8.jpg)
Coupon purchase prediction
![Page 9: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/9.jpg)
Provided data
I Train: user-coupon purchases for 52 weeksI Train: user-coupon visits for 52 weeksI User list: gender, age, locations
I ≈ 20 000 usersI Coupon list: price, discount, genre name, locations
I train: ≈ 18 000 couponsI test: ≈ 400 coupons
Predict: user-coupon purchases for the 53rd week
![Page 10: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/10.jpg)
Cross validation
Problem: possible pairs user-coupon in train: ≈ 360 000 000
I To decrease the size of train, for each week:I take coupons with at least one purchaseI take users with at least one purchaseI it gives ≈ 600 000 pairs for each week, or ≈ 30 000 000
pairs for the whole trainI use last few weeks to predict test week (we used last 5
weeks)I Validation set: pairs for the last week
![Page 11: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/11.jpg)
Feature engineering
I Dummies: one-hot encoding
I Counts: number of samples for different feature values
I Counts unique: number of different values of one featurefor fixed value of another feature
I Likelihoods
I Similarities
![Page 12: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/12.jpg)
Likelihood features
I Type 1: using sliding window by weeks
Example:I for each week calculate the rate of purchases by each
GENRE NAME based on the previous 10 weeksI Type 2: using multi class algorithms
Example:I one purchase — sample with target GENRE NAMEI for the test week: predict what will be next GENRE NAMEI ⇒ XGBOOST with 13 classes
![Page 13: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/13.jpg)
Similarities
I the idea: for each user find the similarity between testcoupons and coupons purchased before
I use cosine distance with weights
I coordinates for each coupon — coupon features
![Page 14: Data mining in advertising: air loyalty programs and ...efimov-ml.com/pdfs/MCM353_2015.pdf · Coupon purchase prediction Data preprocessing Features Models for prediction. About me](https://reader034.fdocuments.in/reader034/viewer/2022043017/5fab229618201c63f9663edc/html5/thumbnails/14.jpg)
Final model and resutls
I XGBOOST with rank:map objective and map@10evaluation metric
Place Team Leaderboardscore
1 Herra Huu 0.0099732 Halla Yang 0.0098483 threecourse 0.009484... ... ...20 Dmitry and Leustagos 0.007642