Data mining to improve e-mail marketing

Tayko Smart Marketing using analytics

Transcript of Data mining to improve e-mail marketing

Page 1: Data mining to improve e-mail marketing

Tayko Smart Marketing using analytics

Page 2: Data mining to improve e-mail marketing

Business Problem

Tayko is a software catalog firm that sells games and educational software.

It wants to market a new collection by e-mail. As a member of an industry consortium, it can pull 200,000 e-mail addresses from the consortium's central repository.

To maximize the benefit, Tayko wants to pull the records with a high probability of response and a higher value of sale.

Page 3: Data mining to improve e-mail marketing

Analytics Problem

1. Create a classification model to group customers as responders or purchasers (1) and non-responders or non-purchasers (0).

2. Create a prediction model to predict the value of sale for the responders (1).

Page 4: Data mining to improve e-mail marketing

Data Collection

Supervised learning techniques are applied because the desired output is already defined.

A sample of 2,000 customers is drawn from the central repository and a test e-mail campaign is run.

The 2 target variables, Purchased and Spending, are recorded for the sample.

The result showed 1,000 purchasers and 1,000 non-purchasers.

Page 5: Data mining to improve e-mail marketing

Data partitioning

The data set is partitioned into:

Training set – 60% – 1,200 records

Testing – 20% – 400 records

Validation – 20% – 400 records
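A minimal sketch of a 60/20/20 partition, assuming a simple random shuffle. The slides do not say how the records were assigned, so the function name `partition`, the seed and the record ids are illustrative assumptions.

```python
import random

def partition(records, fractions=(0.6, 0.2, 0.2), seed=42):
    """Shuffle records and split them into training, testing and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_train = int(round(n * fractions[0]))
    n_test = int(round(n * fractions[1]))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, validation

# Hypothetical record ids standing in for the 2,000 sampled customers.
train, test, validation = partition(range(2000))
print(len(train), len(test), len(validation))  # 1200 400 400
```

Fixing the seed keeps the three sets stable across runs, so model comparisons are made on the same records.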

Page 6: Data mining to improve e-mail marketing

Initial Study

What kinds of variables are present?

Page 7: Data mining to improve e-mail marketing

Finding the variables with strong differentiation power – Nominal Variables

Catalogs A, T, U and P show a high percentage of customers making a purchase.

Catalogs O and H show a high percentage of customers not making a purchase.

But only Catalogs A and U were used for more than 100 customers, Catalog H for more than 50 customers, and the rest for fewer than 50 customers. The distribution of catalogs was not even.
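The comparison of purchase rates across catalogs, together with the caveat about small groups, could be sketched as below. The rows are made up, and the function name and `min_count` threshold are assumptions for illustration, not the tool used for the slides.

```python
from collections import defaultdict

def purchase_rate_by_category(rows, min_count=50):
    """Return {category: (count, purchase_rate, has_enough_customers)}."""
    counts = defaultdict(int)
    buyers = defaultdict(int)
    for category, purchased in rows:
        counts[category] += 1
        buyers[category] += purchased  # purchased is 1 or 0
    return {
        c: (counts[c], buyers[c] / counts[c], counts[c] >= min_count)
        for c in counts
    }

# Made-up (source catalog, purchased) pairs; not the Tayko data.
rows = [("A", 1), ("A", 1), ("A", 0), ("A", 1), ("H", 0), ("H", 0), ("T", 1)]
rates = purchase_rate_by_category(rows, min_count=3)
for catalog, (count, rate, enough) in sorted(rates.items()):
    print(catalog, count, round(rate, 2), "ok" if enough else "too few customers")
```

Reporting the count next to the rate makes it visible when a high or low percentage rests on only a handful of customers.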

Page 8: Data mining to improve e-mail marketing

Other Nominal Variables

Of the other categorical variables, “Order Online” is the only one that shows some power to differentiate between purchasers and non-purchasers.

Page 9: Data mining to improve e-mail marketing

Ordinal Variables

Number of purchases last year shows a good trend.

People who did not make any purchase last year have not made any purchase with the new catalogs either.

People who made more than 3 purchases last year have surely made a purchase this time as well.

Page 10: Data mining to improve e-mail marketing

Scale Variables

Of the 2 scale variables, “Last update to customer record” shows a significant difference in its mean between purchasers and non-purchasers.

Page 11: Data mining to improve e-mail marketing

Target Variables

Purchasers and non-purchasers are equally distributed.

However, the sales value (the amount spent by a customer) follows a non-normal distribution.

Page 12: Data mining to improve e-mail marketing

Classification

Who will make a purchase?

Page 13: Data mining to improve e-mail marketing

Logistic Regression – Training

Final set of variables

1. Frequency : Number of transactions in the last year at the source catalog

2. Web Order : Customer placed at least 1 order via the web

3. Address is Residence : Address is a residence

4. Source_a, h or u : Source catalog is A, U or H

Page 14: Data mining to improve e-mail marketing
Page 15: Data mining to improve e-mail marketing

Logistic Regression – Testing & Validation

Test Over-all accuracy : 80%

Validation Over-all accuracy : 77%
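Overall accuracy is the share of records whose predicted class matches the actual class. A small sketch follows, with made-up labels rather than the model's real predictions; the function names are illustrative.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn

def overall_accuracy(actual, predicted):
    """Fraction of records classified correctly."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return (tp + tn) / (tp + tn + fp + fn)

# Made-up labels: 1 = purchaser, 0 = non-purchaser.
actual = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))  # (4, 4, 1, 1)
print(overall_accuracy(actual, predicted))  # 0.8
```

Because purchasers and non-purchasers are equally represented in the sample, overall accuracy is a reasonable headline figure here; on an unbalanced population the four counts would be more informative.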

Page 16: Data mining to improve e-mail marketing

Decision Tree – Training

The CHAID growing method gave the best results.

Page 17: Data mining to improve e-mail marketing
Page 18: Data mining to improve e-mail marketing

Decision Tree – Test & Validate

Test Over-all accuracy : 76%

Validation Over-all accuracy : 74%

Page 19: Data mining to improve e-mail marketing

Result

Logistic regression gives a better result than the decision tree: 80% vs 76% overall accuracy on the test set and 77% vs 74% on the validation set.

Page 20: Data mining to improve e-mail marketing

Prediction

How much will a purchaser spend?

Page 21: Data mining to improve e-mail marketing

New Calculated Variables

• High correlation between “last_update_days_ago” and “1st_update_days_ago”

• New calculated variable DayDiff, which is the difference of the 2 variables
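A sketch of the correlation check and the derived variable. The values are made up, and the subtraction order (first update minus last update) is an assumption, since the slide only says "difference".

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Made-up values standing in for the two columns.
first_update_days_ago = [3600, 2900, 1500, 830, 400]
last_update_days_ago = [3500, 2400, 1500, 700, 100]

r = pearson(first_update_days_ago, last_update_days_ago)
print(round(r, 3))

# Assumed order: days between the first and the last update of the record.
day_diff = [f - l for f, l in zip(first_update_days_ago, last_update_days_ago)]
print(day_diff)  # [100, 500, 0, 130, 300]
```

Replacing two highly correlated predictors with their difference is one way to reduce multicollinearity in the linear regression that follows.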

Page 22: Data mining to improve e-mail marketing

Multiple Linear Regression

Pre-processing

Univariate analysis and transformation of the target variable “Spend”

Outlier removal, filtering and transformation

Page 23: Data mining to improve e-mail marketing

Model & Performance

4 models are generated, one for each combination of address type and web-order status.

Case 1 : Non-Residence Address & Not a Web-Order (R-sqr : 0.569 & Adj R-sqr : 0.566)

Spending = -15.733 + 79.11 * No of transactions last year - 47.825 * Catalog D + 30.632 * Catalog U

Case 2 : Non-Residence Address & Web-Order (R-sqr : 0.62 & Adj R-sqr : 0.616)

Spending = -42.285 + 115.976 * No of transactions last year + 45.506 * Catalog U - 247.655 * Catalog H + 55.605 * Catalog R

Case 3 : Residence Address & Not a Web-Order (R-sqr : 0.516 & Adj R-sqr : 0.507)

Spending = -26.965 + 69.218 * No of transactions last year + 66.219 * Catalog U - 113.587 * Catalog H

Case 4 : Residence Address & Web-Order (R-sqr : 0.612 & Adj R-sqr : 0.592)

Spending = -4.616 + 65.114 * No of transactions last year - 111.934 * Catalog H - 81.28 * Catalog R - 129.754 * Catalog C + 66.242 * Catalog A
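The four equations above can be applied by first picking the case from the address type and web-order flag, then adding the matching terms. The sketch below copies the coefficients from this slide. The function and argument names are illustrative, and it assumes each Catalog term is a 0/1 indicator for the customer's source catalog. The slides do not say whether the equations are on the original or the transformed “Spend” scale, so the output should be read with that caveat.

```python
# (address_is_residence, web_order) -> (intercept, transactions coefficient, catalog terms)
MODELS = {
    (False, False): (-15.733, 79.11, {"D": -47.825, "U": 30.632}),
    (False, True): (-42.285, 115.976, {"U": 45.506, "H": -247.655, "R": 55.605}),
    (True, False): (-26.965, 69.218, {"U": 66.219, "H": -113.587}),
    (True, True): (-4.616, 65.114,
                   {"H": -111.934, "R": -81.28, "C": -129.754, "A": 66.242}),
}

def predict_spending(is_residence, web_order, transactions_last_year, catalogs=()):
    """Evaluate the equation for the matching case.

    `catalogs` holds the source-catalog letters whose indicator equals 1;
    letters that do not appear in the chosen equation contribute nothing.
    """
    intercept, freq_coef, catalog_coefs = MODELS[(bool(is_residence), bool(web_order))]
    value = intercept + freq_coef * transactions_last_year
    for letter in catalogs:
        value += catalog_coefs.get(letter, 0.0)
    return value

# Case 4: residence address, web order, 2 transactions last year, Catalog A.
print(round(predict_spending(True, True, 2, ["A"]), 3))    # 191.854
# Case 1: non-residence address, no web order, 1 transaction, Catalog U.
print(round(predict_spending(False, False, 1, ["U"]), 3))  # 94.009
```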

Page 24: Data mining to improve e-mail marketing

MAD & MAPE

Training MAD : 68.89 MAPE : 103%

Test MAD : 104.53 MAPE : 109%

Validation MAD : 104.03 MAPE : 101%
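A sketch of how the two error measures can be computed, assuming MAD means the mean absolute deviation between actual and predicted spending and MAPE the mean absolute percentage error. The numbers are made up, not the model output.

```python
def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Records with an actual value of 0 are skipped because the
    percentage error is undefined for them.
    """
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(ratios) / len(ratios)

# Made-up spending values.
actual = [100, 200, 50]
predicted = [120, 150, 100]
print(mad(actual, predicted))             # 40.0
print(round(mape(actual, predicted), 2))  # 48.33
```

In this toy example the smallest purchase (50, predicted as 100) contributes a 100% error on its own, which illustrates how small actual amounts can push MAPE towards or beyond 100%.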

Page 25: Data mining to improve e-mail marketing

Regression Tree

Exhaustive CHAID

Page 26: Data mining to improve e-mail marketing

MAD & MAPE

Training MAD : 105.37 MAPE : 95%

Test MAD : 121.54 MAPE : 103%

Validation MAD : 121.31 MAPE : 113%

Page 27: Data mining to improve e-mail marketing

Decision

Both models are very weak at predicting the amount spent; the error indicators (MAD and MAPE) are high.

One major reason may be the lack of scale variables and the high correlation between the scale variables that are available.

Since most variables are nominal, converting the prediction problem to a classification problem might produce better results, but that was out of scope for the given problem.

Page 28: Data mining to improve e-mail marketing

Conclusion

The classification of customers into purchasers and non-purchasers shows good results, and the selected logistic regression model is expected to perform well in a live situation too.

However, the prediction models show weak performance, and a high degree of error is expected if they are used in their current state.