Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016


Transcript of Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Page 1: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Rakesh Gupta¹, Chris Sneed¹, Vipul Tyagi¹

¹College of Computing and Technology, Lipscomb University, Nashville, TN, USA

Predicting Online Purchases Using Conversion Prediction Modeling


Page 2: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Executive Summary

• Homesite Group Inc. sponsored a Kaggle* competition to understand how it could better predict what price will entice its quote seekers to purchase a home insurance policy.

• The outcome of this research will be important to the field of retail sales, with special importance to online sales.

• The benefit of this implementation for Homesite is more sales from its leads through effective product pricing.

• In this presentation, our team will demonstrate the process we followed to create the model and our results in predicting quote conversion.

*https://www.kaggle.com/c/homesite-quote-conversion

Page 3: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

*U.S. Census Bureau News. Quarterly Retail E-Commerce Sales for 1st Quarter 2016. (May, 2016).


Page 4: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Sales Lead Articles History

[Diagram: literature map. Node labels as extracted:]

• Predictive Models
• Sales and Lead Cycle Research
• Sales Pricing Models
• Patents
• Sales Lead Prioritization
• Lead Conversion
• Dynamic Pricing
• Sales Lead Conversion
• Predicting Online Purchases – A Comparison of Machine Learning Approaches
• Classification Algorithms
  – Naïve Bayes
  – Neural Networks
  – Binary Logistic Regression
  – AdaBoost
  – Weighted KNN
  – Gradient Boosting
  – Decision Trees (CART, C5.0, CHAID)
  – Support Vector Machines

Page 5: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Patents

Page 6: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Classification Algorithms

• Decision Trees
  – CART
  – C5.0
  – CHAID
• Naïve Bayes
• Neural Networks
• Binary Logistic Regression
• AdaBoost
• Weighted KNN
• Gradient Boosting
• Support Vector Machines

Page 7: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Data Source Analysis

• Data from Homesite was relatively clean to begin with
• The dataset had 299 predictor variables and one target variable: “QuoteConversion” Flag.
  – Target variable has the values 0 or 1
• Data collected had a train dataset of 260K records and a test dataset of 173K records
• During analysis, we removed the variable “QuoteDate” and the following variables:

Summary Statistics

Statistic      GeographicField10A  GeographicField10B  PersonalField84  PropertyField29  PropertyField6
Min            -1                  -1                  1                0                0
1st Quartile   -1                  25                  2                0                0
Median         -1                  25                  2                0                0
Mean           -1                  25                  1.99             0                0
3rd Quartile   -1                  25                  2                0                0
Max            -1                  25                  8                10               0

NAs: 207020 and 334630 (the extracted slide does not show which two variables these counts belong to)
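The slide does not state the rule used to pick the removed variables. The summary statistics suggest they were nearly constant or mostly missing, so the following is only a minimal sketch under that assumption. The function names, the toy columns, and the 50% missing-value threshold are all hypothetical.

```python
def summarize_column(values):
    """Return (missing_count, set of distinct non-missing values) for one column."""
    present = [v for v in values if v is not None]
    return len(values) - len(present), set(present)


def columns_to_drop(table, max_missing_share=0.5):
    """List columns that are constant or mostly missing (assumed criterion)."""
    drop = []
    for name, values in table.items():
        missing, distinct = summarize_column(values)
        if len(distinct) <= 1 or missing / len(values) > max_missing_share:
            drop.append(name)
    return drop


# Toy data with hypothetical column names; None marks a missing value (NA).
toy = {
    "constant_field": [-1, -1, -1, -1],
    "sparse_field": [2, None, None, None],
    "useful_field": [3, 7, 1, 9],
}
print(columns_to_drop(toy))
```

A stricter or looser threshold changes which columns are flagged, so the cut-off would need to be chosen against the real data.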

Page 8: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016
Page 9: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Data Cleansing & Preparation

• Categorical variables converted to numeric
  – 27 variables converted
• 293 predictor variables in the full training set
• Multiple split ratios of train/test
  – 90/10
  – 80/20
  – 67/33
• Randomized sample
• Multiple iterations
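The preparation steps on this slide can be sketched as follows. This is an illustration only: the slides do not say which encoding scheme or random seeds were used, so the integer coding, the seeds, and the helper names here are assumptions.

```python
import random


def encode_categories(values):
    """Map each distinct category to an integer code (sorted for stability)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def random_split(rows, train_share, seed):
    """Shuffle a copy of rows and cut it at train_share."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


# Categorical to numeric conversion on a toy column.
encoded, mapping = encode_categories(["B", "A", "C", "A"])
print(encoded, mapping)

# The three split ratios from the slide, each repeated over several seeds.
rows = list(range(100))
for train_share in (0.90, 0.80, 0.67):
    for seed in range(3):
        train, test = random_split(rows, train_share, seed)
        print(train_share, seed, len(train), len(test))
```

Integer codes impose an artificial order on the categories, which matters for distance-based classifiers such as kNN; one-hot encoding is a common alternative.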

Page 10: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Classifications & Platforms

• R – open source statistical tool
  – Naïve Bayes
  – Logistic Regression
  – Boosting
• Python – open source programming platform
  – Naïve Bayes
  – kNN
  – Logistic Regression

Page 11: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Naïve Bayes*

• Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values.
• All naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable.
• The method of maximum likelihood is applied for parameter estimation for naive Bayes models.
• Despite the naive design and apparently oversimplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations.
• An advantage of naive Bayes is that it only requires a small amount of training data to estimate the parameters necessary for classification.
• Our team used Gaussian Naïve Bayes as it is good for continuous data

*Naïve Bayes classifier. (n.d.). In Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Naive_Bayes_classifier
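A minimal sketch of Gaussian Naïve Bayes in plain Python may make the independence assumption and the maximum-likelihood estimates concrete. It is not the team's implementation (the slides do not show their code); the function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature mean/variance by maximum likelihood."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # The small constant keeps every variance positive (a numerical-safety assumption).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Toy continuous features with two classes (0 = no purchase, 1 = purchase).
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 4.2], [3.2, 3.9], [2.9, 4.1]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]), predict_gaussian_nb(model, [3.1, 4.0]))
```

Because each feature contributes its own additive log-likelihood term, the per-class score is just a sum over features, which is what keeps training and prediction cheap.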

Page 12: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Logistic Regression*

• Binary logistic regression is used because our target variable is 0 or 1

• Predicts the probability that the dependent variable equals 1

*Logistic Regression. (n.d.). In Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Logistic_regression
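As a rough illustration of how a fitted binary logistic regression turns feature values into a probability and then a 0/1 label, consider the sketch below. The weights, bias, and 0.5 threshold are hypothetical values chosen for the example, not coefficients from the team's model.

```python
import math


def predict_probability(weights, bias, features):
    """Logistic function applied to a linear combination of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(weights, bias, features, threshold=0.5):
    """Turn the probability into a 0/1 prediction."""
    return 1 if predict_probability(weights, bias, features) >= threshold else 0


# Hypothetical coefficients for two predictors.
weights, bias = [0.8, -1.5], 0.2
p = predict_probability(weights, bias, [1.0, 0.5])
print(round(p, 4), predict_label(weights, bias, [1.0, 0.5]))
```

Here the linear score is 0.2 + 0.8 - 0.75 = 0.25, which the logistic function maps to a probability of about 0.56.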

Page 13: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

kNN*

• An object is classified by a majority vote of its neighbors, assigning it to the class most common among its k nearest neighbors
• In the weighted variant, the nearer neighbors contribute more to the vote than the distant ones
• Sensitive to the local structure of the data

*k-nearest neighbors algorithm. (n.d.). In Wikipedia. Retrieved from https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
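The difference between a plain majority vote and a distance-weighted vote can be shown with a small sketch. The 1/distance weighting, Euclidean distance, and toy points are assumptions for illustration; the slides do not state which distance metric or weighting scheme was used.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, query, k=3, weighted=False):
    """Classify query by a vote among its k nearest training points."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for distance, label in distances[:k]:
        # Weighted variant: closer neighbors get a larger vote (1 / distance).
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# One very close class-1 point and two more distant class-0 points.
train_X = [[0.1, 0.0], [1.0, 0.0], [1.1, 0.0]]
train_y = [1, 0, 0]
query = [0.0, 0.0]
print(knn_predict(train_X, train_y, query, k=3))
print(knn_predict(train_X, train_y, query, k=3, weighted=True))
```

With k=3 the plain vote picks class 0 (two votes to one), while the weighted vote picks class 1 because the single close neighbor outweighs the two distant ones.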

Page 14: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Boosting*

• Boosting is a general method for improving the accuracy of any given learning algorithm
• Works by combining rough and less than accurate rules of thumb
• Produces a classifier with a low generalization error
• Increases weights on incorrectly classified examples, forcing the base learner to focus its attention on them

*Schapire, Robert E. and Freund, Yoav. Boosting: Foundations and Algorithms. Massachusetts Institute of Technology, Cambridge, MA. 2012
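The reweighting step can be illustrated with one round of AdaBoost, the best-known boosting algorithm. The slides do not say which boosting variant was run in R, so AdaBoost is used here only as an example; the function name and the toy predictions are hypothetical.

```python
import math


def adaboost_round(weights, predictions, labels):
    """One AdaBoost round: return (alpha, renormalized example weights).

    predictions and labels use -1/+1 coding. Assumes the weighted error
    of the weak rule is strictly between 0 and 0.5.
    """
    error = sum(w for w, p, t in zip(weights, predictions, labels) if p != t)
    alpha = 0.5 * math.log((1 - error) / error)  # voting strength of this rule
    updated = [w * math.exp(-alpha * p * t)
               for w, p, t in zip(weights, predictions, labels)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


labels = [+1, +1, -1, -1, +1]
predictions = [+1, +1, -1, -1, -1]  # the weak rule misses the last example
weights = [0.2] * 5
alpha, new_weights = adaboost_round(weights, predictions, labels)
print(alpha, new_weights)
```

After the update the single misclassified example carries half of the total weight, so the next weak rule is pushed to get it right.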

Page 15: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Trials & Tribulations

• Mahout
  – Neural Networks?
  – CSV Vector?
• RapidMiner
  – Output of model
  – Learning curve
• SVM
  – Complicated to fit model
• Multicollinearity Analysis
  – VIF Functions
  – Corrgrams*

*Package ‘corrgram’ Retrieved from https://cran.r-project.org/web/packages/corrgram/corrgram.pdf

Page 16: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Correlation Analysis

*Package ‘corrgram’ Retrieved from https://cran.r-project.org/web/packages/corrgram/corrgram.pdf
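A corrgram visualizes a matrix of pairwise correlations, and the variance inflation factor (VIF) turns correlation among predictors into a single multicollinearity score. The sketch below computes both in plain Python on toy columns. It is not the R workflow used for the slides; the column names are hypothetical, and the VIF helper covers only the two-predictor case, where VIF = 1 / (1 - r²).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise correlations, the numbers a corrgram would display."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def vif_two_predictors(x, y):
    """VIF when only two predictors are present: 1 / (1 - r^2)."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r ** 2)


columns = {
    "field_a": [1, 2, 3, 4, 5],
    "field_b": [2, 4, 6, 8, 11],  # nearly a multiple of field_a
    "field_c": [5, 1, 4, 2, 3],
}
matrix = correlation_matrix(columns)
print(round(matrix["field_a"]["field_b"], 3))
print(round(vif_two_predictors(columns["field_a"], columns["field_b"]), 1))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity; the nearly collinear toy pair lands far above that range.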

Page 17: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Results - Accuracy Matrices

“No models are perfect, but some are better than others…”

Technology  Split Ratio  Naïve Bayes  KNN     Logistic Regression  HS Test File 0’s  HS Test File 1’s
Python      90/10        81%          78.34%  81.33%               168422            5414
Python      80/20        -            78.47%  81.15%               165859            7977
Python      67/33        -            78.64%  81.13%               165870            7966
R           80/20        71%          -       -                    -                 -
R           80/20        -            -       -                    124,544           49,292

(“-” marks cells that are empty on the slide. The column placement of the 71% R result is inferred from its position in the extracted text.)
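For reference, accuracy and the predicted 0/1 counts of the kind reported on this slide can be computed as in the sketch below. The label lists are toy data; only the counts in the final lines (168422 zeros and 5414 ones) are taken from the slide.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Share of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def positive_share(zeros, ones):
    """Share of records predicted as 1 (quote converted)."""
    return ones / (zeros + ones)


# Toy hold-out labels and predictions.
actual = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
predicted = [0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
print(accuracy(predicted, actual))
print(Counter(predicted))

# Counts from the Python 90/10 row of the slide.
print(168422 + 5414, round(positive_share(168422, 5414), 4))
```

Every row of predicted counts on the slide sums to the same total of 173,836 test records, which matches the 173K test dataset described earlier.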

Page 18: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Conclusion & Discussion

• Boosting helped identify the 6 variables that provided the most value
• We know we can predict a sale from a lead about 80% of the time given Homesite’s data set
• We reduced the number of predictor variables from 292 to 6!
• This allows Homesite to focus on these data points.
• Following the 80/20 Pareto principle – from these 6 predictors we get 80% of the benefit without wasting time on the other factors that don’t carry as much weight.
• Simple, fast market strategy that will provide immediate benefits in terms of increased sales and revenue for Homesite

Page 19: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Future Works

• Continue work on additional data cleaning to improve accuracy of the model from 81% to 97%
• Investigate the use of the remaining classification models to see if we achieve better results
• Design and build a process to provide real-time prediction as new quotes are sent out by Homesite.
• Complete ANOVA analysis to determine strength of logistic regression model

Page 20: Presentation - Predicting Online Purchases Using Conversion Prediction Modeling 8.19.2016

Questions?