Customer Behaviour Analysis


Transcript of Customer Behaviour Analysis

Page 1: Customer Behaviour Analysis

PREDICTIVE MODELING SZP0052

RESEARCH QUESTION: Zappos.com is an online shoe and clothing shop currently based in Las Vegas, Nevada. Zappos.com wants to know who is coming to its website and what they do while visiting. The company wants to find efficient ways to improve its sales by analyzing its customer base using factors such as the platform used, most visited site, product page views, visits, and orders.

Aim of the project: To answer the following questions:
1. Who is coming to their website and what do they do while visiting?
2. Do they buy a product or just visit the website?
3. Do they just view or search the product page?
4. Which platform do they normally use?
5. Which site do they normally visit?
6. Do they just search for the product or also buy it?
7. Can a model be developed to forecast their gross sales?

Page 2: Customer Behaviour Analysis

Data Visualization: I. Site Vs Sales

Page 3: Customer Behaviour Analysis

Observations: From the above plots it is clear that the most visited site is Acme, followed by Pinnacle and Sortly. Even though Pinnacle and Sortly have substantial visits, those visits do not translate into sales, so customers visiting Pinnacle and Sortly do not produce substantial sales for the company. Visits to Acme, on the other hand, do translate into sales, so customers visiting Acme produce substantial sales for the company.
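A minimal sketch, in R, of the kind of per-site comparison behind this observation. The `toy` data frame and every number in it are invented for illustration (the real `sales` data are not reproduced here); only the column names `site`, `visits` and `gross_sales` are taken from the models fitted later in this deck.

```r
# Toy data: invented numbers, NOT the Zappos data.
toy <- data.frame(
  site        = c("Acme", "Acme", "Pinnacle", "Pinnacle", "Sortly", "Sortly"),
  visits      = c(1200, 900, 800, 700, 600, 500),
  gross_sales = c(50000, 42000, 3000, 2500, 1800, 1500)
)

# Total visits and total sales per site.
by_site <- aggregate(cbind(visits, gross_sales) ~ site, data = toy, FUN = sum)

# Sales generated per visit: a site can rank high on visits and low on this.
by_site$sales_per_visit <- by_site$gross_sales / by_site$visits

print(by_site[order(-by_site$sales_per_visit), ])
```

Ranking sites by sales per visit rather than by raw visits separates sites that attract traffic from sites that convert it.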

II. Customer Vs Sales (0 = old customer, 1 = new customer)

Page 4: Customer Behaviour Analysis

Observations: From the above graphs it is clear that most visitors to the site are old (returning) customers, and that old customers contribute significantly more to company sales than new customers, even though the number of visits from old and new customers does not differ greatly.

Page 5: Customer Behaviour Analysis

III. Platform Vs Sales

Page 6: Customer Behaviour Analysis

Observations: From the plots we can see that, of all the platforms, most customers who visit the company's sites use iOS, followed by Android and Windows. Although most customers use iOS and Android, it is Windows and Mac OS X that contribute significantly to the company's sales. iOS is used by the majority of customers, but this is not reflected in the company's sales. It can also be noted that the majority of customers who use Mac OS X and Windows visit the Acme site.

Page 7: Customer Behaviour Analysis

IV. Orders Vs Sales

Page 8: Customer Behaviour Analysis

Observations: From the above graphs it can be noted that the majority of orders fall between 0 and 200. There is a clear linear relation between orders and sales, and the majority of orders are placed on the Acme site from the Windows platform.

Page 9: Customer Behaviour Analysis

V. Product Page Views Vs Sales

Page 10: Customer Behaviour Analysis

Observations: It can be noted that the majority of product page views come from the Windows platform, followed by iOS and Mac OS X. It can also be observed that the majority of sales for the Widgetry site come from iOS users. There also appears to be a proportional relation between product page views and sales, which indicates that customers who are searching for a product are actually buying it.

Page 11: Customer Behaviour Analysis

VI. Search page views Vs Sales

Page 12: Customer Behaviour Analysis

Observations: It can be observed that the majority of sales come from the Windows platform, followed by Mac OS X and iOS. Acme is the most searched site and, as seen earlier, Acme also has the maximum sales, indicating that search page views are proportional to sales.

Page 13: Customer Behaviour Analysis

VII. Add to cart Vs Sales

Page 14: Customer Behaviour Analysis

Observations: It can be observed that most products are added to the cart from the Acme site, and that the most used platform is Windows, followed by Mac OS X and iOS. There also appears to be a significant relation between add to cart and sales.

Page 15: Customer Behaviour Analysis

VIII. Conversion Rate Vs Sales

Page 16: Customer Behaviour Analysis

Observations: It can be observed that the conversion rate is good for Acme, Pinnacle and Sortly. Golden ratio: add_to_cart/orders, which we want to be as low as possible. It can be noted that this ratio is high for Tabular and Widgetry.

MODELLING

Checking for Assumptions: Normality of Response:

Page 17: Customer Behaviour Analysis

As the data do not follow a normal distribution, we need to transform them; here we use a log transformation to bring the response closer to normality.

AFTER log Transformation:

The log-transformed data are approximately normally distributed, so we can fit a linear regression.

FITTING SIMPLE LINEAR REGRESSION
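A minimal sketch of the workflow described so far: measure skewness before and after the log transformation, then fit a simple regression on the log scale. The data are simulated (the real `sales` data frame is not available here), and the intercept, slope and noise level are arbitrary assumptions chosen only to give a right-skewed response; `skewness()` is a small helper defined in the block, not a base R function.

```r
# Simulated stand-in for the sales data; NOT the Zappos data.
set.seed(1)
n           <- 500
visits      <- round(runif(n, min = 0, max = 20000))
gross_sales <- exp(6.4 + 2e-4 * visits + rnorm(n, sd = 1.2))

# Sample skewness (third standardized moment), base R only.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_raw <- skewness(gross_sales)       # strongly right-skewed
skew_log <- skewness(log(gross_sales))  # close to 0 after the transform
print(c(raw = skew_raw, log = skew_log))

# Simple linear regression on the log scale, as in the models that follow.
fit <- lm(log(gross_sales) ~ visits)
print(summary(fit)$coefficients)

# A normal Q-Q plot gives a visual check of the transformed response:
# qqnorm(log(gross_sales)); qqline(log(gross_sales))
```

Note that `log()` returns `-Inf` for a zero, so rows with zero gross sales would need separate handling before a model of this form is fitted.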

1. lm(formula = log(gross_sales) ~ visits, data = sales)

Coefficients:
              Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)   6.435e+00   2.510e-02   256.33   <2e-16 ***
visits        2.109e-04   7.122e-06    29.61   <2e-16 ***

Multiple R-squared: 0.07096

We can see that the predictor visits is significant.

Page 18: Customer Behaviour Analysis

2. lm(formula = log(gross_sales) ~ platform, data = sales)

Coefficients:
                       Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)             3.31492   0.15614     21.231   < 2e-16 ***
platformAndroid         3.14734   0.16345     19.255   < 2e-16 ***
platformBlackBerry     -0.05144   0.18442     -0.279   0.780321
platformChromeOS        2.15667   0.18148     11.884   < 2e-16 ***
platformiOS             5.21251   0.16255     32.068   < 2e-16 ***
platformiPad            4.18013   0.20332     20.560   < 2e-16 ***
platformiPhone          3.94906   0.19845     19.900   < 2e-16 ***
platformLinux           1.86229   0.17273     10.782   < 2e-16 ***
platformMacintosh       4.47188   0.22052     20.278   < 2e-16 ***
platformMacOSX          3.97496   0.16676     23.837   < 2e-16 ***
platformOther           3.10501   0.22714     13.670   < 2e-16 ***
platformUnknown         0.16862   0.17547      0.961   0.336587
platformWindows         4.37739   0.16533     26.477   < 2e-16 ***
platformWindowsPhone    0.61178   0.18497      3.308   0.000944 ***

Multiple R-squared: 0.3744

• We can see that all platforms except BlackBerry and Unknown differ significantly from the baseline platform and have a significant effect on sales.

3. lm(formula = log(gross_sales) ~ site, data = sales)

Coefficients:
               Estimate   Std. Error  t value   Pr(>|t|)
(Intercept)     7.20121   0.03615     199.228   <2e-16 ***
siteBotly      -0.09904   0.11445      -0.865   0.387
sitePinnacle   -1.47050   0.06326     -23.245   <2e-16 ***
siteSortly     -1.93302   0.06044     -31.983   <2e-16 ***
siteTabular     1.55513   0.11445      13.588   <2e-16 ***
siteWidgetry    1.49026   0.11445      13.021   <2e-16 ***

Page 19: Customer Behaviour Analysis

Multiple R-squared: 0.1546

• It can be noted that sales differ significantly between Acme (the baseline site) and every other site except Botly, whose coefficient is not significant (p = 0.387).
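The site and platform coefficients are easier to read once the coding is explicit: `lm()` dummy-codes a factor against a baseline level (the first level alphabetically unless it is releveled), so each coefficient is a difference from that baseline on the log scale. The sketch below illustrates this on simulated data, not the Zappos data; the three group means are arbitrary assumptions.

```r
# Simulated data with three sites; NOT the Zappos data.
set.seed(2)
site       <- factor(rep(c("Acme", "Botly", "Pinnacle"), each = 50))
group_mean <- c(Acme = 7.2, Botly = 7.1, Pinnacle = 5.7)  # arbitrary values
log_sales  <- unname(group_mean[as.character(site)]) + rnorm(150, sd = 0.5)

# "Acme" is the first level, so it becomes the baseline.
fit <- lm(log_sales ~ site)
print(coef(fit))

# The intercept is the baseline group mean; each site coefficient is
# that site's mean minus the baseline mean.
acme_mean    <- mean(log_sales[site == "Acme"])
pinnacle_gap <- mean(log_sales[site == "Pinnacle"]) - acme_mean

# Choosing a different baseline changes what every coefficient compares to.
site2 <- relevel(site, ref = "Pinnacle")
fit2  <- lm(log_sales ~ site2)
print(names(coef(fit2)))
```

A non-significant coefficient therefore means "not distinguishable from the baseline site", not "no sales".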

4. lm(formula = log(gross_sales) ~ new_customer, data = sales)

Coefficients:
               Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)     7.15069   0.03493     204.69   <2e-16 ***
new_customer   -1.02895   0.05023     -20.48   <2e-16 ***

Multiple R-squared: 0.03594

• It can be noted that there is a significant relation between new_customer and sales; the negative coefficient indicates lower sales from new customers.

5. lm(formula = log(gross_sales) ~ bouncce_rate, data = sales)

Coefficients:
               Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)     7.26550   0.03628     200.26   <2e-16 ***
bouncce_rate   -1.76009   0.13379     -13.16   <2e-16 ***

Multiple R-squared: 0.01606

• It can be noted that the predictor bounce rate is significant and has a negative impact on sales.

Page 20: Customer Behaviour Analysis

6. lm(formula = log(gross_sales) ~ conversion_rate, data = sales)

Coefficients:
                  Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)       6.20423    0.03559     174.34   <2e-16 ***
conversion_rate   1.84597    0.06775      27.25   <2e-16 ***

Multiple R-squared: 0.06544

• It can be noted that the predictor conversion_rate is significant and has an impact on sales.

7. lm(formula = log(gross_sales) ~ add_to_cart_rate, data = sales)

Coefficients:
                   Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)        6.11271    0.04414     138.50   <2e-16 ***
add_to_cart_rate   1.64091    0.07503      21.87   <2e-16 ***

Multiple R-squared: 0.04317

• It can be noted that the predictor add_to_cart_rate is significant and has an impact on sales.

8. lm(formula = log(gross_sales) ~ product_page_views, data = sales)

Coefficients:
                     Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)          6.262e+00   2.386e-02   262.42   <2e-16 ***
product_page_views   1.440e-04   2.810e-06    51.26   <2e-16 ***

Multiple R-squared: 0.1862

• It can be noted that the predictor product_page_views is significant and has a significant impact on sales.

Page 21: Customer Behaviour Analysis

9. lm(formula = log(gross_sales) ~ search_page_views, data = sales)

Coefficients:
                    Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)         6.329e+00   2.429e-02   260.56   <2e-16 ***
search_page_views   5.917e-05   1.339e-06    44.17   <2e-16 ***

Multiple R-squared: 0.1452

• It can be noted that the predictor search_page_views is significant and has an impact on sales.

10. lm(formula = log(gross_sales) ~ distinct_sessions, data = sales)

Coefficients:
                    Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)         6.463e+00   2.527e-02   255.81   <2e-16 ***
distinct_sessions   2.425e-04   9.475e-06    25.59   <2e-16 ***

Multiple R-squared: 0.05396

• The predictor distinct_sessions is significant and has an impact on sales.
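Because the response in these models is log(gross_sales), each slope is on the log scale, and exponentiating it gives a multiplicative effect on the fitted sales. The short sketch below does this arithmetic for two of the estimates reported above (visits from model 1 and new_customer from model 4); the variable names are only illustrative, and the 1000-visit step is an arbitrary choice made for readability.

```r
# Estimates copied from the regression output above.
b_visits       <- 2.109e-04   # model 1: log(gross_sales) ~ visits
b_new_customer <- -1.02895    # model 4: log(gross_sales) ~ new_customer

# Percentage change in fitted sales for one additional visit.
pct_per_visit <- (exp(b_visits) - 1) * 100

# Multiplicative change in fitted sales for 1000 additional visits.
mult_per_1000 <- exp(1000 * b_visits)

# Fitted sales for new customers relative to old customers.
new_vs_old <- exp(b_new_customer)

print(round(c(pct_per_visit = pct_per_visit,
              mult_per_1000 = mult_per_1000,
              new_vs_old    = new_vs_old), 4))
```

On this reading, 1000 extra visits correspond to roughly 23% higher fitted sales, and rows for new customers have fitted sales of roughly 36% of those for old customers. Both statements describe the single-predictor models only; they do not adjust for the other variables.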

Page 22: Customer Behaviour Analysis

FITTING MULTIPLE REGRESSION

ASSUMPTIONS:
• Checking for multicollinearity:

new_customer and visits:
               new_customer      visits
new_customer      1.0000000  -0.2228282
visits           -0.2228282   1.0000000

visits and distinct_sessions:
                       visits  distinct_sessions
visits              1.0000000          0.9953069
distinct_sessions   0.9953069          1.0000000
There is a high correlation between visits and distinct_sessions.

visits and orders:
            visits     orders
visits   1.0000000  0.2507898
orders   0.2507898  1.0000000

visits and bounces:
             visits    bounces
visits    1.0000000  0.8987508
bounces   0.8987508  1.0000000
There is a high correlation between visits and bounces.

Page 23: Customer Behaviour Analysis

visits and add_to_cart:
                 visits  add_to_cart
visits        1.0000000    0.8773295
add_to_cart   0.8773295    1.0000000
There is a high correlation between visits and add_to_cart.

visits and product_page_views:
                        visits  product_page_views
visits               1.0000000           0.9445481
product_page_views   0.9445481           1.0000000
There is a high correlation between visits and product_page_views.

visits and search_page_views:
                       visits  search_page_views
visits              1.0000000          0.9444395
search_page_views   0.9444395          1.0000000
There is a high correlation between visits and search_page_views.
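The pairwise checks above can be run in one step with `cor()` on all candidate predictors, and summarised per predictor with a variance inflation factor (VIF). The VIF is not used in the original analysis; it is shown here as a common complementary diagnostic. The sketch uses simulated predictors, not the Zappos data, and computes the VIF in base R as 1 / (1 - R^2) from regressing each predictor on the others.

```r
# Simulated predictors; NOT the Zappos data. distinct_sessions is built
# to track visits closely, mimicking the 0.995 correlation reported above.
set.seed(3)
n                 <- 300
visits            <- rpois(n, lambda = 2000)
distinct_sessions <- round(0.9 * visits + rnorm(n, sd = 5))
new_customer      <- rbinom(n, size = 1, prob = 0.5)
X <- data.frame(visits, distinct_sessions, new_customer)

# Full correlation matrix instead of one pair at a time.
print(round(cor(X), 3))

# VIF for each predictor: regress it on the remaining predictors.
vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
print(round(vif, 1))
```

A VIF far above 10 is a common rule-of-thumb signal that a predictor is nearly redundant given the others, which is one way to motivate keeping visits and dropping distinct_sessions, bounces and the page-view counts from the multiple regression.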

Page 24: Customer Behaviour Analysis

• CHECK FOR NORMALITY:

• CONSTANT VARIANCE OF RESIDUALS:

Page 25: Customer Behaviour Analysis

Multiple Regressions:
• lm(formula = log(gross_sales) ~ visits + conversion_rate + add_to_cart_rate + new_customer + platform, data = sales)

Coefficients:
                        Estimate    Std. Error  t value   Pr(>|t|)
(Intercept)             6.249e+00   4.798e-02   130.251   < 2e-16
visits                  6.258e-04   7.980e-06    78.427   < 2e-16
conversion_rate         2.141e+00   1.071e-01    19.984   < 2e-16
add_to_cart_rate       -4.336e-01   1.179e-01    -3.678   0.000237
new_customer           -8.004e-01   3.718e-02   -21.530   < 2e-16
platformBlackBerry     -2.513e+00   9.489e-02   -26.479   < 2e-16
platformChromeOS       -5.906e-01   8.080e-02    -7.309   2.88e-13
platformiOS             1.416e+00   4.930e-02    28.733   < 2e-16
platformiPad            8.268e-01   1.034e-01     7.994   1.45e-15
platformiPhone          6.331e-01   9.620e-02     6.582   4.88e-11
platformLinux          -7.034e-01   6.904e-02   -10.189   < 2e-16
platformMacintosh       9.018e-01   1.235e-01     7.301   3.06e-13
platformMacOSX          2.464e-01   5.836e-02     4.222   2.44e-05
platformOther           3.920e-01   1.301e-01     3.013   0.002596
platformUnknown        -2.363e+00   7.866e-02   -30.041   < 2e-16
platformWindows         2.126e-01   5.634e-02     3.775   0.000161
platformWindowsPhone   -2.020e+00   8.694e-02   -23.237   < 2e-16

Multiple R-squared: 0.617

Page 26: Customer Behaviour Analysis

POISSON REGRESSION:
• ESTIMATION OF DISPERSION PARAMETER:

Dispersion Parameter = 14281.59

1) Without Dispersion Parameter:

glm(formula = gross_sales ~ visits + conversion_rate + add_to_cart_rate + new_customer + platform, family = poisson, data = sales)

Coefficients:
                        Estimate    Std. Error  z value    Pr(>|z|)
(Intercept)             7.391e+00   6.381e-04   11582.75   <2e-16 ***
visits                  2.196e-04   1.433e-08   15324.96   <2e-16 ***
conversion_rate         7.268e-01   1.012e-03     718.55   <2e-16 ***
add_to_cart_rate       -1.784e-01   1.146e-03    -155.74   <2e-16 ***
new_customer           -8.423e-01   2.506e-04   -3360.61   <2e-16 ***
platformBlackBerry     -3.384e+00   7.851e-03    -430.99   <2e-16 ***
platformChromeOS       -9.284e-01   2.057e-03    -451.44   <2e-16 ***
platformiOS             2.188e+00   6.116e-04    3578.20   <2e-16 ***
platformiPad            2.109e+00   7.841e-04    2689.56   <2e-16 ***
platformiPhone          9.267e-01   1.091e-03     849.71   <2e-16 ***
platformLinux          -6.797e-01   1.523e-03    -446.39   <2e-16 ***
platformMacintosh       2.703e+00   7.055e-04    3831.98   <2e-16 ***
platformMacOSX          2.520e+00   6.178e-04    4078.42   <2e-16 ***
platformOther           3.552e-02   2.224e-03      15.97   <2e-16 ***
platformUnknown        -2.525e+00   4.104e-03    -615.26   <2e-16 ***
platformWindows         2.425e+00   6.177e-04    3925.73   <2e-16 ***
platformWindowsPhone   -2.707e+00   5.333e-03    -507.67   <2e-16 ***

Dispersion parameter for poisson family taken to be 1
AIC: 143152546
Number of Fisher Scoring iterations: 10

Page 27: Customer Behaviour Analysis

2) With Dispersion Parameter:

Coefficients:
                        Estimate    Std. Error  z value   Pr(>|z|)
(Intercept)             7.391e+00   7.626e-02    96.922   < 2e-16 ***
visits                  2.196e-04   1.712e-06   128.236   < 2e-16 ***
conversion_rate         7.268e-01   1.209e-01     6.013   1.82e-09 ***
add_to_cart_rate       -1.784e-01   1.369e-01    -1.303   0.192514
new_customer           -8.423e-01   2.995e-02   -28.121   < 2e-16 ***
platformBlackBerry     -3.384e+00   9.383e-01    -3.606   0.000310 ***
platformChromeOS       -9.284e-01   2.458e-01    -3.778   0.000158 ***
platformiOS             2.188e+00   7.309e-02    29.942   < 2e-16 ***
platformiPad            2.109e+00   9.371e-02    22.506   < 2e-16 ***
platformiPhone          9.267e-01   1.303e-01     7.110   1.16e-12 ***
platformLinux          -6.797e-01   1.820e-01    -3.735   0.000187 ***
platformMacintosh       2.703e+00   8.431e-02    32.065   < 2e-16 ***
platformMacOSX          2.520e+00   7.383e-02    34.127   < 2e-16 ***
platformOther           3.552e-02   2.657e-01     0.134   0.893680
platformUnknown        -2.525e+00   4.904e-01    -5.148   2.63e-07 ***
platformWindows         2.425e+00   7.382e-02    32.850   < 2e-16 ***
platformWindowsPhone   -2.707e+00   6.373e-01    -4.248   2.16e-05 ***

Dispersion parameter for poisson family taken to be 14281.59
AIC: 143152546
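The two Poisson summaries share the same estimates and differ only in their standard errors, which are multiplied by the square root of the dispersion parameter: for the intercept, 6.381e-04 × sqrt(14281.59) ≈ 7.626e-02. The sketch below shows one common way to obtain such a parameter, the Pearson chi-square statistic divided by the residual degrees of freedom, and then passes it to `summary()`. Whether the original analysis estimated it exactly this way is an assumption, and the data are simulated overdispersed counts, not the Zappos data.

```r
# Simulated overdispersed counts (negative binomial); NOT the Zappos data.
set.seed(4)
n  <- 400
x  <- runif(n)
mu <- exp(1 + 2 * x)
y  <- rnbinom(n, mu = mu, size = 1)   # variance = mu + mu^2, well above mu

fit <- glm(y ~ x, family = poisson)

# Pearson-based dispersion estimate: values far above 1 mean the plain
# Poisson standard errors are too small.
phi <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(phi)

# Same coefficients, standard errors rescaled by sqrt(phi).
se_plain  <- summary(fit)$coefficients[, "Std. Error"]
se_scaled <- summary(fit, dispersion = phi)$coefficients[, "Std. Error"]
print(cbind(se_plain, se_scaled, ratio = se_scaled / se_plain))
```

Fitting with `family = quasipoisson` estimates the dispersion in the same Pearson-based way and reports the rescaled standard errors directly. Rescaling leaves the coefficients untouched and only widens the uncertainty, which is why add_to_cart_rate and platformOther lose significance in the second table above.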

Page 28: Customer Behaviour Analysis

MODEL SELECTION: Among all the models, Poisson regression was found to be the best model for forecasting gross sales, compared with simple linear regression and multiple linear regression. This is because the response variable was treated as count data, for which Poisson regression is better suited. In addition, the response variable gross_sales was not normally distributed, so it had to be log transformed, which makes the coefficients of the resulting multiple regression model harder to interpret. Simple linear regression cannot be used either, as there are many predictors and using only one would lead to omitted variable bias. Multiple linear regression with all predictors also cannot be used, as it would suffer from multicollinearity, which makes the coefficient estimates unstable and inflates their standard errors. Thus, of all the models, Poisson regression was found to be the best model to forecast sales.

Page 29: Customer Behaviour Analysis

RESULTS:

• Among all the Zappos sites, the most visited site is Acme, followed by Pinnacle and Sortly.
• Most of their sales come from returning customers rather than new customers.
• Most Zappos users are on the iOS platform when they visit the site.
• Even though most of the visits come from iOS, most of the sales are from Windows.
• The site responsible for generating the maximum gross sales is Acme.
• The majority of product page views are for Acme, followed by Widgetry. Although Widgetry has very high product page views, it does not generate sales.
• Search page views and add to cart are also highest for Acme.
• There is a linear relation between product page views, visits and sales, indicating that customers not only visit the site but also buy the product.