Christopher Knapp University of Akron, Fall 2011 Statistical Data Management Homework #6


Problem Statement

In this homework assignment, I am expecting you to be able to merge data sets. Please refer to Assignment #2 for an explanation of the data set that has 42,694 customers who have purchased an item in the last 12 months. Unfortunately, this time the data was split into the following two data sets:

Data Set #1: Data set download location: apps on 'Samba Server (R:)' > Fridline > Statistical Data Management > Assignment #6 > Email Test1.csv

Data Set #2: Data set download location: apps on 'Samba Server (R:)' > Fridline > Statistical Data Management > Assignment #6 > Email Test2.csv

There is a separate data set that describes the e-mail campaign each customer received: half of the customers were randomly chosen to receive an e-mail campaign featuring Men’s merchandise, and the other half were randomly chosen to receive an e-mail campaign featuring Women’s merchandise. The following data set records the type of e-mail received by each customer:

Data Set #3: Data set download location: apps on 'Samba Server (R:)' > Fridline > Statistical Data Management > Assignment #6 > Email Indicator.csv

a) Please merge both the Email Test1 and Email Test2 data sets into one file.

b) Merge the variable information from the Email Indicator data set into the combined data set from part (a).

c) Which e-mail campaign performed better, the Men’s version or the Women’s version? Why? Did the campaigns perform differently when measured across different metrics, such as Visitors, Conversion, and Total Spend?

Note: In SAS, please provide your program and all supporting output. In SPSS, please paste your syntax and all supporting output. In Microsoft Excel, explain how you achieved this merging task. Mention any formulas that you used to complete this task. Also, please provide some supporting evidence of which campaign performed better using Excel features.

Contents

Merging in SAS

EmailTest1 and EmailTest2

Add EmailIndicator

SAS Code

Merging in SPSS

EmailTest1 and EmailTest2

Add EmailIndicator

SPSS Code

Merging in Excel

EmailTest1 and EmailTest2

Add EmailIndicator

Performance Analysis

Men’s Versus Women’s Versions

Merging in SAS

Merging EmailTest1 and EmailTest2 into SAS

The following table displays the result of the first merging step. It contains 42,694 observations, the full customer count described in Assignment #2. The SAS code is under the third heading of this section, “SAS Code”.
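To make the concatenation step concrete outside SAS, here is a small Python sketch (an illustration, not part of the assignment). It stacks two lists of records the way `set mylib.emailtest1 mylib.emailtest2;` stacks data sets: nothing is matched by key, one file's rows simply follow the other's. The helper names `stack` and `check_stack` and the toy rows are hypothetical, and the duplicate-ID check assumes each customer appears in only one of the two files. On the real files, the stacked row count should equal the 42,694 customers.

```python
def stack(first, second):
    """Concatenate two lists of records, like SAS `set a b;` or SPSS ADD FILES.

    Rows of `first` come first, then rows of `second`; nothing is matched or
    deduplicated, so the result has len(first) + len(second) rows.
    """
    return list(first) + list(second)


def check_stack(first, second, combined):
    """True if the row count adds up and no customer id appears twice."""
    ids = [row["id"] for row in combined]
    return len(combined) == len(first) + len(second) and len(ids) == len(set(ids))


# Toy records (hypothetical), keyed by column names from the SAS INPUT statement.
test1 = [{"id": 1, "spend": 0.0}, {"id": 3, "spend": 29.99}]
test2 = [{"id": 2, "spend": 0.0}]

combined = stack(test1, test2)
print(len(combined), check_stack(test1, test2, combined))  # 3 True
```

Note that stacking preserves file order (ids 1, 3, 2 here); sorting by `id` is a separate step that only the later key-based merge needs.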


Add EmailIndicator to SAS Data

The following table displays the result of the second merging step. It still contains 42,694 observations, now with the segment variable from EmailIndicator added. The SAS code is under the third heading of this section, “SAS Code”.
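The BY-merge can be sketched in Python as a two-pointer walk over two inputs already sorted by `id`, which is why the SAS code runs PROC SORT on both data sets before the MERGE step. This is a simplified illustration that assumes each `id` occurs at most once in each input (a one-to-one merge); SAS's handling of repeated BY values is not modeled. The function name `match_merge` and the sample rows, including the segment labels, are hypothetical.

```python
def match_merge(left, right, key="id"):
    """One-to-one match-merge of two lists already sorted by `key`.

    Rows with the same key are combined (on a shared column name, the value
    from `right` wins, as the later data set does in a SAS MERGE). A key found
    in only one input is kept, without the other input's columns (SAS would
    set those variables to missing).
    """
    merged, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        if j == len(right) or (i < len(left) and left[i][key] < right[j][key]):
            merged.append(dict(left[i]))  # key only in left
            i += 1
        elif i == len(left) or right[j][key] < left[i][key]:
            merged.append(dict(right[j]))  # key only in right
            j += 1
        else:  # same key in both inputs: combine the columns
            merged.append({**left[i], **right[j]})
            i += 1
            j += 1
    return merged


# Both inputs must be sorted by id first (the job PROC SORT does in SAS).
customers = sorted([{"id": 3, "visit": 1}, {"id": 1, "visit": 0}], key=lambda r: r["id"])
indicator = sorted([{"id": 3, "segment": "Womens E-Mail"},
                    {"id": 1, "segment": "Mens E-Mail"}], key=lambda r: r["id"])
print(match_merge(customers, indicator))
# [{'id': 1, 'visit': 0, 'segment': 'Mens E-Mail'}, {'id': 3, 'visit': 1, 'segment': 'Womens E-Mail'}]
```

Skipping the sort would make the pointers skip past matches, which is the same reason SAS stops with an error when BY variables are not in order.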


SAS Code

libname mylib '\\uanet.edu\ZIPSpace\C\crk32\Classes\F11 Statistical Data Management\Assignment 6\mylib';

/* Read the first half of the customers (header row skipped). The CSV has 11 columns, including mens. */
data mylib.EmailTest1;
   infile 'R:\Fridline\Statistical Data Management\Assignment #6\Email Test1.csv' dsd dlm=',' firstobs=2;
   input id recency dollars_spent mens womens zip_code new_customer channel visit conversion spend;
run;

/* Read the second half with the same layout. */
data mylib.EmailTest2;
   infile 'R:\Fridline\Statistical Data Management\Assignment #6\Email Test2.csv' dsd dlm=',' firstobs=2;
   input id recency dollars_spent mens womens zip_code new_customer channel visit conversion spend;
run;

/* Step 1: stack the two files (rows of EmailTest1 first, then EmailTest2). */
data mylib.mergedObservations;
   set mylib.EmailTest1 mylib.EmailTest2;
run;

/* Read the campaign indicator: one id and one character segment per row. */
data mylib.EmailIndicator;
   infile 'R:\Fridline\Statistical Data Management\Assignment #6\Email Indicator.csv' dsd dlm=',' firstobs=2;
   length segment $15;
   input id segment $;
run;

/* MERGE with BY requires both inputs sorted by the key. */
proc sort data=mylib.mergedObservations;
   by id;
run;

proc sort data=mylib.EmailIndicator;
   by id;
run;

/* Step 2: match-merge, adding segment to each customer row. */
data mylib.mergedObservationsAndColumns;
   merge mylib.mergedObservations mylib.EmailIndicator;
   by id;
run;

Merging in SPSS

Merging EmailTest1 and EmailTest2 into SPSS

The result of the merge is displayed below; notice the total of 42,694 cases. The code is under the third heading of this section, “SPSS Code”.


Add EmailIndicator to SPSS Data

Once the first merging step is complete, the variable Segment can be added, as shown below. The code is under the next heading of this section, “SPSS Code”.


SPSS Code

* Read the first half of the customers (the header row is skipped).
GET DATA /TYPE=TXT
  /FILE="R:\Fridline\Statistical Data Management\Assignment #6\Email Test1.csv"
  /DELCASE=LINE /DELIMITERS="," /ARRANGEMENT=DELIMITED /FIRSTCASE=2 /IMPORTCASE=ALL
  /VARIABLES=ID F5.0 Recency F2.0 Dollars_Spent F7.2 Mens F1.0 Womens F1.0 Zip_code F1.0
    New_Customer F1.0 Channel F1.0 Visit F1.0 Conversion F1.0 Spend F6.2.
CACHE.
EXECUTE.
DATASET NAME DataSet1 WINDOW=FRONT.

* Read the second half with the same variable layout, including Spend F6.2.
GET DATA /TYPE=TXT
  /FILE="R:\Fridline\Statistical Data Management\Assignment #6\Email Test2.csv"
  /DELCASE=LINE /DELIMITERS="," /ARRANGEMENT=DELIMITED /FIRSTCASE=2 /IMPORTCASE=ALL
  /VARIABLES=ID F5.0 Recency F2.0 Dollars_Spent F7.2 Mens F1.0 Womens F1.0 Zip_code F1.0
    New_Customer F1.0 Channel F1.0 Visit F1.0 Conversion F1.0 Spend F6.2.
CACHE.
EXECUTE.
DATASET NAME DataSet2 WINDOW=FRONT.

* Step 1: append the EmailTest1 cases to the active dataset (EmailTest2).
ADD FILES /FILE=* /FILE='DataSet1'.
EXECUTE.

* Read the campaign indicator and give it the name DataSet1.
GET DATA /TYPE=TXT
  /FILE="R:\Fridline\Statistical Data Management\Assignment #6\Email Indicator.csv"
  /DELCASE=LINE /DELIMITERS="," /ARRANGEMENT=DELIMITED /FIRSTCASE=2 /IMPORTCASE=ALL
  /VARIABLES=ID F5.0 Segment A13.
CACHE.
EXECUTE.
DATASET NAME DataSet1 WINDOW=FRONT.

* MATCH FILES with /BY requires both files sorted by the key.
DATASET ACTIVATE DataSet2.
SORT CASES BY ID(A).
DATASET ACTIVATE DataSet1.
SORT CASES BY ID(A).

* Step 2: match Segment onto the combined customers by ID.
DATASET ACTIVATE DataSet2.
MATCH FILES /FILE=* /FILE='DataSet1' /BY ID.
EXECUTE.

Merging in Excel

Merging EmailTest1 and EmailTest2 into Excel

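For the first merging step, I copied cells A2:K20695 (the 20,694 data rows) from EmailTest2 and pasted them starting at cell A22002 in EmailTest1, the first empty row below its data. EmailTest1's data occupies rows 2 through 22001 (22,000 customers), so the combined sheet holds 22,000 + 20,694 = 42,694 customers. The resulting paste is shown below: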

I sorted this file and saved it as comb1.csv.


Add EmailIndicator to Excel

Next, I opened and sorted the EmailIndicator dataset.

Because some ID values in comb1 might not appear in EmailIndicator (and some in EmailIndicator might not appear in comb1), I used a VLOOKUP formula keyed on ID to fill in a new Segment column in comb1. The screenshot below shows the VLOOKUP formula for cell L2 and the filled-in Segment column.
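Exact-match lookup is what handles missing IDs correctly. In Excel that corresponds to VLOOKUP with its fourth argument, range_lookup, set to FALSE; with TRUE or the argument omitted, VLOOKUP performs an approximate match on a sorted first column and can return the segment of the nearest smaller ID instead of #N/A. The Python sketch below shows exact-match behavior with a dictionary; `lookup_segment`, the `#N/A` placeholder and the sample rows are hypothetical. Like VLOOKUP, it is a left join onto comb1: IDs that exist only in EmailIndicator are never looked up and do not appear in the result.

```python
def lookup_segment(rows, indicator_rows, missing="#N/A"):
    """Add a 'segment' field to each row by exact-match lookup on 'id'.

    Mirrors VLOOKUP with range_lookup FALSE: an id absent from the indicator
    table gets `missing`, never a neighboring id's segment. Indicator ids that
    never occur in `rows` are ignored (a left join onto `rows`).
    """
    by_id = {r["id"]: r["segment"] for r in indicator_rows}
    return [{**row, "segment": by_id.get(row["id"], missing)} for row in rows]


comb1 = [{"id": 1}, {"id": 2}, {"id": 5}]
indicator = [{"id": 1, "segment": "Mens E-Mail"},
             {"id": 2, "segment": "Womens E-Mail"},
             {"id": 9, "segment": "Mens E-Mail"}]  # id 9 is not in comb1
print([r["segment"] for r in lookup_segment(comb1, indicator)])
# ['Mens E-Mail', 'Womens E-Mail', '#N/A']
```

A dictionary lookup does not need either table to be sorted; sorting only matters for the approximate-match form of VLOOKUP.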

The last step is to copy column L and paste it back as values (Paste Special > Values) so that the sheet stores the Segment labels themselves rather than the VLOOKUP formulas. This keeps the column from changing or breaking if the EmailIndicator data is later edited or moved.

Performance Analysis

Men’s Versus Women’s Versions

Although the directions state that half of the recipients were sent the Men’s ad and half were sent the Women’s ad, the split is not exactly even. The Excel sheet below, which uses the COUNTIF function, shows that 80 more people received the Men’s ad than the Women’s ad.
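The COUNTIF check amounts to tallying rows per segment label. Here is a minimal Python sketch, assuming the segment labels are strings such as `Mens E-Mail` and `Womens E-Mail` (assumed labels; the actual values in Email Indicator.csv are not shown here). The helper names `segment_counts` and `size_gap` are hypothetical, and the toy rows are not the assignment's data.

```python
from collections import Counter


def segment_counts(rows):
    """Count rows per segment label, like one COUNTIF per label."""
    return Counter(row["segment"] for row in rows)


def size_gap(counts, a, b):
    """How many more rows carry label `a` than label `b`."""
    return counts[a] - counts[b]


# Toy data with assumed labels; not the assignment's counts.
rows = [{"segment": "Mens E-Mail"}] * 3 + [{"segment": "Womens E-Mail"}] * 2
counts = segment_counts(rows)
print(counts["Mens E-Mail"], counts["Womens E-Mail"],
      size_gap(counts, "Mens E-Mail", "Womens E-Mail"))  # 3 2 1
```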

Therefore my analysis (a basic pivot table) compares averages rather than sums, since sums would favor the larger group. (The average of a 0/1 indicator variable equals the proportion of cases with the value 1.)
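The pivot-table averages can be sketched as a group-by mean. In this hypothetical Python helper, `segment_means` averages `visit`, `conversion` and `spend` within each segment; because `visit` and `conversion` are 0/1 indicators, their means are the visit and conversion rates. The toy rows are made up to show the arithmetic and are not the assignment's data.

```python
from collections import defaultdict

METRICS = ("visit", "conversion", "spend")


def segment_means(rows, metrics=METRICS):
    """Average each metric within each segment, like a pivot table set to Average.

    For the 0/1 indicators (visit, conversion) the mean equals the proportion
    of 1s, so groups of different sizes can be compared directly.
    """
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0.0))
    counts = defaultdict(int)
    for row in rows:
        seg = row["segment"]
        counts[seg] += 1
        for m in metrics:
            totals[seg][m] += row[m]
    return {seg: {m: totals[seg][m] / counts[seg] for m in metrics} for seg in counts}


# Made-up rows, only to show the arithmetic.
rows = [
    {"segment": "A", "visit": 1, "conversion": 0, "spend": 0.0},
    {"segment": "A", "visit": 0, "conversion": 0, "spend": 0.0},
    {"segment": "B", "visit": 1, "conversion": 1, "spend": 30.0},
    {"segment": "B", "visit": 1, "conversion": 0, "spend": 0.0},
    {"segment": "B", "visit": 0, "conversion": 0, "spend": 0.0},
]
means = segment_means(rows)
print(means["A"]["visit"], means["B"]["spend"])  # 0.5 10.0
```

In the toy data, group B has the larger visit total (2 versus 1) partly because it has more rows; the means (2/3 versus 1/2) compare rates instead.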

From the pivot table above, recipients of the Men’s e-mail had a higher visit rate, a higher conversion rate, and a higher average spend than recipients of the Women’s e-mail. The Men’s campaign therefore performed better, and its lead held on all three metrics.
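Part (c) also asks whether the ranking depends on the metric. Given a table of per-segment averages, such as the pivot table's, a small Python check can report the leading segment for each metric and whether one segment leads on all of them. The function names `leaders` and `consistent_leader` and the made-up averages are hypothetical; ties go to whichever segment is listed first.

```python
METRICS = ("visit", "conversion", "spend")


def leaders(means, metrics=METRICS):
    """Map each metric to the segment with the highest average (first one wins ties)."""
    return {m: max(means, key=lambda seg: means[seg][m]) for m in metrics}


def consistent_leader(means, metrics=METRICS):
    """Return the segment that leads on every metric, or None if the lead changes."""
    winners = set(leaders(means, metrics).values())
    return winners.pop() if len(winners) == 1 else None


# Made-up averages for illustration only; not the assignment's pivot-table values.
means = {
    "A": {"visit": 0.20, "conversion": 0.010, "spend": 1.20},
    "B": {"visit": 0.15, "conversion": 0.008, "spend": 0.90},
}
print(leaders(means))            # {'visit': 'A', 'conversion': 'A', 'spend': 'A'}
print(consistent_leader(means))  # A
```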