SAS Enterprise Miner Release 4.3

SAS Enterprise Miner Release 4.3 A brief overview: analysis of the Donor Recapture Case (Case 3) Kevin Garsek … Class of 2006


Page 1: SAS Enterprise Miner Release 4.3

SAS Enterprise Miner Release 4.3

A brief overview: analysis of the Donor Recapture Case (Case 3)

Kevin Garsek … Class of 2006

Page 2: SAS Enterprise Miner Release 4.3

Importing Base Data

• SAS’s main drawback is that if any field in a record is null or blank, it disregards the entire record

• In this case, if we were unable to manipulate the data, the number of available records would decrease dramatically

• We can fight back by recoding the data, as will be shown in the import step
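To see how quickly this kind of complete-case deletion shrinks a dataset, here is a minimal Python sketch (not SAS). The four records and their values are made up for illustration; only the column names are borrowed from the case data, and an empty string stands in for a blank field.

```python
# Hypothetical records; "" marks a blank (missing) field.
records = [
    {"OSOURCE": "GRI", "MAILCODE": "", "LASTGIFT": "10"},
    {"OSOURCE": "", "MAILCODE": "B", "LASTGIFT": "5"},
    {"OSOURCE": "BOA", "MAILCODE": "B", "LASTGIFT": "25"},
    {"OSOURCE": "AMH", "MAILCODE": "", "LASTGIFT": ""},
]


def complete_cases(rows):
    """Keep only the rows in which every field is non-blank."""
    return [row for row in rows if all(value.strip() for value in row.values())]


kept = complete_cases(records)
print(f"{len(kept)} of {len(records)} records survive")
```

Even with one or two blanks per record, most of the toy data is lost, which is why the blanks are recoded during the import step.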

Page 3: SAS Enterprise Miner Release 4.3

Base SAS Interface Screen

Page 4: SAS Enterprise Miner Release 4.3

Importing Charity Data

Text Editor

Page 5: SAS Enterprise Miner Release 4.3

Text Editor

We will use the text editor in Base SAS to import the Charity Case data. To use this editor, simply type as you would in any text editor.

Page 6: SAS Enterprise Miner Release 4.3

Text Editor

A line by line example of the code that we will use is as follows:

libname charity 'C:\Documents and Settings\Kevin\Desktop\Datamining\charity.1';
    Denotes the master folder on your local PC where the raw data is housed.

data charity.raw;
    Tells SAS to create a new dataset named charity.raw.

infile 'chr\2.dat' missover firstobs=2;
    Lets SAS know the individual subfolder in which the data is housed and tells it to import the data into the new dataset. (missover keeps SAS from jumping to the next line when a value is missing; firstobs=2 starts reading at the second line of the file.)

input OSOURCE $;
    Names the data column OSOURCE; the $ tells SAS that this is character-based data (if it is left out, SAS assumes that the data is numeric).

OSOURCE_D = 0;
    Due to prevalent missing data, this creates a new dummy variable named OSOURCE_D and sets its value to 0 for every record.

if trim(OSOURCE) = ""
    The trim function removes any stray spaces, and the if opens an if-then statement to compensate for blank data.

then do; OSOURCE = "0";
    Sets all missing values in the OSOURCE column to 0.

OSOURCE_D = 1;
    Sets the newly created dummy variable to 1 when OSOURCE was blank in the input file.

end;
    Ends the do block. All code from infile to end can be written on a single line in the text editor.
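The recoding logic above (fill the blank, and flag that it was blank) can be sketched outside SAS as well. The Python below is only an illustration of the same idea under stated assumptions: the helper name `recode_missing` and the sample rows are hypothetical, and the `_D` suffix follows the naming used in the SAS code.

```python
def recode_missing(rows, column, fill="0"):
    """Fill blanks in `column` and add a `<column>_D` flag (1 = was blank)."""
    recoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        was_blank = new_row[column].strip() == ""
        if was_blank:
            new_row[column] = fill
        new_row[column + "_D"] = 1 if was_blank else 0
        recoded.append(new_row)
    return recoded


# Hypothetical sample: one real value, one all-space value, one empty value.
sample = [{"OSOURCE": "GRI"}, {"OSOURCE": "   "}, {"OSOURCE": ""}]
result = recode_missing(sample, "OSOURCE")
for row in result:
    print(row)
```

Keeping the flag alongside the filled value lets a later model distinguish a genuine "0" from a value that was originally missing.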

Page 7: SAS Enterprise Miner Release 4.3

Importing Charity Data

The completed code is depicted below. The code can easily be written in Excel using the & (concatenation) operator and then pasted into the text editor. Moving the writing process to Excel will save considerable time during this laborious process.
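The same "build the repetitive code by concatenation" trick can be done with a short script instead of a spreadsheet. The Python sketch below generates only the recoding statements for each variable, not the input statement. The function name `sas_recode_block` is hypothetical, and the variable list assumes that OSOURCE, MAILCODE and PEPSTRFL are all read as character data.

```python
def sas_recode_block(variable):
    """Return the SAS recoding statements for one character variable."""
    return (
        f"{variable}_D = 0; "
        f'if trim({variable}) = "" then do; '
        f'{variable} = "0"; {variable}_D = 1; end;'
    )


# Assumed to be character variables in the charity data.
variables = ["OSOURCE", "MAILCODE", "PEPSTRFL"]
for name in variables:
    print(sas_recode_block(name))
```

The printed lines can then be pasted into the text editor in the same way as the Excel output.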

Page 8: SAS Enterprise Miner Release 4.3

Importing Charity Data

Once the code is complete, right click in the text editor and select “submit all”. This tells SAS to read through the code in the text editor and execute it. Be prepared: because the data is large, this will take considerable time to complete.

Page 9: SAS Enterprise Miner Release 4.3

Starting Enterprise Miner from Base SAS module

You should now have a fully working dataset and are ready to open Enterprise Miner by following the subsequent slides.

Page 10: SAS Enterprise Miner Release 4.3

Starting Enterprise Miner from Base SAS module

Page 11: SAS Enterprise Miner Release 4.3

Starting Enterprise Miner from Base SAS module

Page 12: SAS Enterprise Miner Release 4.3

Binding Data to Program

• This is an exasperating activity

• Even for someone who took a SAS training course in Enterprise Miner

• The documentation is pathetic

• I’ll document each step carefully in case this ever happens to you

Page 13: SAS Enterprise Miner Release 4.3

Name Project Charity and Drag Input Data Node to Workspace

Page 14: SAS Enterprise Miner Release 4.3

Bind Data to Project

Right click on tools to get this menu.

Page 15: SAS Enterprise Miner Release 4.3

Bind Data to Project

Left click on initialization, left click top edit.

Page 16: SAS Enterprise Miner Release 4.3

Bind Data to Project

Right click select; browse for library RDATA; click ok

Page 17: SAS Enterprise Miner Release 4.3

Bind Data to Project

Gotcha: You must select RAW and hit enter even though it is the only data set in RDATA

Page 18: SAS Enterprise Miner Release 4.3

Change to Larger Sample

Left click Change; the sample size was changed to 10,000 so that low-response items are represented

Page 19: SAS Enterprise Miner Release 4.3

Success!

Page 20: SAS Enterprise Miner Release 4.3

Click Variables Tab

Notice that some variables are rejected. This is typically because the column has only one value throughout, e.g. a dummy variable that is always 0 because there is no variation in the input data.
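The "only one value throughout" condition is easy to check before the data ever reaches Enterprise Miner. The Python sketch below is an illustration of that check, not a description of how Enterprise Miner itself decides which variables to reject; the function name and the sample rows are hypothetical.

```python
def constant_columns(rows):
    """Return the names of columns that hold a single distinct value."""
    if not rows:
        return []
    return [
        name for name in rows[0]
        if len({row[name] for row in rows}) == 1
    ]


# Hypothetical rows: OSOURCE_D never varies, the other columns do.
rows = [
    {"OSOURCE_D": 0, "MAILCODE_D": 1, "LASTGIFT": 10},
    {"OSOURCE_D": 0, "MAILCODE_D": 0, "LASTGIFT": 5},
    {"OSOURCE_D": 0, "MAILCODE_D": 1, "LASTGIFT": 25},
]
print(constant_columns(rows))
```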

Page 21: SAS Enterprise Miner Release 4.3

Then Bad Things Happen

• Who knows why.

• If I hadn’t taken the course the slides would stop here.

• That’s the only reason I know what to do

• I’ll document this also, in case it happens to you.

Page 22: SAS Enterprise Miner Release 4.3

Crash Recovery

Right click on top level icon; select explore

Page 23: SAS Enterprise Miner Release 4.3

Crash Recovery

Open emproj; delete all files with extension .lck; open user subfolder; delete everything in user subfolder
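For anyone who hits this repeatedly, the manual cleanup can be scripted. The Python sketch below only mirrors the steps on this slide; the function name is hypothetical, the caller must supply the path to the emproj folder, and it is an assumption that Enterprise Miner should be closed before files are removed. It deletes files permanently, so try it on a copy first.

```python
import shutil
from pathlib import Path


def clear_project_locks(emproj_dir):
    """Delete *.lck files in emproj_dir and empty its 'user' subfolder.

    Returns the sorted relative names of everything that was removed.
    """
    emproj = Path(emproj_dir)
    removed = []
    # Lock files sit directly inside the emproj folder.
    for lock_file in list(emproj.glob("*.lck")):
        lock_file.unlink()
        removed.append(lock_file.name)
    # Everything inside 'user' goes, but the folder itself is kept.
    user_dir = emproj / "user"
    if user_dir.is_dir():
        for entry in list(user_dir.iterdir()):
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            removed.append(f"user/{entry.name}")
    return sorted(removed)
```

Files in emproj that do not end in .lck are left alone, matching the instructions on the slide.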

Page 24: SAS Enterprise Miner Release 4.3

Analysis Resumes

• We’ll have a look at MAILCODE.

• Enterprise Miner has some neat graphical tools that are easy to use.

• The simplest and easiest are part of the data input tool.

Page 25: SAS Enterprise Miner Release 4.3

A Histogram

Right click item, select “view distribution of MAILCODE” from drop down menu

Page 26: SAS Enterprise Miner Release 4.3

Histogram of Mailcode

SAS has classified as missing data that R accepted and used!

Page 27: SAS Enterprise Miner Release 4.3

Must Identify TARGET_D as Target

Right click row item in column “Model Role”, select “Change Model Role” from drop down menu, select “target” from next drop down menu

Page 28: SAS Enterprise Miner Release 4.3

Histogram of Target

This is what makes the problem hard: extremely low response rate!

Page 29: SAS Enterprise Miner Release 4.3

Save changes!

Page 30: SAS Enterprise Miner Release 4.3

Add Data Partition Node

Drag down from tool bar above and connect line by dragging the mouse.

Page 31: SAS Enterprise Miner Release 4.3

This is What it Does

We will use an 80%/20% training/validation allocation. Close the box, right click, and click “Run” on the drop down menu.
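Conceptually, an 80%/20% allocation is just a shuffle followed by a cut. The Python sketch below shows a simple random split; it is an assumption that this matches the node's behavior, since the Data Partition node may use a different sampling method. The function name and the seed are arbitrary, and the 10,000 rows echo the sample size chosen earlier.

```python
import random


def partition(rows, train_fraction=0.8, seed=12345):
    """Shuffle rows reproducibly and split into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Row numbers stand in for the 10,000 sampled records.
train, valid = partition(range(10000))
print(len(train), len(valid))
```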

Page 32: SAS Enterprise Miner Release 4.3

Design Philosophy

Click the lower tools tab. Note the tools on the left. One drags a tool to the worksheet and connects it with arrows. We’ll now drag and connect regression.

Page 33: SAS Enterprise Miner Release 4.3

Regression

We chose stepwise selection with validation error as the criterion. That mimics what we did in R.

Page 34: SAS Enterprise Miner Release 4.3

Regression

Right hand click on the Regression node and select run

Page 35: SAS Enterprise Miner Release 4.3

Regression

Regression is highlighted in green while running

Page 36: SAS Enterprise Miner Release 4.3

Regression

Let’s take a look at the results; SAS has a very different interpretation of the important variables than the R analysis

Page 37: SAS Enterprise Miner Release 4.3

Regression

The error rate is not that bad, but the significant variables are not necessarily easily interpretable.

Page 38: SAS Enterprise Miner Release 4.3

Regression

Let’s try it again with a few changes to the model selection

Page 39: SAS Enterprise Miner Release 4.3

Regression

Again, we get results, but nothing easily interpretable.

Page 40: SAS Enterprise Miner Release 4.3

Regression

Let’s limit the regression to those variables determined by R to be significant. To do this, we will again right click on regression and select open.

Page 41: SAS Enterprise Miner Release 4.3

Regression

Then go to the variables tab. Right click under the status column for each unneeded variable and set the status to “don’t use”.

Page 42: SAS Enterprise Miner Release 4.3

Regression

In addition to limiting our variables to those from the R results, we are going to add an interaction as well as a squared variable. The first step is to add the squared term by adding a transform variables node, right clicking on the node, and selecting open.

Page 43: SAS Enterprise Miner Release 4.3

Regression

From the variables tab, we will right hand click on DOB and select Transform.

Page 44: SAS Enterprise Miner Release 4.3

Regression

We will now select square. This will create a new variable, DOB_L1S6, which will then be used in our next regression.

Page 45: SAS Enterprise Miner Release 4.3

Regression

Our next step is to create an interaction. To do this, go back to the main diagram and double click on regression. This should bring you into the model manager, where you will click on the Interaction Builder icon.

Page 46: SAS Enterprise Miner Release 4.3

Regression

On this screen, you should use the Ctrl button to highlight both Lastgift and Pepstrfl. Next, press the Cross button in order to create the new interaction variable. The new variable should be added to the available terms window and should be used in subsequent regressions.
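What the squared term and the crossed term amount to numerically can be sketched in a few lines of Python. This is an illustration under stated assumptions, not what Enterprise Miner generates: the output names `DOB_SQ` and `LASTGIFT_X_PEPSTRFL` are hypothetical (SAS assigns its own, such as DOB_L1S6), the example values are made up, and Pepstrfl is assumed to be a character flag in which "X" means set and blank means not set.

```python
def add_terms(row):
    """Return a copy of row with a squared DOB term and a LASTGIFT x PEPSTRFL term."""
    out = dict(row)
    out["DOB_SQ"] = row["DOB"] ** 2
    # Assumed coding: "X" = flag set, anything else = not set.
    flag = 1 if row["PEPSTRFL"] == "X" else 0
    # Crossing a numeric variable with a 0/1 dummy keeps the value only when the flag is set.
    out["LASTGIFT_X_PEPSTRFL"] = row["LASTGIFT"] * flag
    return out


example = add_terms({"DOB": 3712, "LASTGIFT": 10.0, "PEPSTRFL": "X"})
print(example)
```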

Page 47: SAS Enterprise Miner Release 4.3

Regression

Results! While the initial bar graph may look complex, this is how SAS handles character data and creates dummy variables.

Page 48: SAS Enterprise Miner Release 4.3

Regression

As we now look at the table, or coefficient estimates, we have interpretable results!

Page 49: SAS Enterprise Miner Release 4.3

Regression

Those who are interested can look at the Code tab to see the actual SAS code that one would have to write to program this regression manually.

Page 50: SAS Enterprise Miner Release 4.3

Regression

Let’s add another level of analysis and try to rid the data of outliers. To do this, you will need to insert a Filter Outliers node between the Transform Variables and Regression nodes.

Page 51: SAS Enterprise Miner Release 4.3

Regression

Double click on the Filter Outliers node and then go to the Settings tab. I have used the above settings, but feel free to experiment for the best outcome. Once you have completed this step, run the regression.

Page 52: SAS Enterprise Miner Release 4.3

Moving On, Try a Tree

Page 53: SAS Enterprise Miner Release 4.3

Tree

The tree itself is on the next slide.

Does this look familiar?

This is exactly the same as Fig 22, Learning and Validation MSE, of Topic 2, Bias Variance Tradeoff.

Page 54: SAS Enterprise Miner Release 4.3

Tree

SAS does have some great graphics! Below is the tree, which is typically presentable to a general audience.

Page 55: SAS Enterprise Miner Release 4.3

Moving On, Try a Neural Net

Page 56: SAS Enterprise Miner Release 4.3

Net

We will use the defaults for this round of processing. During the run we see the below graphic.

Page 57: SAS Enterprise Miner Release 4.3

Net

The results. Decent output but very difficult to disseminate to a general audience.

Page 58: SAS Enterprise Miner Release 4.3

Assessment Tool

• The assessment tool is supposed to give lift charts.

• Apparently it only does so for binary response.

• The menu item is blank for predictive models.

• The tool is good for easily comparing varying model error rates.

Page 59: SAS Enterprise Miner Release 4.3

Assessment Tool

Page 60: SAS Enterprise Miner Release 4.3

Assessment Tool

When you double click on the node you will see the following:

Tool             Root ASE    Root ASE^2
Tree             4.457445    19.86881593
Regression       4.421218    19.5471686
Neural Network   4.455325    19.84992086
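The second column is simply the first one squared, so the comparison can be reproduced by hand. The short Python sketch below does that with the three Root ASE values from the table and picks the model with the lowest error; it assumes ASE stands for average squared error.

```python
# Root ASE values as reported by the assessment node.
root_ase = {
    "Tree": 4.457445,
    "Regression": 4.421218,
    "Neural Network": 4.455325,
}

# Squaring the root recovers the ASE itself.
ase = {tool: value ** 2 for tool, value in root_ase.items()}
best = min(root_ase, key=root_ase.get)

for tool in sorted(ase, key=ase.get):
    print(f"{tool:<15} {root_ase[tool]:.6f} {ase[tool]:.4f}")
print("Lowest error:", best)
```

The three models are within about 0.04 of one another on Root ASE, with regression slightly ahead.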

Page 61: SAS Enterprise Miner Release 4.3

Assessment Tool

As for lift charts, they are unavailable for this analysis …

Page 62: SAS Enterprise Miner Release 4.3

Done!

• The intention was to illustrate the interface, not to assess SAS Enterprise Miner per se.

• With more effort to fix the missing values problems on input, better results can surely be achieved.

• With more experience, many of the false steps would not have occurred.

Page 63: SAS Enterprise Miner Release 4.3

Looping and Control

• SAS’s biggest deficiency, as encountered in this exercise, is the lack of convenient looping and control structures.

• This affects all of SAS, not just Enterprise Miner.

• Without resorting to DATA step arrays or the macro language, any data manipulation, such as fixing missing values, must be done by hand, one variable at a time.

• R has a huge advantage here!