Post on 23-Dec-2015
Agenda
Introduction
SAS EM at Dalhousie University
Exploring SAS EM
Discussion
Introduction
Teaching Assistant with Dalhousie University
Analyst, Precision BioLogic Inc.
Consultant
Informatics at Dalhousie
Informatics: the study of the application of computer and statistical techniques to the management of information (HGSC glossary)
Dalhousie University
  First marketing informatics MBA major in North America
  The first to use SAS EM for teaching purposes
  Health Informatics program
  New Bachelor of Informatics
  Success story
Other courses required for the Informatics major
  Multivariate statistics
  Direct marketing
  Marketing research
  Marketing strategy
  Database design
  Internet marketing
Our students
Work for:
  Small consulting companies
  Large financial institutions
  Not-for-profit organizations
  Telecommunications companies
  Insurance companies
  Hospitals
  Loyalty program companies
  Travel companies
  Oil and gas industry
  Publishing houses
What they have in common: they all work with information
SEMMA Process
Sample: input, partition and sample data
Explore: view distributions and associations
Modify: transform data, filter outliers, cluster to derive new variables
Model: develop models, e.g. decision trees and regression
Assess: assess models
Business Problem
Have you ever wanted to understand things that occur together or in sequence?
Market Basket Analysis: Association Node
Broad applications
  Basket data analysis, cross-marketing, catalog design, campaign sales analysis
  Web log (click stream) analysis, DNA sequence analysis, etc.
Associations Node
Support: the probability that a transaction contains both X and Y, i.e. how frequently the combination occurs
Confidence: the conditional probability that a transaction containing X also contains Y, i.e. the percentage of cases in which Y occurs given that X has occurred
Sequential association: Y occurs some time period after X occurs
Associations Node
If a customer purchases avocado, then 80% of the time they will also purchase steak
  Confidence = 800 / 1,000 = 80%
  Support = 800 / 8,000 = 10%
Rule: Avocado (antecedent) => Steak (consequent)
  8,000 transactions
  1,000 with avocados
  2,000 with steak
  800 with avocados and steak
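The arithmetic in the avocado and steak example can be checked with a few lines of Python. This is only an illustrative sketch, not how the SAS EM Association node is implemented. The function name `rule_metrics` is made up for this example, and lift is an extra measure that does not appear on the slide.

```python
def rule_metrics(n_transactions, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = n_both / n_transactions      # P(X and Y)
    confidence = n_both / n_antecedent     # P(Y | X)
    # Lift is NOT on the slide: it compares confidence with the baseline P(Y).
    lift = confidence / (n_consequent / n_transactions)
    return support, confidence, lift


# Counts from the avocado / steak example.
support, confidence, lift = rule_metrics(
    n_transactions=8000, n_antecedent=1000, n_consequent=2000, n_both=800
)
print(f"support={support:.0%} confidence={confidence:.0%} lift={lift:.1f}")
# prints: support=10% confidence=80% lift=3.2
```

A lift above 1 suggests that avocado buyers purchase steak more often than customers in general (here 80% versus 25%).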
Business Problem
Have you ever wanted to classify or segment data on the basis of similar attributes, so that each segment or cluster differs from the others and all objects within a cluster share traits?
Segmentation: Clustering Node
Broad applications
  Demographic / psychographic segmentation, campaign segmentation, etc.
Clustering Example
Identify groups of similar objects that are dissimilar from the objects in other clusters, through disjoint cluster analysis on the basis of Euclidean distances
Profile clusters graphically within EM
Use derived segments for further analysis / algorithms (as an input variable or a target)
Customize clusters based on standardization method, clustering method and clustering criterion
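To make "disjoint cluster analysis on the basis of Euclidean distances" concrete, here is a minimal k-means-style sketch in plain Python. It is an illustration under stated assumptions, not the algorithm SAS EM runs: the customer data are hypothetical, z-score standardization is just one possible standardization method, and the seeding (first k points) is deliberately naive.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column so no single variable dominates the distance."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, means, stdevs)) for row in rows]


def k_means(points, k, max_iter=100):
    """Assign each point to exactly one of k clusters (disjoint clustering)."""
    centroids = list(points[:k])  # naive deterministic seeding, for illustration only
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(statistics.mean(c) for c in zip(*members))
    return labels, centroids


# Hypothetical customers: (age, annual spend in dollars).
customers = [(25, 500), (27, 650), (62, 4000), (58, 4200), (30, 700), (65, 3900)]
labels, _ = k_means(standardize(customers), k=2)
print(labels)
```

On this toy data the younger, low-spend customers end up in one cluster and the older, high-spend customers in the other. The resulting labels could then serve as an input variable or a target in later analysis.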
Business Problem
Have you ever wanted to predict the likelihood of an event (and assign a cost to it)?
Decision Tree Node
Broad applications
  Classify observations, predict outcomes based on decision alternatives
Decision Tree Example
A flow-chart-like tree structure
  Internal node denotes a test on an attribute
  Branch represents an outcome of the test
  Leaf nodes represent class labels or class distribution
Handles missing data well
Represents the knowledge in the form of IF-THEN rules
Decision tree generation consists of two phases
  Tree construction
    At start, all the training examples are at the root
    Partition examples recursively based on selected attributes
  Tree pruning
    Identify and remove branches that reflect noise or outliers
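The construction phase and the IF-THEN representation described above can be sketched in plain Python. This is a simplified illustration, not the SAS EM Decision Tree node: it assumes categorical attributes and string class labels, uses Gini impurity as the split criterion (one common choice), and leaves out pruning and missing-value handling. The campaign data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """Recursively partition the examples.

    Returns a class label (leaf) or a tuple (attribute, {value: subtree}).
    """
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class

    def weighted_gini(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    best = min(attributes, key=weighted_gini)  # attribute giving the purest split
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        branches[value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return (best, branches)


def to_rules(tree, conditions=()):
    """Flatten the tree into IF-THEN rules, one per leaf."""
    if not isinstance(tree, tuple):
        clause = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {clause} THEN {tree}"]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(to_rules(subtree, conditions + ((attr, value),)))
    return rules


# Hypothetical campaign data: did the customer respond?
rows = [
    {"member": "yes", "region": "urban"},
    {"member": "yes", "region": "rural"},
    {"member": "no", "region": "urban"},
    {"member": "no", "region": "rural"},
]
labels = ["respond", "respond", "respond", "ignore"]
tree = build_tree(rows, labels, ["member", "region"])
for rule in to_rules(tree):
    print(rule)
```

All examples start at the root, each split sends them down one branch, and every leaf becomes one rule. A pruning step would then collapse branches that only fit noise in the training data.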
Business Problem
Have you ever wanted to ensure that a campaign targets the people most likely to purchase, even though you have never contacted them before?
Scoring Node
Broad applications
  Testing model scalability, applying learning to subsequent events, etc.
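Scoring means applying an already trained model to new records. The sketch below illustrates the idea in plain Python and is not the SAS EM Scoring node itself. The "model" is a hypothetical lookup table of response rates per segment, and the prospect records, the fallback rate and the `score` function are all invented for this example.

```python
def score(customer, segment_rates, default_rate):
    """Look up the predicted response probability for one customer."""
    key = (customer["member"], customer["region"])
    return segment_rates.get(key, default_rate)


# Hypothetical response rates learned from a past campaign (the "model").
segment_rates = {
    ("yes", "urban"): 0.30,
    ("yes", "rural"): 0.22,
    ("no", "urban"): 0.12,
    ("no", "rural"): 0.03,
}
overall_rate = 0.15  # hypothetical fallback for segments the model never saw

# New prospects who have never been contacted before.
prospects = [
    {"id": 101, "member": "no", "region": "rural"},
    {"id": 102, "member": "yes", "region": "urban"},
    {"id": 103, "member": "no", "region": "urban"},
    {"id": 104, "member": "yes", "region": "suburban"},
]

# Score every prospect, then keep the two with the highest predicted response.
scored = sorted(
    ((score(p, segment_rates, overall_rate), p["id"]) for p in prospects),
    reverse=True,
)
top_two = [pid for _, pid in scored[:2]]
print(top_two)  # prints: [102, 104]
```

Prospect 104 falls in a segment the model has not seen, so it receives the fallback rate. How to treat such cases is a design decision to settle before a scored list is used for targeting.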
Lessons learned
Data cleansing and transformation take most of the time
Data analysis done using EM gives interpretable results
Data modeling techniques are very robust
SAS EM works well with huge datasets
Knowledge obtained is transferred easily
Learning never stops: EM reference, tutorial examples
You can analyze almost any kind of data
You can use SAS EM regardless of the industry and size of dataset
You need: a good computer, SAS support, and patience
While not all students use SAS in their careers, the analytical principles they learn are extremely useful for their careers