Exam 3 Review: Decision Trees, Cluster Analysis, Association Rules, Data Visualization, SAS



• When to Use Which Analysis: Decision Trees (D), Cluster Analysis (C), or Association Rules (A)?
– When someone gets an A in this class, what other classes do they get an A in?
– What predicts whether a company will go bankrupt?
– If someone upgrades to an iPhone, do they also buy a new case?
– Which party will win the election?
– Can we group our website visitors into types based on their online behaviors?
– Which customers will purchase our product?
– Can we identify different product markets based on customer demographics?

Decision Trees

• Which is the Root Node?

• # Leaf Nodes?

• Probability of Purchase?
i) Female, 130 lbs, 12 ft?
ii) 120 lbs, 5 feet, male?

• Best predictor variable?

[Figure: decision tree. Split variables and branch labels: Height (<6’, >=6’); Weight (<150, >=150); Weight (<170, >=170); Gender (Male, Female). Outcome data for the six nodes, in the order they appear on the slide:]

Outcome 0   Outcome 1   n
62%         38%         350
55%         45%         250
40%         60%         150
60%         40%         250
45%         55%         75
35%         65%         75

• Probability of Purchase?
i) 5 ft 5 inches?
ii) 6 ft 5 inches, 190 lbs?

[Figure: the same decision tree and outcome data as on the previous slide.]

Decision Trees

• What does it mean that Gender is only on the right side of the tree? Why is it not on both sides?

• Based on the tree, which demographic is MOST likely to buy the product? Least likely to buy the product?

Decision Trees

• What Statistics are Used to Determine Splits for Decision Trees?
– Gini Coefficient, Chi-Square Statistics (p-value)

• What does it mean when the Gini = 1?

• What does it mean when the Chi-square is bigger?

• What happens to the p-value as the Chi-square gets bigger?
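To make the split statistics above concrete, here is a minimal Python sketch (not SAS output) that computes a Gini value and a chi-square statistic with its p-value for a binary split. It assumes the common Gini impurity definition, 1 minus the sum of squared class proportions, which may be scaled differently from the Gini reported in class. The function names and all counts are hypothetical and are not taken from the tree on the slides.

```python
import math

def gini_impurity(n0, n1):
    """Gini impurity 1 - sum(p_i^2) for a two-class node (assumed definition)."""
    n = n0 + n1
    p0, p1 = n0 / n, n1 / n
    return 1.0 - (p0 ** 2 + p1 ** 2)

def chi_square_2x2(left, right):
    """Pearson chi-square for a binary split; each child is (n0, n1)."""
    rows = [left, right]
    total = sum(sum(r) for r in rows)
    col_totals = [left[0] + right[0], left[1] + right[1]]
    chi2 = 0.0
    for r in rows:
        row_total = sum(r)
        for j in range(2):
            expected = row_total * col_totals[j] / total
            chi2 += (r[j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical (non-buyers, buyers) counts for the two children of a split.
weak_split = ((40, 35), (35, 40))    # children look almost alike
strong_split = ((60, 15), (15, 60))  # children look very different

print("Gini, 50/50 node:", gini_impurity(50, 50))  # 0.5
print("Gini, pure node: ", gini_impurity(75, 0))   # 0.0

for name, (left, right) in [("weak", weak_split), ("strong", strong_split)]:
    chi2, p = chi_square_2x2(left, right)
    print(name, "chi-square:", round(chi2, 2), "p-value:", p)
```

Running it on the two hypothetical splits shows the pattern the questions ask about: the split whose children differ more has the larger chi-square and the smaller p-value.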

Clustering

• What statistics do we care about in cluster analysis? What do they represent?

• What happens to these statistics as the number of clusters is increased?

• Why do we standardize data? Why do we eliminate outliers?
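The clustering questions above can be explored with a small Python sketch. It assumes the statistics of interest are the within-cluster sum of squared errors (cohesion) and the between-cluster sum of squared errors (separation), which the slides do not name explicitly. The ages, incomes, and helper names are hypothetical.

```python
import statistics

def standardize(values):
    """Convert a column to z-scores (mean 0, population standard deviation 1)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(statistics.fmean(dim) for dim in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_sse(clusters):
    """Sum of squared distances from each point to its own cluster centroid."""
    return sum(sq_dist(p, centroid(c)) for c in clusters for p in c)

def between_sse(clusters):
    """Size-weighted squared distances from each centroid to the overall centroid."""
    overall = centroid([p for c in clusters for p in c])
    return sum(len(c) * sq_dist(centroid(c), overall) for c in clusters)

# Hypothetical customers: age in years, income in dollars.
ages = [25, 30, 35, 60, 65, 70]
incomes = [30000, 32000, 31000, 90000, 95000, 92000]

# Without standardizing, income (tens of thousands) would swamp age (tens).
z_age = standardize(ages)
z_income = standardize(incomes)
points = list(zip(z_age, z_income))

one_cluster = [points]
two_clusters = [points[:3], points[3:]]

for name, clusters in [("1 cluster", one_cluster), ("2 clusters", two_clusters)]:
    print(name, "within:", round(within_sse(clusters), 3),
          "between:", round(between_sse(clusters), 3))
```

On this toy data, moving from one cluster to two lowers the within-cluster SSE and raises the between-cluster SSE, while their sum stays fixed.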

Clustering

• What are the pros and cons of having only a few clusters (compared to having many clusters)?

• What is bad about the below cluster analysis result? How would you improve it?

Association Rules

• How would you describe the following association rule?
– {Meat, Dairy} → {Vegetables}

• How many items are in this item set?

• What is (are) the antecedent(s)? What is (are) the consequent(s)?

• What are the statistics we care about when evaluating an association rule?

Association Rules

• Do the following two rules have to have the same Confidence? The same Support? The same Lift?
– {Meat, Dairy} → {Vegetables}
– {Vegetables} → {Meat, Dairy}

• What does Lift > 1 mean? Would you take action on such a rule?
– What about Lift < 1?
– What about Lift = 1?
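As a way to check answers to the rule questions above, here is a minimal Python sketch that computes support, confidence, and lift for a rule and for its reverse. It uses the standard definitions: support is the share of baskets containing every item in the rule, confidence is that support divided by the antecedent's support, and lift is confidence divided by the consequent's support. The eight baskets and the function names are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def rule_stats(baskets, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    both = support(baskets, set(antecedent) | set(consequent))
    confidence = both / support(baskets, antecedent)
    lift = confidence / support(baskets, consequent)
    return {"support": both, "confidence": confidence, "lift": lift}

# Hypothetical market baskets, not data from the course.
baskets = [
    {"Meat", "Dairy", "Vegetables"},
    {"Meat", "Dairy", "Vegetables"},
    {"Meat", "Dairy"},
    {"Vegetables"},
    {"Vegetables", "Dairy"},
    {"Meat"},
    {"Vegetables", "Bread"},
    {"Bread"},
]

forward = rule_stats(baskets, {"Meat", "Dairy"}, {"Vegetables"})
reverse = rule_stats(baskets, {"Vegetables"}, {"Meat", "Dairy"})

print("{Meat, Dairy} -> {Vegetables}:", forward)
print("{Vegetables} -> {Meat, Dairy}:", reverse)
```

On these baskets the two rules share the same support and lift, but their confidence differs because each rule divides by a different antecedent support.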

Association Rules

• What might you do as a manager if you saw a very high Lift and Confidence for the following rule about product purchase? Why would you do this?
– {Pasta} → {Orange Juice}

Association Rules

• What is the most reliable association rule below?

Data Visualization

• Look at In-Class Exercise Answers...