Zhangxi Lin, ISQS 7342-001, Texas Tech University. Note: Most slides in this file are sourced from SAS® Course Notes.

Lecture Notes 8: Continuous and Multiple Target Prediction

Structure of the Chapter

Section 2.1 raises the problem: the standard decision tree methods did not produce good results

Section 2.2 analyzes the problem

Section 2.3 develops basic two-stage models to improve the results

Section 2.4 further improves the two-stage models

Section 2.1

Introduction

Motivation

The results of the 1998 KDD-Cup produced a surprise. Almost half of the entries yielded a total profit on the validation data that was less than that obtained by soliciting everyone.

Part of the problem lies in the method used to select cases for solicitation. This chapter extends the notion of profit introduced in Chapter 1 to allow for better selection of cases for solicitation.

1998 KDD-Cup Results

Rank   Total Profit   Overall Avg. Profit
1      $14,712        $0.153
2      $14,662        $0.152
3      $13,954        $0.145
4      $13,825        $0.143
5      $13,794        $0.143
6      $13,598        $0.141
7      $13,040        $0.135
8      $12,298        $0.128
9      $11,423        $0.119
10     $11,276        $0.117
11     $10,720        $0.111
12     $10,706        $0.111
13     $10,112        $0.105
14     $10,049        $0.104
15     $9,741         $0.101
16     $9,464         $0.098
17     $5,683         $0.059
18     $5,484         $0.057
19     $1,925         $0.020
20     $1,706         $0.018

"Solicit everyone": total profit $10,560, average profit $0.110

Section 2.2

Generalized Profit Matrices

Random Profit Consequences

[Figure: matrix of random profits with columns for the primary decision and the secondary decision; shading marks negative profit.]

Outcome Conditioned Random Profits

In a more general context, the profit associated with a decision for an individual case can be thought of as a random variable. The goal of predictive modeling is to estimate the distribution of this profit random variable conditioned on case input measurements.

Because the decisions are usually associated with discrete outcomes, the random profits are conditioned on each of these outcomes. For a binary outcome and two decisions, the random profits form the elements of a 2x2 random matrix.

[Figure: 2x2 matrix of random profits with rows for the primary outcome and the secondary outcome; shading marks negative profit.]

Expected Profit Matrix

[Figure: each random profit in the matrix is replaced by its expected value, E(Profit); rows are the primary outcome and the secondary outcome; shading marks negative profit.]

Expected/Reduced Profit Matrix

Because it is easier to work with concrete numbers than random variables, statistical summaries of the random profit matrices are used to quantify the consequence of a decision.

One way to do this is to calculate the expected value of the profit random variable for each outcome and decision combination. Arrayed as a matrix, this is called the expected profit-consequence matrix, or the expected profit matrix for a case.

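Often, generalized profit matrices have zeros in the secondary decision column. Without loss of generality (assuming the profit-consequence is measured by expected value), it is always possible to write the generalized profit matrix with a column of zero profits: subtract the secondary decision column from both columns, so that the primary decision column holds the difference between the two decisions.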
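The reduction can be checked with a small numeric sketch. The profit values and the outcome probability below are hypothetical (they are not from the course notes), and the function names are illustrative only. The point is that subtracting the secondary decision column leaves the gap between the two decisions unchanged.

```python
def reduce_profit_matrix(expected):
    """Subtract each row's secondary-decision profit from both entries."""
    return {
        outcome: (primary - secondary, 0.0)
        for outcome, (primary, secondary) in expected.items()
    }


def decision_profits(matrix, p_primary):
    """Expected profit of each decision, given P(primary outcome)."""
    weights = {"primary": p_primary, "secondary": 1.0 - p_primary}
    primary_decision = sum(weights[o] * matrix[o][0] for o in matrix)
    secondary_decision = sum(weights[o] * matrix[o][1] for o in matrix)
    return primary_decision, secondary_decision


# Hypothetical expected profits: outcome -> (primary decision, secondary decision).
expected = {"primary": (14.5, -1.0), "secondary": (-0.5, -1.0)}
reduced = reduce_profit_matrix(expected)

before = decision_profits(expected, p_primary=0.05)
after = decision_profits(reduced, p_primary=0.05)

# The gap between the two decisions is the same before and after the reduction.
print(before[0] - before[1], after[0] - after[1])
```

Because only the difference between decisions drives the choice, the reduced matrix leads to the same decision for every case.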

Reduced Profit Matrix

[Figure: the expected profit matrix reduced by taking, for each outcome (primary, secondary), the difference between the expected profit of the primary decision and that of the secondary decision; shading marks negative profit.]

Expected Profit-Consequence

For the primary decision, the expected profit-consequence (EPC) of a case combines the EPF entries for the primary and secondary outcomes, each weighted by its outcome probability p:

$$\mathrm{EPC} = \mathrm{EPF}_{\text{primary}} \cdot p_{\text{primary}} + \mathrm{EPF}_{\text{secondary}} \cdot p_{\text{secondary}}$$

[Figure: the primary decision column with rows for the primary outcome and the secondary outcome; shading marks negative profit.]
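A minimal sketch of the EPC calculation for one case follows. It assumes a binary outcome, so the secondary outcome probability is one minus the primary outcome probability; the profit values are hypothetical and the function name is illustrative.

```python
def expected_profit_consequence(epf_primary, epf_secondary, p_primary):
    """EPC of the primary decision for one case with a binary outcome."""
    p_secondary = 1.0 - p_primary
    return epf_primary * p_primary + epf_secondary * p_secondary


# Hypothetical case: a 5% chance of the primary outcome, worth 15.00,
# and a loss of 0.68 otherwise.
epc = expected_profit_consequence(15.00, -0.68, p_primary=0.05)
print(round(epc, 3))
```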

Sort Expected Profit-Consequence

Sort cases by decreasing EPC.

[Figure: list of cases ordered so that each EPC is greater than or equal to the next.]

Total Expected Profit

Sum EPCs in excess of threshold.

[Figure: cases sorted by decreasing EPC, with the EPCs above the threshold summed.]

Observed Profit

[Figure: cases sorted by decreasing EPC. For each case, the observed profit (OP) of the primary decision is read from the row of the case's actual outcome (primary or secondary); shading marks negative profit.]

Record observed profits.

Observed Total Profit

Sum OPs for cases with EPCs in excess of threshold.

Generalized Profit Assessment Data

Sum OPs for cases with EPCs in excess of threshold.

[Figure: assessment data listing each case's EPC and OP, sorted by decreasing EPC.]
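The assessment steps on the preceding slides (sort by EPC, select cases above the threshold, sum EPCs and OPs) can be sketched as follows. This is an illustration under stated assumptions, not the Generalized Profit Assessment tool itself: the case values are hypothetical, the threshold defaults to zero, and "in excess of" is read as strictly greater than.

```python
def assess(cases, threshold=0.0):
    """Rank cases by EPC and total the profits of those above the threshold."""
    ranked = sorted(cases, key=lambda c: c["epc"], reverse=True)
    selected = [c for c in ranked if c["epc"] > threshold]
    total_expected = sum(c["epc"] for c in selected)  # total expected profit
    total_observed = sum(c["op"] for c in selected)   # observed total profit
    return ranked, total_expected, total_observed


# Hypothetical cases: model-based EPC and the profit actually observed (OP).
cases = [
    {"epc": 0.9, "op": -0.68},
    {"epc": -0.2, "op": -0.68},
    {"epc": 0.4, "op": 12.32},
    {"epc": 0.1, "op": -0.68},
]

ranked, total_expected, total_observed = assess(cases)
print([c["epc"] for c in ranked])
print(round(total_expected, 2), round(total_observed, 2))
```

Comparing the two totals shows how far the model's expected profit is from what the selected cases actually returned.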

Total Profit Plot

[Figure: total profit plot built from the cases sorted by decreasing EPC and their OPs.]

Observed and Expected Profit Plot

[Figure: observed and expected profit plot, comparing the OP and EPC values of the cases sorted by decreasing EPC.]

Profit Confusion Matrix

                    Primary Decision                 Secondary Decision                 Total
Primary Outcome     true positive profit             false negative profit              total primary profit
Secondary Outcome   false positive profit            true negative profit               total secondary profit
Total               total primary decision profit    total secondary decision profit

True Positive Profit Fraction

$$\text{true positive profit fraction} = \frac{\text{true positive profit}}{\text{total primary profit}}$$

False Positive Profit Fraction

$$\text{false positive profit fraction} = \frac{\text{false positive profit}}{\text{total secondary profit}}$$
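The profit confusion matrix and the two fractions can be sketched as below. The sketch assumes that each cell sums the observed profit (OP) of the cases falling into it, and that a case receives the primary decision when its EPC exceeds the threshold; both are assumptions made for illustration, as are the case values.

```python
def profit_confusion_matrix(cases, threshold=0.0):
    """Sum OP into cells keyed by (actual outcome, decision)."""
    cells = {
        ("primary", "primary"): 0.0,      # true positive profit
        ("primary", "secondary"): 0.0,    # false negative profit
        ("secondary", "primary"): 0.0,    # false positive profit
        ("secondary", "secondary"): 0.0,  # true negative profit
    }
    for case in cases:
        decision = "primary" if case["epc"] > threshold else "secondary"
        cells[(case["outcome"], decision)] += case["op"]
    return cells


def profit_fraction(cells, outcome):
    """Primary-decision profit for an outcome over that outcome's row total."""
    row_total = cells[(outcome, "primary")] + cells[(outcome, "secondary")]
    return cells[(outcome, "primary")] / row_total


# Hypothetical cases.
cases = [
    {"epc": 0.9, "outcome": "primary", "op": 12.0},
    {"epc": 0.4, "outcome": "secondary", "op": -1.0},
    {"epc": -0.2, "outcome": "primary", "op": 4.0},
    {"epc": -0.5, "outcome": "secondary", "op": -1.0},
    {"epc": 0.2, "outcome": "secondary", "op": -1.0},
]

cells = profit_confusion_matrix(cases)
tp_fraction = profit_fraction(cells, "primary")    # true positive profit fraction
fp_fraction = profit_fraction(cells, "secondary")  # false positive profit fraction
print(tp_fraction, round(fp_fraction, 4))
```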

Section 2.3

Basic Two-Stage Models

Defining Two-Stage Model Components

Components: E(B|X) and E(D|X)

Ways to obtain them:
- Specified values (15.30 on the slide)
- Separate predictive models
- Joint predictive models, E(B,D|X)

Two-Stage Modeling Methods

A better estimate of the primary decision profit can be obtained by modeling both outcome probability and expected profit, using two-stage modeling methods.

There are several ways to estimate the components used in two-stage models:
- The first is to simply specify values for certain components. This is simple to do, but it often produces poor results.
- In a more sophisticated approach, you can use the value in an input or a look-up table as a surrogate for expected donation amount.
- The most common approach is to estimate values for components with individual models.
- At the extreme end of the sophistication scale, you can use a single model to predict both components simultaneously, for example, with the NEURAL procedure in SAS Enterprise Miner.

Basic Two-Stage Models

A two-stage model combines two models:
- One to estimate the donation propensity;
- Another one to estimate the donation amount.
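A minimal sketch of how the two component predictions combine is shown below. The two stage models are hand-written placeholders with made-up coefficients and hypothetical input names (`recent_gifts`, `last_gift`); in practice each would be a fitted model. The only part taken from the slides is the idea of multiplying the propensity by the amount.

```python
import math


def donation_propensity(x):
    """Stage 1 (placeholder): estimated probability of donating."""
    score = -3.0 + 0.5 * x["recent_gifts"]
    return 1.0 / (1.0 + math.exp(-score))


def donation_amount(x):
    """Stage 2 (placeholder): estimated amount, given that a donation occurs."""
    return 10.0 + 0.2 * x["last_gift"]


def expected_donation(x):
    """Two-stage prediction: propensity times conditional amount."""
    return donation_propensity(x) * donation_amount(x)


case = {"recent_gifts": 2, "last_gift": 25.0}
print(round(donation_propensity(case), 4), donation_amount(case))
print(round(expected_donation(case), 4))
```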

Two-Stage Model Tool

The Two Stage Model tool builds two models, one to predict TARGET_B and one to predict TARGET_D. Theoretically, you can use this node to combine predictions for the two target variables and get a prediction of expected donation amount.

The tool has two minor limitations:
- It does not recognize the prior vector. Thus, because responders are overrepresented in the training data, the probabilities in the TARGET_B model are biased.
- The node has no built-in diagnostic to assess overall average profit. Profit information passed to the Assessment node is incorrect.

Both of these limitations are easily overcome by the Generalized Profit Assessment tool.
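The bias from overrepresented responders can be illustrated with the standard prior-correction formula for oversampled data. This is a sketch only: the notes do not say how the Generalized Profit Assessment tool adjusts probabilities internally, and the sample and population response rates below are hypothetical.

```python
def adjust_for_priors(p_sample, sample_rate, population_rate):
    """Rescale a probability fitted on oversampled data to the population rate."""
    w_responder = population_rate / sample_rate
    w_nonresponder = (1.0 - population_rate) / (1.0 - sample_rate)
    numerator = p_sample * w_responder
    return numerator / (numerator + (1.0 - p_sample) * w_nonresponder)


# Hypothetical rates: responders are 50% of the training data but 5% of the population.
adjusted = adjust_for_priors(0.5, sample_rate=0.5, population_rate=0.05)
print(round(adjusted, 4))
```

A case that looks like an even bet in the oversampled training data corresponds to a much lower response probability in the population, which is why uncorrected probabilities overstate expected profit.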

The Model We Are Using

Basic model

Different from the book

Target Variables

Some Two-Stage Model Options

Model fitting approach: sequential or concurrent
- Sequential: couples the models by making the binary outcome model's prediction an input for the expected profit model
- Concurrent: fits a neural network model with two targets

FILTER: removes cases from the training data when building the value model

MULTIPLY: multiplies the class and value model predictions

Results of the Two-Stage Node

Results of the GPA Node

Oddities in the assessment report.

1. The reported overall average profits from training data are extremely low.

2. The depth supposedly corresponding to optimum profit threshold is reported to be 100% (select all cases).

3. The total profit reported in the validation data is almost 40% higher than in the training data.

Stratification with BIN_TARGET_D

Improved Results of the GPA Node

The third problem has been solved.

But the performance of the model is still lower than that from “no model”.

Correct bias in GPA by setting the following parameter in the code:

%let EM_PROPERTY_adjustprobs = Y;

The model is no longer selecting all the data (it is now around 60%), but the overall average profit values remain low.

The average profit is 0.1105, slightly more than the 0.110 obtained without using a model.

Results from an Improved Two-Stage Model

Parameters:
- Class Model: Regression
- Selection Model: Stepwise
- Selection Criteria: Validation Error

The Average Profit: 0.155

This result is good enough to win the KDD Cup!

Summary – Improving the Performance

Section 2.3
- Use two-stage models
- Stratification using the binned value target
- Correct bias in GPA: %let EM_PROPERTY_adjustprobs = Y;

Section 2.4
- Use regression settings in the Two Stage node
- Reduce MSE: interval target value transformation
- Construct the component models separately from the Two Stage node
- Use regression trees in a two-stage model (%let EM_PROPERTY_adjustprobs = N;)
- Use neural networks in a two-stage model (%let EM_PROPERTY_adjustprobs = N;)

Section 2.4

Constructing Component Models

Two-Stage Modeling Challenges

Model Assessment

Interval Model Specification: E(D) = g(x;w)

Constructing a two-stage model (or, more generally, any multiple-component model) requires attention to several challenges not previously encountered.

Earlier modeling assessment efforts evaluated models based on profitability measures, assuming a fixed profit structure. Because the profit structure itself is being modeled in a two-stage model, you need a different mechanism to assess model performance.

Correct specification requires appropriately chosen inputs, link functions, and target error distribution.

By incorporating the predictions of the binary model into the interval model, it may be possible to make a more parsimonious specification of the interval model.

Estimating Mean Squared Error

Training Data

$$\text{Estimated MSE} = \frac{1}{N}\sum_{i=1}^{N} (D_i - \hat{D}_i)^2$$

This estimates $E[(D - \hat{D})^2]$.

MSE Decomposition: Variance

Training Data

$$\text{Estimated MSE} = \frac{1}{N}\sum_{i=1}^{N} (D_i - \hat{D}_i)^2$$

$$E[(D - \hat{D})^2] = \underbrace{E[(D - ED)^2]}_{\text{Variance}} + [E(\hat{D} - ED)]^2$$

In theory, the MSE can be decomposed into two components, each involving a deviation from the true expected value of the target variable.

MSE Decomposition: Squared Bias

Training Data

$$\underbrace{E[(D - \hat{D})^2]}_{\text{MSE}} = \underbrace{E[(D - ED)^2]}_{\text{Variance}} + \underbrace{[E(\hat{D} - ED)]^2}_{\text{Bias}^2}$$

Variance: independent of any fitted model.

Bias²: the squared difference between the predicted and actual expected value.
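The decomposition can be verified numerically for a single case. The sketch below assumes a fixed prediction and a small discrete distribution for the target D; the values and probabilities are hypothetical. It computes the three quantities exactly from the distribution rather than from sampled data.

```python
def mse_decomposition(values, probs, prediction):
    """Return (MSE, variance, squared bias) for a fixed prediction of D."""
    mean = sum(p * v for v, p in zip(values, probs))
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    bias_squared = (prediction - mean) ** 2
    mse = sum(p * (v - prediction) ** 2 for v, p in zip(values, probs))
    return mse, variance, bias_squared


# Hypothetical donation amounts with their probabilities; E(D) = 7.
values = [0.0, 10.0, 20.0]
probs = [0.5, 0.3, 0.2]

mse, variance, bias_squared = mse_decomposition(values, probs, prediction=9.0)
print(round(mse, 6), round(variance, 6), round(bias_squared, 6))
```

Only the squared-bias term changes when the prediction changes, so minimizing MSE is the same as minimizing bias for this case.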

Honest MSE Estimation

Validation Data

$$\text{Estimated MSE} = \frac{1}{N}\sum_{i=1}^{N} (D_i - \hat{D}_i)^2$$

$$\underbrace{E[(D - \hat{D})^2]}_{\text{MSE}} = \underbrace{E[(D - ED)^2]}_{\text{Variance}} + [E(\hat{D} - ED)]^2$$

Unbiased estimates can be obtained by correctly accounting for model degrees of freedom in the MSE estimate or simply by estimating MSE from an independent validation data set.

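MSE and Binary Target Models

Validation Data

$$\text{Estimated MSE} = \frac{1}{N}\sum_{i=1}^{N} (B_i - \hat{B}_i)^2$$

$$\underbrace{E[(B - \hat{B})^2]}_{\text{MSE (inaccuracy)}} = \underbrace{E[(B - EB)^2]}_{\text{Variance (inseparability)}} + \underbrace{[E(\hat{B} - EB)]^2}_{\text{Bias}^2\text{ (imprecision)}}$$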

The Binary Target

The estimated MSE of the binary target can be thought of as measuring the overall inaccuracy of model prediction.

This inaccuracy estimate can be decomposed into a term related to the inseparability of the two-target levels (corresponding to the variance component) plus a term related to the imprecision of the model estimate (corresponding to the bias-squared component).

In this way, the model with the smallest estimated MSE will also be the least imprecise.

Model Assessment

Use Validation MSE

To assess both the binary and the interval component models, it is reasonable to compare their validation data mean squared error. Models with the smallest MSE will have the smallest bias or imprecision.

Model Assessment

Use Validation MSE

A standard regression model may be ill suited for accurately modeling the relationship between the inputs and TARGET_D.

Matching the structure of the model to the specific modeling requirements is vital to obtaining good predictions.

Interval Model Requirements

- Correct Error Distribution
- Good Inputs: x1, x3, x10
- Positive Predictions: E(D) > 0
- Adequate Flexibility

Making Positive Predictions

- Transform target: model E(log(Y) | X).
- Define appropriate link: model log(E(Y | X)).
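The two options target different quantities, which a small example makes concrete. The sketch below uses three hypothetical amounts and no inputs, so each "model" is just a mean. Exponentiating the mean of log(Y) gives the geometric mean, which is positive but falls below the arithmetic mean E(Y) by Jensen's inequality.

```python
import math


def mean(values):
    return sum(values) / len(values)


# Hypothetical positive amounts with a skewed distribution.
y = [1.0, 10.0, 100.0]

mean_of_logs = mean([math.log(v) for v in y])  # what a transformed target estimates
back_transformed = math.exp(mean_of_logs)      # geometric mean, always positive
arithmetic_mean = mean(y)                      # E(Y), what a log link targets
log_of_mean = math.log(arithmetic_mean)

print(round(back_transformed, 6), arithmetic_mean)
print(mean_of_logs < log_of_mean)
```

Both routes guarantee positive predictions, but a prediction back-transformed from a log target is not directly an estimate of the expected amount.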

Hints:

The interval component of a two-stage model is often used to predict a monetary response. Random variables that represent monetary amounts usually assume a skewed distribution with positive range and a variance related to expected value. When the target variable represents a monetary amount, this limited range and skewness must be considered in the model specification.

Proper specification of the target range and error distribution increases the chances of selecting good inputs for the interval target model. With good inputs, the correct degree of flexibility can be incorporated into the model and predictions can be optimized.

Error Distribution Requirements

Possess correct skewness.

Have conforming support.

Account for heteroscedasticity.

Specifying the Correct Error Distribution

Distribution          Variance
Normal (truncated)    constant*
Poisson               E(Y)
Gamma                 (E(Y))^2
Lognormal             (E(Y))^2

The normal distribution has a range from negative to positive infinity, whereas the target variable may have a more restricted range.

One disadvantage of the Poisson distribution relates to its skewness properties. Poisson error distributions are limited to the Neural Network node.

The gamma distribution is limited to the Neural Network node. The lognormal distribution can be used with any modeling tool.

A few extreme outliers may indicate a lognormal distribution, whereas the absence of such outliers may imply a gamma or less extreme distribution.

Model Assessment

log(Target) / Specify Link and Error

Use Validation MSE

Interval Target Model

The Parameters and Results

Compare the Distributions of Residuals

Use Log-transformed Target_D Using original Target_D

Using Regression Trees

Using Neural Network Models
