
1

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

2

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

3

Model Essentials – Regressions

Predict new cases.

Select useful inputs.

Optimize complexity.

...

4

Model Essentials – Regressions

Predict new cases: prediction formula.

Select useful inputs: sequential selection.

Optimize complexity: best model from sequence.

6

Linear Regression Prediction Formula

$$\hat{y} = \hat{w}_0 + \hat{w}_1 \cdot x_1 + \hat{w}_2 \cdot x_2$$

where $\hat{y}$ is the prediction estimate, $\hat{w}_0$ is the intercept estimate, $\hat{w}_1$ and $\hat{w}_2$ are parameter estimates, and $x_1$, $x_2$ are input measurements.

Choose the intercept and parameter estimates to minimize the squared error function over the training data:

$$\sum_{\text{training data}} ( y_i - \hat{y}_i )^2$$
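To see this criterion in action, here is a minimal sketch, assuming nothing beyond numpy and some hypothetical synthetic data: it fits the intercept and parameter estimates by ordinary least squares.

```python
# A minimal sketch: fit intercept and parameter estimates by
# minimizing the squared error function with numpy.
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.random(100), rng.random(100)
y = 0.5 + 1.2 * x1 - 0.7 * x2 + rng.normal(0, 0.1, 100)  # hypothetical target

X = np.column_stack([np.ones_like(x1), x1, x2])  # intercept column, then inputs
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # minimizes sum of (y_i - yhat_i)^2
y_hat = X @ w_hat                                # prediction estimates
```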


8

Logistic Regression Prediction Formula

$$\log\!\left( \frac{\hat{p}}{1 - \hat{p}} \right) = \hat{w}_0 + \hat{w}_1 \cdot x_1 + \hat{w}_2 \cdot x_2$$

The linear combination on the right produces logit scores.

9

Logit Link Function

$$\log\!\left( \frac{\hat{p}}{1 - \hat{p}} \right) = \hat{w}_0 + \hat{w}_1 \cdot x_1 + \hat{w}_2 \cdot x_2$$

[Plot: the logit link function, mapping probabilities on (0, 1) to logit scores on roughly −5 to 5]

The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞).


11

Logit Link Function

$$\mathrm{logit}(\hat{p}) = \log\!\left( \frac{\hat{p}}{1 - \hat{p}} \right) = \hat{w}_0 + \hat{w}_1 \cdot x_1 + \hat{w}_2 \cdot x_2$$

To obtain prediction estimates, the logit equation is solved for $\hat{p}$:

$$\hat{p} = \frac{1}{1 + e^{-\mathrm{logit}(\hat{p})}}$$
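As a concrete illustration, the same conversion in Python; the parameter values are hypothetical placeholders, not estimates from the course data.

```python
# A minimal sketch: convert a logit score to a prediction estimate.
# The parameter estimates w0, w1, w2 are hypothetical placeholders.
import math

def predict_probability(x1, x2, w0=-0.81, w1=0.92, w2=1.11):
    logit = w0 + w1 * x1 + w2 * x2          # logit score
    return 1.0 / (1.0 + math.exp(-logit))   # logit equation solved for p

print(predict_probability(0.3, 0.7))
```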


14

Simple Prediction Illustration – Regressions

Predict dot color for each x1 and x2.

You need intercept and parameter estimates.

[Scatter plot: training cases on x1 and x2, each axis from 0.0 to 1.0, with contour labels 0.40, 0.50, 0.60, 0.70]


16

Simple Prediction Illustration – Regressions

Find parameter estimates by maximizing the log-likelihood function.

[Scatter plot: training cases on x1 and x2, each axis from 0.0 to 1.0, with contour labels 0.40, 0.50, 0.60, 0.70]
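The criterion being maximized is the Bernoulli log-likelihood; a standard form, stated here for reference rather than reproduced from the slide:

$$\log L = \sum_{\text{training data}} \Big[\, y_i \log \hat{p}_i + (1 - y_i) \log\big(1 - \hat{p}_i\big) \,\Big]$$

where $\hat{p}_i$ is the predicted probability for case $i$.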


18

Simple Prediction Illustration – Regressions

Using the maximum likelihood estimates, the prediction formula assigns a logit score to each x1 and x2.

[Scatter plot: fitted logit scores over the x1, x2 grid, with probability contours 0.40, 0.50, 0.60, 0.70]


20

4.01 Multiple Choice Poll

What is the logistic regression prediction for the indicated point?

a. 0.243

b. 0.56

c. yellow

d. It depends.

[Scatter plot: x1 and x2 axes from 0.0 to 1.0, probability contours 0.40, 0.50, 0.60, 0.70, with one point indicated]

21

4.01 Multiple Choice Poll – Correct Answer

What is the logistic regression prediction for the indicated point?

a. 0.243

b. 0.56

c. yellow

d. It depends.


22

Regressions: Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...


24

Missing Values and Regression Modeling

[Training data: inputs and target]

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

Consequence: Missing values can significantly reduce your amount of training data for regression modeling!


27

Missing Values and the Prediction Formula

Predict: (x1, x2) = (0.3, ? )

Problem 2: Prediction formulas cannot score cases with missing values.

...


31

Missing Value Issues

Manage missing values.

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

...

Problem 2: Prediction formulas cannot score cases with missing values.


33

Missing Value Causes

Manage missing values.

Non-applicable measurement

No match on merge

Non-disclosed measurement

...

34

Missing Value Remedies

Manage missing values.

Synthetic value imputation: $x_i = f(x_1, \ldots, x_p)$

Non-applicable measurement

No match on merge

Non-disclosed measurement

...

35

Managing Missing Values

This demonstration illustrates how to impute synthetic data values and create missing value indicators.
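As a rough equivalent outside the tool, here is a minimal pandas sketch, assuming a DataFrame `df`; the helper name and indicator prefix are illustrative choices, not part of the course software.

```python
# A minimal sketch: impute interval inputs with the mean and categorical
# inputs with the mode, adding a unique replacement indicator for every
# input that had missing values.
import pandas as pd

def impute_with_indicators(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for col in df.columns:
        if not df[col].isna().any():
            continue
        out["M_" + col] = df[col].isna().astype(int)      # missing indicator
        if pd.api.types.is_numeric_dtype(df[col]):
            out[col] = df[col].fillna(df[col].mean())     # interval: mean
        else:
            out[col] = df[col].fillna(df[col].mode()[0])  # categorical: mode
    return out
```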

36

Running the Regression Node

This demonstration illustrates using the Regression tool.

37

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

38

Model Essentials – Regressions

Predict new cases: prediction formula.

Select useful inputs: sequential selection.

Optimize complexity: best model from sequence.

39

Sequential Selection – Forward

Input p-values are compared to the entry cutoff.

Forward selection starts with no inputs. At each step, the candidate input with the smallest p-value is added, until no remaining input has a p-value below the entry cutoff.

44

Sequential Selection – Backward

Input p-values are compared to the stay cutoff.

Backward selection starts with all inputs in the model. At each step, the input with the largest p-value is removed, until every remaining input has a p-value below the stay cutoff.

52

Sequential Selection – Stepwise

Input p-values are compared to the entry and stay cutoffs.

Stepwise selection combines the forward and backward approaches: inputs are added as in forward selection, but after each addition, any input in the model whose p-value rises above the stay cutoff is removed. A sketch of the selection loop follows below.
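To make the selection loop concrete, here is a hedged sketch of the forward step with statsmodels, assuming a pandas DataFrame `df` with a binary target and numeric candidate inputs; the function name and cutoff default are assumptions, and the Regression node uses SAS's own implementation rather than this code.

```python
# A minimal sketch of forward selection for a logistic regression.
import statsmodels.api as sm

def forward_select(df, target, candidates, entry_cutoff=0.05):
    selected, remaining = [], list(candidates)
    y = df[target]
    while remaining:
        # Wald p-value of each candidate when added to the current model
        pvals = {}
        for c in remaining:
            X = sm.add_constant(df[selected + [c]])
            pvals[c] = sm.Logit(y, X).fit(disp=0).pvalues[c]
        best = min(pvals, key=pvals.get)
        if pvals[best] > entry_cutoff:
            break          # nothing left clears the entry cutoff
        selected.append(best)
        remaining.remove(best)
    return selected
```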


60

4.02 Poll

The three sequential selection methods for building regression models can never lead to the same model for the same set of data.

True

False

61

4.02 Poll – Correct Answer

The three sequential selection methods for building regression models can never lead to the same model for the same set of data.

True

False (correct: the methods can select the same model, for example when the same subset of inputs clears both cutoffs at every step)

62

Selecting Inputs

This demonstration illustrates using stepwise selection to choose inputs for the model.

63

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

64

Model Essentials – Regressions

Predict new cases: prediction formula.

Select useful inputs: sequential selection.

Optimize complexity: best model from sequence.

65

Model Fit versus Complexity

[Plot: a model fit statistic versus model sequence steps 1-6, shown separately for training and validation data]

66

Select Model with Optimal Validation Fit

Evaluate each sequence step.

[Plot: the fit statistic by sequence step 1-6; the step with the best validation fit is selected]

67

Optimizing Complexity

This demonstration illustrates tuning a regression model to give optimal performance on the validation data.
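Outside the tool, the same idea is just an argmax over validation fit; a minimal sketch, under the assumption that `fits` maps each sequence step to a fitted model exposing a scikit-learn-style `score` method (all names here are hypothetical).

```python
# A minimal sketch: pick the sequence step whose model fits the
# validation data best. `fits`, X_valid, and y_valid are assumed inputs.
def best_step(fits, X_valid, y_valid):
    scores = {step: model.score(X_valid, y_valid) for step, model in fits.items()}
    return max(scores, key=scores.get)  # step with optimal validation fit
```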

68

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

69

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...


71

Logistic Regression Prediction Formula

$$\log\!\left( \frac{\hat{p}}{1 - \hat{p}} \right) = \hat{w}_0 + \hat{w}_1 \cdot x_1 + \hat{w}_2 \cdot x_2$$

The linear combination on the right produces logit scores.

72

Odds Ratios and Doubling Amounts

Odds ratio: the amount the odds change with a unit change in an input.

Doubling amount: how much an input must change to double the odds.

$\Delta x_i = 1$: odds change by a factor of $\exp(\hat{w}_i)$.

$\Delta x_i = 0.69 / \hat{w}_i$: odds change by a factor of 2.

$$\log\!\left( \frac{\hat{p}}{1 - \hat{p}} \right) = \hat{w}_0 + \hat{w}_1 \cdot x_1 + \hat{w}_2 \cdot x_2 \quad \text{(logit scores)}$$
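Both rules follow from the logit equation above; a short derivation, standard algebra stated for reference:

$$\frac{\mathrm{odds}(x_i + \Delta x_i)}{\mathrm{odds}(x_i)} = e^{\hat{w}_i \Delta x_i}, \qquad e^{\hat{w}_i \Delta x_i} = 2 \;\iff\; \Delta x_i = \frac{\ln 2}{\hat{w}_i} \approx \frac{0.69}{\hat{w}_i}$$

Setting $\Delta x_i = 1$ gives the odds ratio $e^{\hat{w}_i}$.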

73

Interpreting a Regression Model

This demonstration illustrates interpreting a regression model using odds ratios.

74

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

75

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...

76

Extreme Distributions and Regressions

[Plot, original input scale: a skewed input distribution with high leverage points; the standard regression fit deviates from the true association]

[Plot, regularized scale: a more symmetric distribution]

78

Regularizing Input Transformations

[Plots: on the original input scale, the skewed distribution and high leverage points pull the standard regression away from the true association; on the regularized scale, the regularized estimate tracks the true association closely]


82

4.03 Multiple Choice Poll

Which statement below is true about transformations of input variables in a regression analysis?

a. They are never a good idea.

b. They help model assumptions match the assumptions of maximum likelihood estimation.

c. They are performed to reduce the bias in model predictions.

d. They typically are done on nominal (categorical) inputs.

83

4.03 Multiple Choice Poll – Correct Answer

Which statement below is true about transformations of input variables in a regression analysis?

a. They are never a good idea.

b. They help model assumptions match the assumptions of maximum likelihood estimation.

c. They are performed to reduce the bias in model predictions.

d. They typically are done on nominal (categorical) inputs.

84

Transforming Inputs

This demonstration illustrates using the Transform Variables tool to apply standard transformations to a set of inputs.
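The Transform Variables tool applies these in the GUI; as a rough pandas equivalent for the log transformation mentioned in the chapter review, a minimal sketch in which `df` and the list `skewed` of right-skewed interval inputs are assumed inputs:

```python
# A minimal sketch: regularize right-skewed interval inputs with a
# log transformation.
import numpy as np
import pandas as pd

def log_transform(df: pd.DataFrame, skewed: list) -> pd.DataFrame:
    out = df.copy()
    for col in skewed:
        out["LOG_" + col] = np.log1p(out[col])  # log(1 + x) tolerates zeros
    return out
```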

85

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

86

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...


88

Nonnumeric Input Coding

Level   DA  DB  DC  DD  DE  DF  DG  DH  DI
A        1   0   0   0   0   0   0   0   0
B        0   1   0   0   0   0   0   0   0
C        0   0   1   0   0   0   0   0   0
D        0   0   0   1   0   0   0   0   0
E        0   0   0   0   1   0   0   0   0
F        0   0   0   0   0   1   0   0   0
G        0   0   0   0   0   0   1   0   0
H        0   0   0   0   0   0   0   1   0
I        0   0   0   0   0   0   0   0   1

Each level of the input gets its own 0/1 dummy variable.

89

Coding Redundancy

In the table above, DI is redundant: it equals 1 exactly when DA through DH are all 0, so it is completely determined by the other dummies and can be dropped. Level I then serves as the reference level.


91

Coding Consolidation

Levels that behave similarly can share a dummy variable. Consolidating A-D, E-F, and G-H:

Level   DABCD  DEF  DGH
A-D         1    0    0
E-F         0    1    0
G-H         0    0    1
I           0    0    0
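A rough equivalent in pandas; the level grouping mirrors the slide, while the variable names are illustrative assumptions.

```python
# A minimal sketch: dummy-code a nonnumeric input after consolidating
# levels A-D, E-F, and G-H, with level I as the reference level.
import pandas as pd

level = pd.Series(list("ABCDEFGHI"), name="level")
groups = {**dict.fromkeys("ABCD", "ABCD"),
          **dict.fromkeys("EF", "EF"),
          **dict.fromkeys("GH", "GH"),
          "I": "I"}
dummies = pd.get_dummies(level.map(groups), prefix="D").drop(columns="D_I")
print(dummies)
```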

92

Recoding Categorical Inputs

This demonstration illustrates using the Replacement tool to facilitate the process of combining input levels.

93

Chapter 4: Introduction to Predictive Modeling: Regressions

4.1 Introduction

4.2 Selecting Regression Inputs

4.3 Optimizing Regression Complexity

4.4 Interpreting Regression Models

4.5 Transforming Inputs

4.6 Categorical Inputs

4.7 Polynomial Regressions (Self-Study)

94

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

...


96

Standard Logistic Regression

$$\log\!\left( \frac{\hat{p}}{1 - \hat{p}} \right) = \hat{w}_0 + \hat{w}_1 \cdot x_1 + \hat{w}_2 \cdot x_2$$

[Scatter plot: x1 and x2 axes from 0.0 to 1.0, with straight, parallel probability contours 0.40, 0.50, 0.60, 0.70]

97

Polynomial Logistic Regression

$$\log\!\left( \frac{\hat{p}}{1 - \hat{p}} \right) = \hat{w}_0 + \hat{w}_1 \cdot x_1 + \hat{w}_2 \cdot x_2 + \underbrace{\hat{w}_3 \cdot x_1^2 + \hat{w}_4 \cdot x_2^2 + \hat{w}_5 \cdot x_1 x_2}_{\text{quadratic terms}}$$

[Scatter plot: x1 and x2 axes from 0.0 to 1.0, with curved probability contours from 0.30 to 0.80]
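The quadratic expansion is easy to build by hand or with scikit-learn; a minimal sketch assuming hypothetical numpy arrays x1 and x2:

```python
# A minimal sketch: add quadratic terms (squares and the interaction)
# to the design matrix for two inputs.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x1, x2 = rng.random(100), rng.random(100)

X = np.column_stack([x1, x2])
X_quad = np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])  # by hand

# the same expansion (column order differs), generated automatically
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
```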

98

Adding Polynomial Regression Terms Selectively

This demonstration illustrates how to add polynomial regression terms selectively.

99

Adding Polynomial Regression Terms Autonomously (Self-Study)

This demonstration illustrates how to add polynomial regression terms autonomously.

100

Exercises

This exercise reinforces the concepts discussed previously.

101

Regression Tools Review

Replace missing values for interval (means) and categorical data (mode). Create a unique replacement indicator.

Create linear and logistic regression models. Select inputs with a sequential selection method and appropriate fit statistic. Interpret models with odds ratios.

Regularize distributions of inputs. Typical transformations control for input skewness via a log transformation.

continued...

102

Regression Tools Review

Consolidate levels of a nonnumeric input using the Replacement Editor window.

Add polynomial terms to a regression either by hand or by an autonomous exhaustive search.