
Neural Networks and Logistic Regression

Lucila Ohno-Machado

Decision Systems Group

Brigham and Women’s Hospital

Department of Radiology

[Figure: cartoon of a neural net applied to coronary disease]

Outline

• Examples, neuroscience analogy

• Perceptrons, MLPs: How they work

• How the networks learn from examples

• Backpropagation algorithm

• Learning parameters

• Overfitting

Examples in Medical Pattern Recognition

Diagnosis

• Protein Structure Prediction

• Diagnosis of Giant Cell Arteritis

• Diagnosis of Myocardial Infarction

• Interpretation of ECGs

• Interpretation of PET scans, Chest X-rays

Prognosis

• Prognosis of Breast Cancer

• Outcomes After Spinal Cord Injury

Myocardial Infarction Network

[Figure: network with inputs Male, Age, Smoker, ECG ST Elevation, Pain Intensity, and Pain Duration, and a single output unit whose value, 0.8, is read as the “probability” of myocardial infarction]

Abdominal Pain Perceptron

[Figure: perceptron with inputs Male, Age, Temp, WBC, Pain Intensity, and Pain Duration, adjustable weights, and output units for Appendicitis, Diverticulitis, Perforated Duodenal Ulcer, Non-specific Pain, Cholecystitis, Small Bowel Obstruction, and Pancreatitis; in the example, only the Diverticulitis unit outputs 1]

Biological Analogy

[Figure: biological neuron with dendrites, axon, and excitatory (+) and inhibitory (−) synapses, next to its artificial counterpart: synapses correspond to weights, neurons to nodes]

Input layer → Output layer

[Figure: binary input patterns (11, 01, 00, 10, ...) presented at the input layer emerge as sorted patterns at the output layer]

Perceptrons

[Figure: input units (Cough, Headache) connected through adjustable weights to output units (No disease, Pneumonia, Flu, Meningitis)]

error = what we wanted − what we got

Δ rule: change the weights to decrease the error

Perceptrons

Input units

Input to unit i: a_i = measured value of variable i

Input to unit j: a_j = Σ_i w_ij a_i

Output of unit j: o_j = 1 / (1 + e^−(a_j + θ_j))

Output units
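Written as code, the computation of a single unit looks like this (a minimal sketch from the formulas above; the inputs, weights, and bias θ are made-up values):

    import math

    def unit_output(inputs, weights, theta):
        # a_j = sum_i w_ij * a_i, then o_j = 1 / (1 + e^-(a_j + theta_j))
        a_j = sum(w * a for w, a in zip(weights, inputs))
        return 1.0 / (1.0 + math.exp(-(a_j + theta)))

    print(unit_output([1.0, 0.0], [0.4, 0.3], -0.5))  # about 0.475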

AND

input   output
 00       0
 01       0
 10       0
 11       1

f(x1 w1 + x2 w2) = y

f(0·w1 + 0·w2) = 0
f(0·w1 + 1·w2) = 0
f(1·w1 + 0·w2) = 0
f(1·w1 + 1·w2) = 1

f(a) = 1 for a > θ; 0 otherwise, with θ = 0.5

Some possible values for w1 and w2:

w1      w2
0.20    0.35
0.20    0.40
0.25    0.30
0.40    0.20
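A quick check (a sketch using one of the weight pairs above) that a threshold unit with θ = 0.5 computes AND:

    def f(a, theta=0.5):
        # threshold unit: f(a) = 1 for a > theta, 0 otherwise
        return 1 if a > theta else 0

    w1, w2 = 0.25, 0.30
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, '->', f(x1 * w1 + x2 * w2))  # 1 only for input 1 1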

XOR

input   output
 00       0
 01       1
 10       1
 11       0

f(x1 w1 + x2 w2) = y

f(0·w1 + 0·w2) = 0
f(0·w1 + 1·w2) = 1
f(1·w1 + 0·w2) = 1
f(1·w1 + 1·w2) = 0

f(a) = 1 for a > θ; 0 otherwise, with θ = 0.5

Some possible values for w1 and w2: there are none. XOR is not linearly separable, so no single-layer weights can satisfy all four equations.

XOR with a hidden unit

input   output
 00       0
 01       1
 10       1
 11       0

f(a) = 1 for a > θ; 0 otherwise, with θ = 0.5 for all units

The hidden unit z receives x1 and x2 through w1 and w2; the output unit receives x1 and x2 directly through w3 and w4, and z through w5, so y = f(w1, w2, w3, w4, w5).

A possible set of values for the ws:

(w1, w2, w3, w4, w5) = (0.3, 0.3, 1, 1, −2)
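As a check, here is that network in code (a sketch; the wiring is one consistent reading of the diagram: w1 and w2 feed the hidden unit, w3 and w4 feed the output directly, and w5 connects the hidden unit to the output):

    def f(a, theta=0.5):
        return 1 if a > theta else 0

    w1, w2, w3, w4, w5 = 0.3, 0.3, 1, 1, -2
    for x1 in (0, 1):
        for x2 in (0, 1):
            z = f(x1 * w1 + x2 * w2)            # hidden unit: fires only for 1 1
            y = f(x1 * w3 + x2 * w4 + z * w5)   # output unit
            print(x1, x2, '->', y)              # 0, 1, 1, 0: XOR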

XOR with two hidden units

input   output
 00       0
 01       1
 10       1
 11       0

f(a) = 1 for a > θ; 0 otherwise, with θ = 0.5 for all units

Each input feeds both hidden units (through w1, w2, w3, w4), and the hidden units feed the output unit (through w5, w6), so y = f(w1, w2, w3, w4, w5, w6).

A possible set of values for the ws:

(w1, w2, w3, w4, w5, w6) = (0.6, −0.6, −0.7, 0.8, 1, 1)

Linear Separation

[Figure: patients plotted on Cough/No cough versus Headache/No headache axes; straight lines separate No disease, Pneumonia, Flu, and Meningitis, and Treatment from No treatment; the same idea extends from two-bit patterns (00, 01, 10, 11) to three-bit patterns (000 ... 111)]

Linear discriminant: Y = a(X) + b

Logistic regression: Y = 1 / (1 + e^−(a(X) + b))

Abdominal Pain

[Figure: the abdominal pain network again: inputs Male, Age, Temp, WBC, Pain Intensity, and Pain Duration, adjustable weights, and the seven diagnosis output units, with only one unit firing]

Multilayered Perceptrons

Input units

Input to unit i: a_i = measured value of variable i

Input to unit j: a_j = Σ_i w_ij a_i

Output of unit j: o_j = 1 / (1 + e^−(a_j + θ_j))

Hidden units

In the multilayered perceptron, unlike the perceptron, the hidden units’ outputs feed further units:

Input to unit k: a_k = Σ_j w_jk o_j

Output of unit k: o_k = 1 / (1 + e^−(a_k + θ_k))

Output units
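A minimal forward pass through one hidden layer, written directly from these formulas (the weights are made up and the biases θ are folded in as zero for brevity):

    import math

    def sigmoid(a):
        return 1.0 / (1.0 + math.exp(-a))

    def forward(x, w_hidden, w_out):
        # hidden layer: o_j = sigmoid(sum_i w_ij * a_i)
        o = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
        # output layer: o_k = sigmoid(sum_j w_jk * o_j)
        return [sigmoid(sum(w * oj for w, oj in zip(ws, o))) for ws in w_out]

    # 2 inputs -> 2 hidden units -> 1 output, illustrative weights only
    print(forward([1.0, 0.5], [[0.3, -0.2], [0.6, 0.1]], [[1.0, -1.5]]))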

Regression vs. Neural Networks

[Figure: a network with inputs X1, X2, X3 whose hidden units can stand for terms such as “X1”, “X2”, “X1X3”, “X1X2X3”, compared with a regression on the explicit terms X1, X2, X3, X1X2, X1X3, X2X3, X1X2X3]

(2³ − 1) possible combinations

Y = a(X1) + b(X2) + c(X3) + d(X1X2) + ...

Logistic Regression

• One independent variable:

f(x) = 1 / (1 + e^−(ax + cte))

• Two:

f(x) = 1 / (1 + e^−(ax1 + bx2 + cte))

[Figure: f(x) rises from 0 to 1 as x increases]

Logistic function

p = 1 / (1 + e^−(ax + cte))

log(p / (1 − p)) = ax + cte

[Figure: log(p / (1 − p)) plotted against x is linear, with slope a]

Logistic function

p = 1 / (1 + e^−(ax + cte))

log(p / (1 − p)) = ax + cte, which is linear in x

e^a is the odds ratio for a 1-unit increase in x
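A small numeric illustration of both identities (the coefficients a and cte are made up): p follows the logistic curve, the logit is linear in x, and the odds multiply by e^a per unit of x:

    import math

    a, cte = 0.8, -1.2   # assumed values, for illustration only
    for x in (0.0, 1.0, 2.0):
        p = 1.0 / (1.0 + math.exp(-(a * x + cte)))
        print(x, round(p, 3), round(math.log(p / (1 - p)), 3))  # logit rises by a each step
    print(math.exp(a))   # odds ratio per 1-unit increase in x, about 2.23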

Jargon Pseudo-Correspondence

• Independent variable = input variable

• Dependent variable = output variable

• Coefficients = “weights”

• Estimates = “targets”

• Cycles = epochs

Logistic Regression Model

[Figure: inputs (independent variables x1, x2, x3: Age = 34, Gender = 1, Stage = 4) are weighted by coefficients (a, b, c: 5, 8, 4) to give the output (dependent variable p = 0.6, the “probability of being alive”)]

Σ is the sum of inputs × coefficients

[Figure: the same model; the inputs (Age = 34, Gender = 1, Stage = 4) times the coefficients (5, 8, 4) are summed to form Σ]

Logistic function

[Figure: the same model with coefficients .5, .8, .4; the output, 0.6, is the “probability of being alive”]

p = 1 / (1 + e^−(Σ + cte))

Activation Functions...

• Linear

• Threshold or step function

• Logistic, sigmoid, “squash”

• Hyperbolic tangent
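For concreteness, the four functions as code (a sketch; the step threshold θ = 0.5 is an assumption carried over from the earlier slides):

    import math

    def linear(a):          return a
    def step(a, theta=0.5): return 1 if a > theta else 0
    def logistic(a):        return 1.0 / (1.0 + math.exp(-a))

    for g in (linear, step, logistic, math.tanh):
        print(g.__name__, g(0.3))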

Neural Network Model

[Figure: inputs (independent variables: Age = 34, Gender = 2, Stage = 4) connect through a first set of weights (.6, .5, .8, .2, .1, .3, .7, .2) to a hidden layer, and through a second set of weights (.4, .2) to the output (dependent variable: 0.6, the “probability of being alive”)]

“Combined logistic models”

[Figure: the same network with one hidden unit’s weights (.6, .5, .8, .1, .7) highlighted: each hidden unit is itself a logistic model of the inputs]

[Figure: the same network with a second hidden unit’s weights (.5, .8, .2, .3, .2) highlighted]

[Figure: the same network with a third combination of weights highlighted, suggesting each hidden unit might be fitted as its own logistic model]

Not really: there is no target for the hidden units...

[Figure: the full network again: weights from the inputs to the hidden layer and from the hidden layer to the output, 0.6, the “probability of being alive”; the hidden units’ outputs are never observed]

Perceptrons

[Figure: input units (Cough, Headache) connected through adjustable weights to output units (No disease, Pneumonia, Flu, Meningitis)]

error = what we wanted − what we got

Δ rule: change the weights to decrease the error

Hidden Units and Backpropagation

[Figure: input units, hidden units, and output units]

error = what we wanted − what we got

The Δ rule applies at the output layer; the error is then propagated backwards to update the hidden-layer weights: backpropagation.

Error Functions

• Mean Squared Error (for most problems):

Σ (t − o)² / n

• Cross Entropy Error (for dichotomous or binary outcomes):

−Σ [t ln o + (1 − t) ln(1 − o)]
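Both error functions in code (a sketch with made-up targets t and outputs o; the cross entropy is written with a leading minus so that smaller is better):

    import math

    def mse(ts, os):
        return sum((t - o) ** 2 for t, o in zip(ts, os)) / len(ts)

    def cross_entropy(ts, os):
        return -sum(t * math.log(o) + (1 - t) * math.log(1 - o)
                    for t, o in zip(ts, os))

    ts, os = [1, 0, 1], [0.9, 0.2, 0.6]
    print(mse(ts, os), cross_entropy(ts, os))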

Minimizing the Error

[Figure: error surface over the weights; training moves w_initial toward w_trained, from the initial error down to the final error at a local minimum; a negative derivative calls for a positive change in the weight]

Numerical Methods

a(x³) + b(x²) + c(x) + d = 0

[Figure: y plotted against x; a first pair of guessed roots brackets the zero crossing (one guess gives y > 0, the other y < 0) and is narrowed to a second, closer pair]

Gradient descent

[Figure: error against weight, showing a local minimum and the global minimum; gradient descent can settle in the local minimum]
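A gradient-descent sketch on an assumed one-weight error surface E(w) = (w − 3)²: a negative derivative produces a positive change in w, as in the figure above.

    def dE(w):                # derivative of the assumed error surface
        return 2.0 * (w - 3.0)

    w, rate = 0.0, 0.1        # initial weight and learning rate
    for _ in range(100):
        w -= rate * dE(w)     # step against the derivative
    print(w)                  # converges near the minimum at w = 3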

Overfitting

[Figure: an overfitted model versus the real distribution]

Overfitting

[Figure: total sum of squares (tss) against epochs, with b = training set and a = test set; tss_b keeps falling while tss_a reaches a minimum, min(tss_a), and then rises as the model overfits]

Stopping criterion: stop at min(tss_a)
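The stopping criterion as code (a sketch; the per-epoch holdout errors are made up): keep the epoch where the test-set tss is lowest.

    holdout_tss = [0.90, 0.70, 0.52, 0.45, 0.47, 0.55]  # hypothetical tss_a per epoch
    best_epoch = min(range(len(holdout_tss)), key=holdout_tss.__getitem__)
    print(best_epoch, holdout_tss[best_epoch])          # stop training here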

Overfitting in Neural Nets

[Figure: left, probability of heart disease (HD) against age, with an overfitted model wiggling around the “real” model; right, error against cycles, with training error falling steadily while holdout error turns upward for the overfitted model]

Parameter Estimation

Logistic regression
• Models “just” one function
• Maximum likelihood
• Fast
• Optimizations: Fisher scoring, Newton-Raphson

Neural network
• Models several functions
• Backpropagation
• Iterative, slow
• Optimizations: Quickprop, scaled conjugate gradient descent, adaptive learning rates

What do you want? Insight versus prediction

Insight into the model
• Explain the importance of each variable
• Assess model fit to existing data

Accurate predictions
• Make a good estimate of the “real” probability
• Assess model prediction in new data

Model Selection: Finding influential variables

Logistic
• Forward
• Backward
• Stepwise
• Arbitrary
• All combinations
• Relative risk

Neural Network
• Weight elimination
• Automatic Relevance Determination
• “Relevance”

Regression Diagnostics: Finding influential observations

Logistic
• Analysis of residuals
• Cook’s distance
• Deviance
• Difference in coefficients when a case is left out

Neural Network
• Ad hoc

How accurate are predictions?

• Construct training and test sets or bootstrap to assess “unbiased” error

• Assess:

– Discrimination: how well the model “separates” alive from dead

– Calibration: how close the estimates are to the “real” probability

“Unbiased” Evaluation: Training and Test Sets

• The training set is used to build the model (it may include a holdout set to control for overfitting)

• The test set is left aside for evaluation purposes

• Ideal: yet another validation set, from a different source, to test whether the model generalizes to other settings

Small sets: Cross-validation

• Several training and test set pairs are created so that the union of all test sets corresponds exactly to the original set

• Results from the different models are pooled and overall performance is estimated

• “Leave-n-out”

• Jackknife
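A leave-n-out sketch (indices only; the models themselves are omitted): the test folds are disjoint and their union is exactly the original set, so results can be pooled across folds.

    def folds(n, k):
        # interleaved partition of n cases into k disjoint test folds
        tests = [list(range(i, n, k)) for i in range(k)]
        for test in tests:
            train = [i for i in range(n) if i not in test]
            yield train, test

    for train, test in folds(10, 5):
        print(test)   # every case appears in exactly one test fold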

ECG Interpretation

[Figure: network with inputs R-R interval, S-T elevation, P-R interval, QRS duration, AVF lead, and QRS amplitude, and outputs SV tachycardia, Ventricular tachycardia, LV hypertrophy, RV hypertrophy, and Myocardial infarction]

Thyroid Diseases

[Figure: patient data (TSH, T4U, clinical finding 1, ...) feed a hidden layer (5 or 10 units) that outputs partial diagnoses: Normal, Hyperthyroidism, Hypothyroidism, and Other conditions; flagged patients will be evaluated further]

[Figure: a second network takes the same patient data plus additional inputs (T3, TT4, TBG); its hidden layer (5 or 10 units) outputs final diagnoses: Normal, Primary hypothyroidism, Compensated hypothyroidism, Secondary hypothyroidism, and Other conditions]

Time Series

[Figure: input units (independent variables) Xn and Xn+1 feed hidden units through weights (estimated parameters) to the output unit (dependent variable) Y = Xn+2]

Time Series

[Figure: the same network with the sliding window advanced one step: inputs Xn+1 and Xn+2 now predict Y = Xn+3]
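Constructing the sliding-window training pairs implied by the figures (a sketch with a made-up series): each pair of consecutive values (Xn, Xn+1) is matched with the value that follows, Y = Xn+2.

    series = [1.0, 1.3, 1.6, 1.4, 1.7, 2.0]   # hypothetical observations
    pairs = [((series[i], series[i + 1]), series[i + 2])
             for i in range(len(series) - 2)]
    print(pairs)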

Evaluation

[Figure: cases are randomized into Training, Test, and Validation sets, used for model development, model enhancement, and model evaluation respectively]

[Table: predicted class (“A”, “B”) against true class (A, B); the diagonal cells are OK, the off-diagonal cells are the Type I and Type II errors]

Evaluation: Area Under ROCs

[Figure: ROC curves (sensitivity against 1 − specificity) for the data models, including the neural network; the areas under the ROCs are compared]

ROC Analysis: Variations

• ROC slope and intercept
• Area under ROC
• Confidence interval
• Wilcoxon statistic
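The area under the ROC equals the Wilcoxon statistic: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A sketch with made-up scores:

    def auc(pos, neg):
        # ties count as half a win
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        return wins / (len(pos) * len(neg))

    print(auc([0.9, 0.8, 0.6], [0.7, 0.4, 0.3]))   # 8/9, about 0.89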

Expert Systems and Neural Nets

[Figure: expert systems and neural networks plotted against the number of examples available: expert systems suit problems with few examples, neural networks problems with many]

Model Comparison (personal biases)

Model                   Modeling Effort   Examples Needed   Explanation Provided
Rule-based Exp. Syst.   high              low               high
Bayesian Nets           high              low               moderate
Classification Trees    low               high              “high”
Neural Nets             low               high              low
Regression Models       high              moderate          moderate

Conclusion

Neural Networks are

• mathematical models that resemble nonlinear regression models and are also useful for modeling spaces that are not linearly separable

• “knowledge acquisition tools” that learn from examples

• In medicine, neural networks are used for:

– pattern recognition (images, diseases, etc.)

– exploratory analysis, control

– predictive models

Conclusion

• There is no definitive indication for using either logistic regression or a neural network

• Try both and select the best

• Make an unbiased evaluation

• Compare the models statistically

Some References

Introductory Textbooks

• Rumelhart DE, McClelland JL (eds). Parallel Distributed Processing. MIT Press, Cambridge, 1986.
• Hertz JA, Palmer RG, Krogh AS. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, 1991.
• Pao YH. Adaptive Pattern Recognition and Neural Networks. Addison-Wesley, Reading, 1989.
• Reggia JA. Neural computation in medicine. Artificial Intelligence in Medicine, 1993 Apr, 5(2):143–57.
• Miller AS, Blott BH, Hames TK. Review of neural network applications in medical imaging and signal processing. Medical and Biological Engineering and Computing, 1992 Sep, 30(5):449–64.
• Bishop CM. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.