Transcript of Artificial Neural Networks - MIT OpenCourseWare (67 pages)

Page 1

Harvard-MIT Division of Health Sciences and Technology
HST.951J: Medical Decision Support, Fall 2005
Instructors: Professor Lucila Ohno-Machado and Professor Staal Vinterbo

6.873/HST.951 Medical Decision Support, Spring 2005

Artificial Neural Networks

Lucila Ohno-Machado
(with many slides borrowed from Stephan Dreiseitl. Courtesy of Stephan Dreiseitl. Used with permission.)

Page 2

Overview

• Motivation
• Perceptrons
• Multilayer perceptrons
• Improving generalization
• Bayesian perspective

Page 3

Motivation

Images removed due to copyright reasons.

benign lesion vs. malignant lesion

Page 4

Motivation

• Human brain
  – Parallel processing
  – Distributed representation
  – Fault tolerant
  – Good generalization capability

• Mimic structure and processing in computational model

Page 5

Biological Analogy

Figure: a biological neuron (dendrites, axon, synapses with excitatory + and inhibitory - connections) next to the network analogy: synapses correspond to weights, neurons to nodes.

Page 6

Perceptrons

Page 7

Figure: a network with an input layer and an output layer maps binary input patterns (00, 01, 10, 11) to sorted patterns.

Page 8

Activation functions

Page 9

Perceptrons (linear machines)

Figure: input units (Cough, Headache) connect through weights to output units (No disease, Pneumonia, Flu, Meningitis). Δ rule: change weights to decrease the error = (what we wanted) − (what we got).

Page 10

Abdominal Pain Perceptron

Figure: inputs Male = 1, Age = 20, Temp = 37, WBC = 10, Pain Intensity = 1, Pain Duration = 1 connect through adjustable weights to outputs Appendicitis, Diverticulitis, Perforated Duodenal Ulcer, Cholecystitis, Small Bowel Obstruction, Pancreatitis, Non-specific Pain; the output pattern reads 0 1 0 0 0 0 0.

Page 11

AND

Network: inputs x1, x2 with weights w1, w2 feed output y; threshold θ = 0.5.

f(a) = 1 for a > θ, 0 for a ≤ θ

input   output
00      0
01      0
10      0
11      1

f(x1·w1 + x2·w2) = y:
f(0·w1 + 0·w2) = 0
f(0·w1 + 1·w2) = 0
f(1·w1 + 0·w2) = 0
f(1·w1 + 1·w2) = 1

some possible values for w1 and w2:
w1      w2
0.20    0.35
0.20    0.40
0.25    0.30
0.40    0.20
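As a sanity check, the AND unit above can be sketched in code; this is a hypothetical threshold unit using one of the listed weight pairs (w1 = 0.2, w2 = 0.4):

```python
# Threshold unit f(a) = 1 if a > theta else 0, theta = 0.5,
# wired as an AND gate with one of the slide's weight pairs.

def f(a, theta=0.5):
    return 1 if a > theta else 0

def and_unit(x1, x2, w1=0.2, w2=0.4):
    return f(x1 * w1 + x2 * w2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, and_unit(x1, x2))
```

Any of the other listed weight pairs reproduces the same truth table.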

Page 12

Single layer neural network

Input to unit i: ai = measured value of variable i (input units)
Input to unit j: aj = Σi wij ai
Output of unit j: oj = 1 / (1 + e^−(aj + θj))
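The unit computation above can be sketched as follows; the inputs, weights, and θ here are made up for illustration:

```python
import math

# One unit: a_j = sum_i w_ij * a_i, then o_j = 1 / (1 + exp(-(a_j + theta_j))).
def unit_output(inputs, weights, theta):
    a = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-(a + theta)))

o = unit_output([1.0, 0.0, 0.5], [0.4, -0.3, 0.8], theta=0.0)
print(round(o, 3))  # a = 0.8, sigmoid(0.8) ≈ 0.69
```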

Page 13

Single layer neural network

Output of unit j: oj = 1 / (1 + e^−(aj + θj))
Input to unit j: aj = Σi wij ai
Input to unit i: ai = measured value of variable i

Figure: sigmoid output curves (rising from 0 to 1) plotted over aj from −15 to 15 for three values of θ; increasing θ shifts the curve.

Page 14

Training: Minimize Error

Figure: input units (Cough, Headache) connect through weights to output units (No disease, Pneumonia, Flu, Meningitis). Δ rule: change weights to decrease the error = (what we wanted) − (what we got).

Page 15

Error Functions

• Mean Squared Error (for regression problems), where t is target, o is output:
  Σ (t − o)² / n

• Cross Entropy Error (for binary classification):
  −Σ [t log o + (1 − t) log(1 − o)]

with oj = 1 / (1 + e^−(aj + θj))
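Both error functions can be sketched directly from the formulas above; the target and output vectors here are made up:

```python
import math

# Mean squared error over n examples.
def mse(t, o):
    return sum((ti - oi) ** 2 for ti, oi in zip(t, o)) / len(t)

# Cross-entropy error for binary targets t in {0, 1} and outputs o in (0, 1).
def cross_entropy(t, o):
    return -sum(ti * math.log(oi) + (1 - ti) * math.log(1 - oi)
                for ti, oi in zip(t, o))

t = [1, 0, 1]
o = [0.9, 0.2, 0.6]
print(mse(t, o))            # small when outputs match targets
print(cross_entropy(t, o))
```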

Page 16

Error function

• Convention: w := (w0, w), x := (1, x); w0 is the "bias"
• o = f(w · x)
• Class labels ti ∈ {+1, −1}
• Error measure: E = −Σ_{i miscl.} ti (w · xi)
• How to minimize E?

Figure: inputs x1, x2 with weights w1, w2 feed output y; θ = 0.5.

Page 17

Minimizing the Error

Figure: error surface over weight w; starting from the initial w with initial error, following the derivative leads to a trained w with final error at a local minimum.

Page 18

Gradient descent

Figure: error curve over the weights, showing a local minimum and the global minimum.

Page 19

Perceptron learning

• Find minimum of E by iterating wk+1 = wk − η grad_w E
• E = −Σ_{i miscl.} ti (w · xi) ⇒ grad_w E = −Σ_{i miscl.} ti xi
• "online" version: pick a misclassified xi, wk+1 = wk + η ti xi

Page 20

Perceptron learning

• Update rule wk+1 = wk + η ti xi

• Theorem: perceptron learning converges for linearly separable sets
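A minimal sketch of the online update rule on a made-up, linearly separable toy set, with labels t in {+1, −1} and the bias folded in via the w := (w0, w), x := (1, x) convention from the error-function slide:

```python
# Online perceptron learning: for each misclassified (x, t),
# apply w <- w + eta * t * x until no example is misclassified.

data = [((1.0, 2.0, 1.0), +1), ((1.0, 1.5, 2.0), +1),
        ((1.0, -1.0, -0.5), -1), ((1.0, -2.0, 0.5), -1)]
w = [0.0, 0.0, 0.0]
eta = 0.1

def predict(w, x):
    s = sum(wi * xi for wi, xi in zip(w, x))
    return +1 if s > 0 else -1

for _ in range(100):
    errors = 0
    for x, t in data:
        if predict(w, x) != t:          # misclassified: apply the update
            w = [wi + eta * t * xi for wi, xi in zip(w, x)]
            errors += 1
    if errors == 0:                     # converged, as the theorem guarantees
        break

print(all(predict(w, x) == t for x, t in data))
```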

Page 21

Gradient descent

• Simple function minimization algorithm
• Gradient is vector of partial derivatives
• Negative gradient is direction of steepest descent

Figure: a 3-D surface and its contour plot with the steepest-descent direction. Figures by MIT OCW.
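The bullet points above can be illustrated on a simple quadratic function (chosen arbitrarily); stepping against the gradient drives w toward the minimum at (1, −2):

```python
# Gradient descent on E(w) = (w1 - 1)^2 + (w2 + 2)^2.
# The gradient is the vector of partial derivatives (2(w1 - 1), 2(w2 + 2)).

def grad(w):
    return [2 * (w[0] - 1), 2 * (w[1] + 2)]

w = [3.0, 3.0]
eta = 0.1
for _ in range(200):
    g = grad(w)
    w = [wi - eta * gi for wi, gi in zip(w, g)]   # step in -gradient direction

print([round(wi, 4) for wi in w])   # approaches the minimum (1, -2)
```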

Page 22

Classification Model

Figure: inputs Age = 34, Gender = 1, Stage = 4 connect through weights to a Σ unit; output 0.6 = "Probability of being Alive".
Independent variables x1, x2, x3; coefficients a, b, c; dependent variable p (the prediction).

Page 23

Terminology

• Independent variable = input variable
• Dependent variable = output variable
• Coefficients = "weights"
• Estimates = "targets"
• Iterative step = cycle, epoch

Page 24

XOR

Network: inputs x1, x2 with weights w1, w2 feed output y; threshold θ = 0.5.

f(a) = 1 for a > θ, 0 for a ≤ θ

input   output
00      0
01      1
10      1
11      0

f(x1·w1 + x2·w2) = y would require:
f(0·w1 + 0·w2) = 0
f(0·w1 + 1·w2) = 1
f(1·w1 + 0·w2) = 1
f(1·w1 + 1·w2) = 0

some possible values for w1 and w2:
(the table is empty: no single-layer weights satisfy all four constraints)

Page 25

XOR

Network: inputs x1, x2 connect to a hidden unit z through weights w1, w2 and directly to output y through weights w3, w4; z connects to y through w5; θ = 0.5 for both units.

input   output
00      0
01      1
10      1
11      0

y = f(w1, w2, w3, w4, w5), with f(a) = 1 for a > θ, 0 for a ≤ θ

a possible set of values for the ws:
(w1, w2, w3, w4, w5) = (0.3, 0.3, 1, 1, −2)
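The slide's weight set can be checked in code; note that the wiring (w1, w2 into the hidden unit z; w3, w4 directly into y; w5 from z into y) is my reading of the figure:

```python
# XOR via one hidden unit: z = f(w1*x1 + w2*x2), y = f(w3*x1 + w4*x2 + w5*z),
# threshold f(a) = 1 if a > 0.5 else 0, weights (w1..w5) = (0.3, 0.3, 1, 1, -2).

def f(a, theta=0.5):
    return 1 if a > theta else 0

def xor_net(x1, x2, w=(0.3, 0.3, 1, 1, -2)):
    w1, w2, w3, w4, w5 = w
    z = f(w1 * x1 + w2 * x2)          # fires only when both inputs are 1
    return f(w3 * x1 + w4 * x2 + w5 * z)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))
```

The hidden unit acts as an AND detector whose negative weight w5 = −2 suppresses the output on the 11 input.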

Page 26

XOR

Network: inputs x1, x2 connect to two hidden units through weights w1, w2, w3, w4; the hidden units connect to the output through weights w5, w6; θ = 0.5 for all units.

input   output
00      0
01      1
10      1
11      0

y = f(w1, w2, w3, w4, w5, w6), with f(a) = 1 for a > θ, 0 for a ≤ θ

a possible set of values for the ws:
(w1, w2, w3, w4, w5, w6) = (0.6, −0.6, −0.7, 0.8, 1, 1)

Page 27

From perceptrons to multilayer perceptrons

Why?

Page 28

Abdominal Pain

Figure: inputs Male = 1, Age = 20, Temp = 37, WBC = 10, Pain Intensity = 1, Pain Duration = 1 connect through adjustable weights to outputs Appendicitis, Diverticulitis, Perforated Duodenal Ulcer, Cholecystitis, Small Bowel Obstruction, Pancreatitis, Non-specific Pain; the output pattern reads 0 1 0 0 0 0 0.

Page 29

Heart Attack Network

Figure: inputs (Pain Duration, Pain Intensity, ECG: ST elevation, Smoker, Age, Male) feed a network whose output unit gives the "Probability" of Myocardial Infarction, here 0.8.

Page 30

Multilayered Perceptrons

Perceptron:
Input to unit i: ai = measured value of variable i (input units)
Input to unit j: aj = Σi wij ai
Output of unit j: oj = 1 / (1 + e^−(aj + θj)) (output units)

Multilayered perceptron, with hidden units j between inputs and outputs:
Input to unit k: ak = Σj wjk oj
Output of unit k: ok = 1 / (1 + e^−(ak + θk))
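A forward pass through such a network can be sketched as below; the layer sizes and all weight and θ values are made up:

```python
import math

# Forward pass: hidden o_j = sigmoid(sum_i w_ij a_i + theta_j),
# output o_k = sigmoid(sum_j w_jk o_j + theta_k).

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W_hidden, theta_hidden, w_out, theta_out):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + th)
              for row, th in zip(W_hidden, theta_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + theta_out)

x = [0.5, 1.0]                            # two inputs
W_hidden = [[0.6, -0.4], [0.3, 0.8]]      # two hidden units
theta_hidden = [0.0, -0.5]
w_out = [1.2, -0.7]
theta_out = 0.1
print(round(forward(x, W_hidden, theta_hidden, w_out, theta_out), 3))
```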

Page 31

Neural Network Model

Figure: inputs Age = 34, Gender = 2, Stage = 4 connect through weights to hidden Σ units, which connect through further weights to an output Σ unit; output 0.6 = "Probability of being Alive".
Independent variables → Weights → Hidden Layer → Weights → Dependent variable (Prediction).

Page 32

"Combined logistic models"

Figure: the same network with one path of weights (.6, .5, .8, .1, .7) highlighted, from inputs Age = 34, Gender = 2, Stage = 4 through one hidden unit to the output 0.6 = "Probability of being Alive".
Independent variables → Weights → Hidden Layer → Weights → Dependent variable (Prediction).

Page 33

Figure: the same network with a second hidden unit's path highlighted (weights .5, .8, .2, .3, .2); inputs Age = 34, Gender = 2, Stage = 4; output 0.6 = "Probability of being Alive".
Independent variables → Weights → Hidden Layer → Weights → Dependent variable (Prediction).

Page 34

Figure: the same network with all of its weights shown; inputs Age = 34, Gender = 1, Stage = 4; output 0.6 = "Probability of being Alive".
Independent variables → Weights → Hidden Layer → Weights → Dependent variable (Prediction).

Page 35

Not really, no target for hidden units...

Figure: the full network again (inputs Age = 34, Gender = 2, Stage = 4; hidden Σ units; output 0.6 = "Probability of being Alive"); the hidden units have no observed targets, so the network cannot simply be fit as separate logistic models.
Independent variables → Weights → Hidden Layer → Weights → Dependent variable (Prediction).

Page 36

Hidden Units and Backpropagation

Figure: input units feed hidden units, which feed output units. Error = (what we wanted) − (what we got). The Δ rule is applied at the output units, and the error signal is backpropagated so the Δ rule can also be applied at the hidden units.

Page 37

Multilayer perceptrons

• Sigmoidal hidden layer
• Can represent arbitrary decision regions
• Can be trained similarly to perceptrons

Page 38

ECG Interpretation

Figure: a network maps ECG inputs (R-R interval, P-R interval, S-T elevation, QRS duration, QRS amplitude, AVF lead) to diagnostic outputs (SV tachycardia, Ventricular tachycardia, LV hypertrophy, RV hypertrophy, Myocardial infarction).

Page 39

Linear Separation

Separate n-dimensional space using one (n − 1)-dimensional hyperplane.

Figure: a 2-D example with axes Cough / No cough and Headache / No headache, regions labeled Meningitis, Flu, Pneumonia, No disease, and a line separating Treatment from No treatment; binary input patterns (00, 01, 10, 11 and 000 through 111) fall on either side.

Page 40

Another way of thinking about this...

• Have data set D = {(xi, ti)} drawn from probability distribution P(x,t)
• Model P(x,t) given samples D by ANN with adjustable parameter w
• Statistics analogy: maximum likelihood estimation

Page 41

Maximum Likelihood Estimation

• Maximize likelihood of data D
• Likelihood L = Π p(xi, ti) = Π p(ti|xi) p(xi)
• Minimize −log L = −Σ log p(ti|xi) − Σ log p(xi)
• Drop second term: does not depend on w
• Two cases: "regression" and classification

Page 42

Likelihood for classification (i.e., categorical target)

• For classification, targets ti are class labels
• Minimize −Σ log p(ti|xi)
• p(ti|xi) = y(xi,w)^ti (1 − y(xi,w))^(1−ti) ⇒
  −log p(ti|xi) = −ti log y(xi,w) − (1 − ti) log(1 − y(xi,w))
• Minimizing −log L is equivalent to minimizing
  −Σ [ti log y(xi,w) + (1 − ti) log(1 − y(xi,w))] (cross-entropy error)

Page 43

Likelihood for "regression" (i.e., continuous target)

• For regression, targets ti are real values
• Minimize −Σ log p(ti|xi)
• p(ti|xi) = (1/Z) exp(−(y(xi,w) − ti)² / (2σ²)) ⇒
  −log p(ti|xi) = (1/(2σ²)) (y(xi,w) − ti)² + log Z
• y(xi,w) is the network output
• Minimizing −log L is equivalent to minimizing Σ (y(xi,w) − ti)² (sum-of-squares error)
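The equivalence in the last bullet can be checked numerically: under the Gaussian model, −log L equals the sum-of-squares error scaled by 1/(2σ²) plus a constant, so both are minimized by the same weights. The outputs, targets, and σ below are made up:

```python
import math

# Gaussian noise model: p(t|x) = (1/Z) exp(-(y - t)^2 / (2 sigma^2)).
sigma = 0.5
Z = math.sqrt(2 * math.pi) * sigma
y = [0.9, 1.7, 3.2]   # network outputs
t = [1.0, 1.5, 3.0]   # targets

# -log L summed over the data, term by term.
neg_log_L = sum((yi - ti) ** 2 / (2 * sigma ** 2) + math.log(Z)
                for yi, ti in zip(y, t))
sse = sum((yi - ti) ** 2 for yi, ti in zip(y, t))

# Identity: -log L == SSE / (2 sigma^2) + n * log Z
print(abs(neg_log_L - (sse / (2 * sigma ** 2) + len(y) * math.log(Z))) < 1e-9)
```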

Page 44

Backpropagation algorithm

• Minimizing error function by gradient descent: wk+1 = wk − η grad_w E
• Iterative gradient calculation by propagating error signals
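A minimal sketch of this error-signal propagation for a 1-input, 1-hidden-unit, 1-output sigmoid network with sum-of-squares error (all values made up; real implementations vectorize this over layers and training examples):

```python
import math

# Backpropagation on the tiniest possible network, trained on one (x, t) pair
# with E = (o - t)^2 / 2 and updates w <- w - eta * dE/dw.

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

x, t = 1.0, 0.8
w1, w2 = 0.5, -0.3          # input->hidden and hidden->output weights
eta = 0.5

for _ in range(2000):
    h = sigmoid(w1 * x)                      # forward pass
    o = sigmoid(w2 * h)
    delta_o = (o - t) * o * (1 - o)          # error signal at the output
    delta_h = delta_o * w2 * h * (1 - h)     # signal propagated back to hidden unit
    w2 -= eta * delta_o * h                  # gradient-descent updates
    w1 -= eta * delta_h * x

print(round(sigmoid(w2 * sigmoid(w1 * x)), 3))  # output approaches t = 0.8
```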

Page 45

Backpropagation algorithm

Problem: how to set learning rate η?

Better: use more advanced minimization algorithms (second-order information)

Figure: contour plots comparing minimization paths. Figures by MIT OCW.

Page 46

Backpropagation algorithm

Classification: cross-entropy error. Regression: sum-of-squares error.

Page 47

Overfitting

Figure: the real distribution vs. an overfitted model.

Page 48

Improving generalization

Problem: memorizing (x,t) combinations ("overtraining")

x1      x2      t
0.7     0.5     0
0.9     −0.5    1
1       −1.2    −0.2
0.3     0.6     1
0.5     −0.2    ?

Page 49

Improving generalization

• Need test set to judge performance
• Goal: represent information in data set, not noise
• How to improve generalization?
  – Limit network topology
  – Early stopping
  – Weight decay

Page 50

Limit network topology

• Idea: fewer weights ⇒ less flexibility

• Analogy to polynomial interpolation:

Page 51

Limit network topology

Page 52

Early Stopping

Figure: CHD vs. age fits illustrating an overfitted model vs. the "real" model, alongside error vs. training cycles for the training set (keeps falling) and a holdout set (falls, then rises as the model overfits).

Page 53

Early stopping

• Idea: stop training when information (but not noise) is modeled
• Need hold-out set to determine when to stop training

Page 54

Overfitting

Figure: total sum of squares (tss) vs. epochs; tss b (training set) keeps decreasing while tss a (hold-out set) reaches a minimum. The stopping criterion min(Δtss) marks the point beyond which the model is overfitted.

Page 55

Early stopping

Page 56

Weight decay

• Idea: control smoothness of network output by controlling size of weights

• Add term α||w||² to error function
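The effect of the penalty term can be sketched on a one-weight toy error E(w) = (w − 3)² (chosen arbitrarily): the penalized gradient pulls the solution from 3 toward zero, to w = 3 / (1 + α):

```python
# Gradient descent on the penalized error E(w) = (w - 3)^2 + alpha * w^2.
# Setting the derivative to zero gives the minimizer w = 3 / (1 + alpha).

def train(alpha, eta=0.1, steps=500):
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3) + 2 * alpha * w   # gradient of penalized error
        w -= eta * grad
    return w

print(round(train(alpha=0.0), 4))   # ~3.0: no decay
print(round(train(alpha=0.5), 4))   # ~2.0: weight shrunk toward zero
```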

Page 57

Weight decay

Page 58

Bayesian perspective

• Error function minimization corresponds to maximum likelihood (ML) estimate: single best solution wML
• Can lead to overtraining
• Bayesian approach: consider weight posterior distribution p(w|D)

Page 59

Bayesian perspective

• Posterior = likelihood * prior
• p(w|D) = p(D|w) p(w) / p(D)
• Two approaches to approximating p(w|D):
  – Sampling
  – Gaussian approximation

Page 60

Sampling from p(w|D)

Figure: the prior and the likelihood as distributions over w.

Page 61

Sampling from p(w|D)

prior * likelihood = posterior
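This pointwise product can be sketched on a one-dimensional weight grid; the Gaussian prior, Bernoulli likelihood, and data counts below are all made up:

```python
import math

# Posterior over a single weight w (here: probability of heads), computed as
# prior * likelihood on a grid and then normalized.

grid = [i / 100 for i in range(1, 100)]          # w in (0, 1)
prior = [math.exp(-((w - 0.5) ** 2) / (2 * 0.2 ** 2)) for w in grid]
heads, tails = 7, 3                              # observed data D
likelihood = [w ** heads * (1 - w) ** tails for w in grid]

post = [p * l for p, l in zip(prior, likelihood)]
Z = sum(post)
post = [p / Z for p in post]                     # normalize: p(w|D)

w_map = grid[post.index(max(post))]
print(w_map)   # between the prior mean 0.5 and the data frequency 0.7
```

The posterior mode lands between what the prior favors and what the data alone would say, which is the pull toward the prior that distinguishes MAP from ML.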

Page 62

Bayesian example for regression

Page 63

Bayesian example for classification

Page 64

Model Features (with strong personal biases)

Model                    Modeling Effort   Examples Needed   Explanation
Rule-based Exp. Syst.    high              low               high?
Classification Trees     low               high+             "high"
Neural Nets, SVM         low               high              low
Regression Models        high              moderate          moderate
Learned Bayesian Nets    low               high+             high (beautiful when it works)

Page 65

Regression vs. Neural Networks

Regression with explicit interaction terms: inputs X1, X2, X3, X1X2, X1X3, X2X3, X1X2X3 feed Y, giving (2³ − 1) possible combinations:
Y = a(X1) + b(X2) + c(X3) + d(X1X2) + ...

Neural network: inputs X1, X2, X3 feed hidden units (labeled in the figure with quoted terms such as "X1", "X2", "X1X2X3") that feed Y; the interactions are learned rather than specified in advance.

Page 66

Summary

• ANNs inspired by functionality of brain
• Nonlinear data model
• Trained by minimizing error function
• Goal is to generalize well
• Avoid overtraining
• Distinguish ML and MAP solutions

Page 67

Some References

Introductory and Historical Textbooks
• Rumelhart DE, McClelland JL (eds). Parallel Distributed Processing. MIT Press, Cambridge, 1986.
• Hertz JA, Krogh AS, Palmer RG. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, 1991.
• Pao YH. Adaptive Pattern Recognition and Neural Networks. Addison-Wesley, Reading, 1989.
• Bishop CM. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.