Feed-Forward Artificial Neural Networks
Transcript of Feed-Forward Artificial Neural Networks
Feed-Forward Artificial Neural Networks
AMIA 2003, Machine Learning Tutorial
Constantin F. Aliferis & Ioannis Tsamardinos
Discovery Systems Laboratory
Department of Biomedical Informatics
Vanderbilt University
Binary Classification Example
[Scatter plot of training examples; x-axis: value of predictor 1, y-axis: value of predictor 2]
Example:
•Classification to malignant or benign breast cancer from mammograms
•Predictor 1: lump thickness
•Predictor 2: single epithelial cell size
Possible Decision Areas
[Scatter plot; x-axis: value of predictor 1, y-axis: value of predictor 2; one decision area for the red-circle class, one for the green-triangle class]
Possible Decision Areas
[Another possible partition of the same predictor space into decision areas]
Possible Decision Areas
[A third possible partition of the same predictor space into decision areas]
Binary Classification Example
[Scatter plot with a straight line separating the two classes]
The simplest non-trivial decision function is the straight line (in general a hyperplane)
One decision surface
Decision surface partitions space into two subspaces
Specifying a Line
Line equation: w2x2+w1x1+w0 = 0
Classifier: If w2x2+w1x1+w0 >= 0, output 1; else output -1
[Plot in the (x1, x2) plane: the line w2x2+w1x1+w0 = 0, with w2x2+w1x1+w0 > 0 on one side and w2x2+w1x1+w0 < 0 on the other]
Classifying with Linear Surfaces
Use 1 for one class and -1 for the other. The classifier becomes
sgn(w2x2+w1x1+w0) = sgn(w2x2+w1x1+w0x0),
with x0 = 1 always
[Plot: the line w2x2+w1x1+w0 = 0 partitioning the (x1, x2) plane]
In general, with n predictors: sgn(Σ_{i=0}^{n} wi xi)
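As a concrete sketch of this classifier (function names and the sample weights are illustrative, not from the slides):

```python
# Linear classifier: output sgn(w . x) with the constant input x0 = 1.
def sgn(v):
    return 1 if v >= 0 else -1

def classify(weights, predictors):
    # weights: (w0, w1, ..., wn); predictors: (x1, ..., xn); x0 = 1 always
    x = [1.0] + list(predictors)
    return sgn(sum(w * xi for w, xi in zip(weights, x)))

print(classify((2, 4, -2, 0, 3), (1, 4, 3.1, 2)))  # → 1
```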
The Perceptron
[Diagram: inputs x4, x3, x2, x1, x0 feed a single unit with weights w4 = 3, w3 = 0, w2 = -2, w1 = 4, w0 = 2]
Input: attributes of the patient to classify
Output: classification of the patient (malignant or benign)
The Perceptron
[Same unit; input values 2, 3.1, 4, 1 on x4 ... x1, with x0 = 1]
The Perceptron
[The unit computes the weighted sum of its inputs: Σ_{i=0}^{n} wi xi = 3]
The Perceptron
[The unit applies the sign function to the sum: sgn(3) = 1]
The Perceptron
[The unit outputs sgn(3) = 1: the classification of the patient]
Learning with Perceptrons
Hypothesis space: H = { ⟨w0, w1, ..., wn⟩ : wi ∈ ℝ }
Inductive bias: prefer hypotheses that do not misclassify any of the training instances (or that minimize an error function)
Search method: perceptron training rule, gradient descent, etc.
Remember: the problem is to find "good" weights
Training Perceptrons
Start with random weights
Update them in an intelligent way to improve them
Intuitively, when the output is too high: decrease the weights that increase the sum, and increase the weights that decrease the sum
Repeat for all training instances until convergence
[Perceptron with weights (3, 0, -2, 4, 2) on inputs (2, 3.1, 4, 1, 1): Σ_{i=0}^{n} wi xi = 3, so sgn(3) = 1, but the true output is -1]
Perceptron Training Rule
Δwi = η(t - o)xi
wi' = wi + Δwi
t: output the current example should have
o: output of the perceptron
xi: value of predictor variable i
η: the learning rate
If t = o: no change (for correctly classified examples)
Perceptron Training Rule
t = -1, o = 1: the sum will decrease
t = 1, o = -1: the sum will increase
Σ wi' xi = Σ (wi + Δwi) xi = Σ wi xi + η(t - o) Σ xi²
o = sgn(Σ wi xi)
Perceptron Training Rule
In vector form, Δwi = η(t - o)xi with wi' = wi + Δwi becomes:
w' = w + η(t - o)x
Example of Perceptron Training: The OR Function
[Training data in the (x1, x2) plane: (0, 0) → -1; (0, 1), (1, 0), (1, 1) → 1]
Example of Perceptron Training: The OR Function
Initial random weights: 0·x2 + 1·x1 - 0.5 = 0
Defines the line: x1 = 0.5
Example of Perceptron Training: The OR Function
Initial random weights: 0·x2 + 1·x1 - 0.5 = 0
Defines the line: x1 = 0.5
[Plot: the classifier outputs 1 in the area x1 > 0.5 and -1 in the area x1 < 0.5]
Example of Perceptron Training: The OR Function
[Plot with the line x1 = 0.5]
Only misclassified example: x1 = 0, x2 = 1
Example of Perceptron Training: The OR Function
Only misclassified example: x1 = 0, x2 = 1 (t = 1, o = -1)
Old line: 0·x2 + 1·x1 - 0.5 = 0
Applying w' = w + η(t - o)x with η = 0.5, ordering components as ⟨w2, w1, w0⟩ and ⟨x2, x1, x0⟩:
w' = ⟨0, 1, -0.5⟩ + 0.5(1 - (-1))·⟨1, 0, 1⟩
w' = ⟨0, 1, -0.5⟩ + ⟨1, 0, 1⟩
w' = ⟨1, 1, 0.5⟩
Example of Perceptron Training: The OR Function
New weights define: 1·x2 + 1·x1 + 0.5 = 0
For x2 = 0, x1 = -0.5; for x1 = 0, x2 = -0.5
So the new line passes through (-0.5, 0) and (0, -0.5)
Example of Perceptron Training: The OR Function
[Plot: the new line x2 + x1 + 0.5 = 0, with intercepts -0.5 on both axes]
Example of Perceptron Training: The OR Function
[Plot: the line x2 + x1 + 0.5 = 0; the example x1 = 0, x2 = 0 is now misclassified (t = -1, o = 1)]
Next iteration:
w' = w + η(t - o)x
w' = ⟨1, 1, 0.5⟩ + 0.5(-1 - 1)·⟨0, 0, 1⟩
w' = ⟨1, 1, 0.5⟩ + ⟨0, 0, -1⟩
w' = ⟨1, 1, -0.5⟩
Example of Perceptron Training: The OR Function
[Plot: the final line x2 + x1 - 0.5 = 0 separates (0, 0) from the other three examples]
Perfect classification
No change occurs next
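The OR training run above can be reproduced with a short script; a minimal sketch, assuming the slides' η = 0.5, the initial weights ⟨0, 1, -0.5⟩, and a fixed pass order over the examples:

```python
import numpy as np

# Perceptron training rule on the OR function.
# Components are ordered as (x2, x1, x0), with x0 = 1 always.
def sgn(v):
    return 1 if v >= 0 else -1

X = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float)
t = [-1, 1, 1, 1]  # OR with -1/1 outputs

w = np.array([0.0, 1.0, -0.5])  # initial weights: the line x1 = 0.5
eta = 0.5
for epoch in range(20):
    changed = False
    for x, target in zip(X, t):
        o = sgn(w @ x)
        if o != target:
            w = w + eta * (target - o) * x  # w' = w + eta*(t - o)*x
            changed = True
    if not changed:
        break  # convergence: perfect classification

print(w)  # final weights (1, 1, -0.5): the line x2 + x1 - 0.5 = 0
```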
Analysis of the Perceptron Training Rule
The algorithm always converges within a finite number of iterations if the data are linearly separable.
Otherwise, it may oscillate (no convergence)
Training by Gradient Descent
Similar, but:
Always converges
Generalizes to training networks of perceptrons (neural networks)
Idea:
Define an error function
Search for weights that minimize the error, i.e., find weights that zero the error gradient
Setting Up the Gradient Descent
Mean squared error: E(w) = ½ Σ_{d∈D} (t_d - o_d)²
Minima exist where the gradient is zero:
∂E/∂wi = ∂/∂wi ½ Σ_{d∈D} (t_d - o_d)²
= ½ Σ_d ∂/∂wi (t_d - o_d)²
= ½ Σ_d 2(t_d - o_d) · ∂/∂wi (t_d - o_d)
= -Σ_d (t_d - o_d) · ∂o_d/∂wi
The Sign Function is not Differentiable
∂o_d/∂wi = 0 everywhere, except at the decision boundary, where the derivative does not exist
[Plot: the sharp decision boundary x1 = 0.5 from the OR example]
Use Differentiable Transfer Functions
Replace sgn(w · x)
with the sigmoid sig(w · x), where:
sig(y) = 1 / (1 + e^(-y))
d sig(y)/dy = sig(y)(1 - sig(y))
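A minimal numeric check of these two formulas (the test point y = 0.7 is arbitrary):

```python
import math

# The sigmoid transfer function and its derivative, as defined above.
def sig(y):
    return 1.0 / (1.0 + math.exp(-y))

def dsig(y):
    return sig(y) * (1.0 - sig(y))  # d sig / dy = sig(y)(1 - sig(y))

# Central-difference check of the derivative identity at y = 0.7:
h = 1e-6
numeric = (sig(0.7 + h) - sig(0.7 - h)) / (2 * h)
print(abs(numeric - dsig(0.7)) < 1e-9)  # → True
```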
Calculating the Gradient
∂E/∂wi = ∂/∂wi ½ Σ_{d∈D} (t_d - o_d)²
= -Σ_d (t_d - o_d) · ∂o_d/∂wi
= -Σ_d (t_d - o_d) · ∂/∂wi sig(w · x_d)
= -Σ_d (t_d - o_d) · sig(w · x_d)(1 - sig(w · x_d)) · ∂(w · x_d)/∂wi
= -Σ_d (t_d - o_d) · sig(w · x_d)(1 - sig(w · x_d)) · x_{i,d}
In vector form: ∇E(w) = -Σ_d (t_d - o_d) sig(w · x_d)(1 - sig(w · x_d)) x_d
Updating the Weights with Gradient Descent
w' = w - η ∇E(w)
w' = w + η Σ_d (t_d - o_d) sig(w · x_d)(1 - sig(w · x_d)) x_d
Each weight update goes through all training instances
Each weight update is more expensive but more accurate
Always converges to a local minimum, regardless of the data
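A sketch of this batch update on the OR data with 0/1 targets; the learning rate and epoch count are illustrative assumptions, not from the slides:

```python
import numpy as np

# Batch gradient descent for a single sigmoid unit.
def sig(y):
    return 1.0 / (1.0 + np.exp(-y))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)  # (x1, x2, x0)
t = np.array([0.0, 1.0, 1.0, 1.0])  # OR with 0/1 targets

w = np.zeros(3)
eta = 0.5
for epoch in range(10000):
    o = sig(X @ w)                        # outputs on all training instances
    grad = -((t - o) * o * (1 - o)) @ X   # gradient of E(w) = 1/2 sum (t_d - o_d)^2
    w = w - eta * grad                    # one update per pass over all data

preds = (sig(X @ w) >= 0.5).astype(float)
print(preds)  # learned unit classifies OR correctly
```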
Multiclass Problems
E.g., 4 classes. Two possible solutions:
Use one perceptron (output unit) and interpret the results as follows:
Output 0-0.25: class 1; output 0.25-0.5: class 2; output 0.5-0.75: class 3; output 0.75-1: class 4
Use four output units and a binary encoding of the output (1-of-m encoding)
1-of-M Encoding
Assign to the class with the largest output
[Network with inputs x4 ... x0 and four output units: one output per class]
Feed-Forward Neural Networks
[Diagram: Input Layer (x4 ... x0) → Hidden Layer 1 → Hidden Layer 2 → Output Layer]
Increased Expressiveness Example: Exclusive OR
[XOR data in the (x1, x2) plane: (0, 0) → -1, (0, 1) → 1, (1, 0) → 1, (1, 1) → -1]
No line (no set of three weights w2, w1, w0 over inputs x2, x1, x0) can separate the training examples (learn the true function).
Increased Expressiveness Example
[Same XOR data; network with inputs x2, x1, x0, two hidden units with weights w2,1, w1,1, w0,1 and w2,2, w1,2, w0,2, and an output unit with weights w'1,1, w'2,1]
Increased Expressiveness Example
[Network, with weights consistent with the table below: hidden unit H1 computes sgn(x1 - x2 - 0.5), hidden unit H2 computes sgn(x2 - x1 - 0.5), and output unit O combines H1 and H2 with weights (1, 1)]
X1 X2 C
0  0  -1
0  1   1
1  0   1
1  1  -1
Increased Expressiveness Example
[Same network; the hidden and output values for each training example will be filled in]
    X1 X2 C   H1 H2 O
T1  0  0  -1
T2  0  1   1
T3  1  0   1
T4  1  1  -1
Increased Expressiveness Example
[Input (x1, x2) = (0, 0) is fed into the network, with x0 = 1; the hidden biases are -0.5]
    X1 X2 C   H1 H2 O
T1  0  0  -1
T2  0  1   1
T3  1  0   1
T4  1  1  -1
Increased Expressiveness Example
[For input (0, 0): H1 = -1, H2 = -1, O = -1]
    X1 X2 C   H1 H2 O
T1  0  0  -1  -1 -1 -1
T2  0  1   1
T3  1  0   1
T4  1  1  -1
Increased Expressiveness Example
[For input (0, 1): H1 = -1, H2 = 1, O = 1]
    X1 X2 C   H1 H2 O
T1  0  0  -1  -1 -1 -1
T2  0  1   1  -1  1  1
T3  1  0   1
T4  1  1  -1
Increased Expressiveness Example
[For input (1, 0): H1 = 1, H2 = -1, O = 1]
    X1 X2 C   H1 H2 O
T1  0  0  -1  -1 -1 -1
T2  0  1   1  -1  1  1
T3  1  0   1   1 -1  1
T4  1  1  -1
Increased Expressiveness Example
[For input (1, 1): H1 = -1, H2 = -1, O = -1. The network computes XOR correctly on all four examples]
    X1 X2 C   H1 H2 O
T1  0  0  -1  -1 -1 -1
T2  0  1   1  -1  1  1
T3  1  0   1   1 -1  1
T4  1  1  -1  -1 -1 -1
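The table can be checked with a short forward pass; in this sketch the hidden weights follow the diagram, while the output unit's bias (+1) is inferred from the truth table and should be treated as an assumption:

```python
# Forward pass through the two-layer XOR network sketched above.
def sgn(v):
    return 1 if v >= 0 else -1

def xor_net(x1, x2):
    h1 = sgn(1 * x1 - 1 * x2 - 0.5)   # H1 fires only for (1, 0)
    h2 = sgn(-1 * x1 + 1 * x2 - 0.5)  # H2 fires only for (0, 1)
    o = sgn(1 * h1 + 1 * h2 + 1)      # O combines the hidden features (bias inferred)
    return h1, h2, o

for (x1, x2), c in [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]:
    h1, h2, o = xor_net(x1, x2)
    print(x1, x2, c, h1, h2, o)  # reproduces the rows T1..T4 of the table
```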
From the Viewpoint of the Output Layer
[The output unit O sees only the hidden outputs H1, H2]
    C   H1 H2 O
T1  -1  -1 -1 -1
T2   1  -1  1  1
T3   1   1 -1  1
T4  -1  -1 -1 -1
[Plot: T1-T4 in the (x1, x2) plane are mapped by the hidden layer into the (H1, H2) plane, where T1 and T4 coincide]
From the Viewpoint of the Output Layer
[Same mapping: T1-T4 in the (x1, x2) plane, mapped by the hidden layer to the (H1, H2) plane]
•Each hidden layer maps to a new instance space
•Each hidden node is a new constructed feature
•The original problem may become separable (or easier)
How to Train Multi-Layered Networks
Select a network structure (number of hidden layers, hidden nodes, and connectivity).
Select transfer functions that are differentiable.
Define a (differentiable) error function.
Search for weights that minimize the error function, using gradient descent or another optimization method.
BACKPROPAGATION
How to Train Multi-Layered Networks
[The same steps, illustrated on the two-layer network with weights w2,1, w1,1, w0,1, w2,2, w1,2, w0,2, w'1,1, w'2,1 over inputs x2, x1, x0]
E(w) = ½ Σ_{d∈D} (t_d - o_d)²
Back-Propagating the Error
[Three-layer network: input units x_k, hidden units x_j, output units x_i; weights w_jk from the input layer to the hidden layer and w_ij from the hidden layer to the output layer]
w_ji: weight from unit j to unit i
x_j: output of unit j
i: index over the output layer
j: index over the hidden layer
k: index over the input layer
Back-Propagating the Error
For the whole training set:
∂E(w)/∂wi = -Σ_{d∈D} (t_d - o_d) sig(w · x_d)(1 - sig(w · x_d)) x_{i,d}
Contribution of a single training example:
∂E(w)/∂wi = -(t - o) sig(w · x)(1 - sig(w · x)) x_i
Back-Propagating the Error
For a weight w_ij into an output unit i:
∂E/∂w_ij = -x_j · sig_i(1 - sig_i) · (t - o),  where sig_i is the output of unit i
56
Back-Propagating the Error
x2 x1 x0
ijw
jkw
ix
jx
kx?
))(1)((
j
kjjk
sigsigxwE
![Page 57: Feed-Forward Artificial Neural Networks](https://reader036.fdocuments.in/reader036/viewer/2022081505/5681682d550346895dddc7f4/html5/thumbnails/57.jpg)
57
Back-Propagating the Error
x2 x1 x0
ijw
jkw
ix
jx
kx
Outputsiiijj
kjjk
w
sigsigxwE
))(1)((
![Page 58: Feed-Forward Artificial Neural Networks](https://reader036.fdocuments.in/reader036/viewer/2022081505/5681682d550346895dddc7f4/html5/thumbnails/58.jpg)
58
Back-Propagating the Error
x2 x1 x0
ijw
jkw
ix
jx
kx
miw
mx )( mmm xt
Outputsm
mmii w
NextLayeri
iijj w
rttttrtr
tr
trtrtr
xsigsigwwwwEww
))(1)((
)(
![Page 59: Feed-Forward Artificial Neural Networks](https://reader036.fdocuments.in/reader036/viewer/2022081505/5681682d550346895dddc7f4/html5/thumbnails/59.jpg)
59
Training with BackPropagation
Going once through all training examples and updating the weights: one epoch
Iterate until a stopping criterion is satisfied The hidden layers learn new features and
map to new spaces Training reaches a local minimum of the error
surface
Overfitting with Neural Networks
If the number of hidden units (and weights) is large, it is easy to memorize the training set (or parts of it) and fail to generalize
Typically, the optimal number of hidden units is much smaller than the number of input units
Each hidden layer then maps to a space of smaller dimension
Training is stopped before a local minimum is reached
Typical Training Curve
[Plot of error vs. epoch: the error on the training set decreases steadily; the real error (or the error on an independent validation set) decreases and then rises again. The ideal training stoppage is at the minimum of the validation error.]
Example of Training Stopping Criteria
Split the data into train, validation, and test sets
Train on the training set until the error on the validation set is increasing (by more than epsilon over the last m iterations), or until a maximum number of epochs is reached
Evaluate final performance on the test set
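This criterion can be sketched as a small function (the validation-error sequence below is synthetic, for illustration only):

```python
# Early stopping: stop when validation error has risen by more than epsilon
# over the last m epochs, or when max_epochs is reached.
def stop_epoch(val_errors, m=3, epsilon=0.0, max_epochs=100):
    for epoch in range(len(val_errors)):
        if epoch >= max_epochs:
            return epoch
        if epoch >= m and val_errors[epoch] - val_errors[epoch - m] > epsilon:
            return epoch  # validation error is increasing: stop
    return len(val_errors) - 1

# Typical curve: validation error falls, then rises again (overfitting).
val = [1.0, 0.7, 0.5, 0.4, 0.35, 0.37, 0.41, 0.48, 0.60]
print(stop_epoch(val))  # → 6
```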
Classifying with Neural Networks
Determine the representation of the input:
E.g., Religion ∈ {Christian, Muslim, Jewish}
Represent it as one input taking three different values, e.g., 0.2, 0.5, 0.8
Or represent it as three inputs taking 0/1 values
Determine the representation of the output (for multiclass problems):
Single output unit vs. multiple binary units
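The two input representations can be sketched as follows (the numeric codes 0.2/0.5/0.8 come from the slide; function names are illustrative):

```python
# Two ways to feed a three-valued discrete attribute into a network.
# Note the single-input coding imposes an artificial ordering on the values;
# the three-input 0/1 coding avoids that.
values = ["Christian", "Muslim", "Jewish"]

def single_input(v):
    return {"Christian": 0.2, "Muslim": 0.5, "Jewish": 0.8}[v]

def three_inputs(v):
    return [1.0 if v == u else 0.0 for u in values]

print(single_input("Muslim"))  # → 0.5
print(three_inputs("Muslim"))  # → [0.0, 1.0, 0.0]
```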
Classifying with Neural Networks
Select:
Number of hidden layers
Number of hidden units
Connectivity
Typically: one hidden layer, a number of hidden units that is a small fraction of the number of input units, and full connectivity
Select an error function
Typically: minimize the mean squared error or maximize the log likelihood of the data
Classifying with Neural Networks
Select a training method:
Typically gradient descent (corresponds to vanilla backpropagation)
Other optimization methods can be used:
Backpropagation with momentum
Trust-region methods
Line-search methods
Conjugate gradient methods
Newton and quasi-Newton methods
Select a stopping criterion
Classifying with Neural Networks
The training method may also include:
Searching for an optimal structure
Extensions to avoid getting stuck in local minima:
Simulated annealing
Random restarts with different weights
Classifying with Neural Networks
Split the data into:
Training set: used to update the weights
Validation set: used in the stopping criterion
Test set: used to evaluate generalization error (performance)
Other Error Functions in Neural Networks
Adding a penalty term for weight magnitude
Minimizing cross entropy with respect to the target values; the network outputs are then interpretable as probability estimates
Representational Power
Perceptron: can learn only linearly separable functions
Boolean functions: learnable by a NN with one hidden layer
Continuous functions: learnable by a NN with one hidden layer and sigmoid units
Arbitrary functions: learnable by a NN with two hidden layers and sigmoid units
In all cases, the required number of hidden units is unknown
Issues with Neural Networks
No principled method for selecting the number of layers and units
Tiling: start with a small network and keep adding units
Optimal brain damage: start with a large network and keep removing weights and units
Evolutionary methods: search the space of structures for one that generalizes well
No principled method for most other design choices
Important but not Covered in This Tutorial
It is very hard to understand the classification logic from direct examination of the weights
There is a large recent body of work on extracting symbolic rules and information from neural networks
Recurrent networks, associative networks, self-organizing maps, committees of networks, adaptive resonance theory, etc.
Why the Name Neural Networks?
The initial models simulated real neurons, for use in classification
Later efforts improved and understood classification independently of any similarity to biological neural networks
Other efforts simulate and understand biological neural networks to a larger degree
Conclusions
Neural networks can deal with both real-valued and discrete domains
They can also perform density or probability estimation
Very fast classification time
Relatively slow training time (does not scale to thousands of inputs)
One of the most successful classifiers yet
Successful design choices are still a black art
Easy to overfit or underfit if care is not applied