Cooperating Intelligent Systems Statistical learning methods Chapter 20, AIMA (only ANNs & SVMs)
Date posted: 20-Dec-2015
Cooperating Intelligent Systems
Statistical learning methods, Chapter 20, AIMA
(only ANNs & SVMs)
Artificial neural networks
The brain is a pretty intelligent system.
Can we "copy" it?
There are approx. 10^11 neurons in the brain.
There are approx. 23×10^9 neurons in the male cortex (females have about 15% fewer).
The simple model
• The McCulloch-Pitts model (1943)
Image from Neuroscience: Exploring the brain by Bear, Connors, and Paradiso
y = g(w0 + w1x1 + w2x2 + w3x3)
Transfer functions g(z)
The Heaviside function and the logistic function.
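These transfer functions, together with the weighted sum from the McCulloch-Pitts model, can be sketched in code. This is a minimal illustration with names of my own choosing, not an implementation from the slides:

```java
// Transfer functions g(z) and a McCulloch-Pitts style unit
// computing y = g(w0 + w1*x1 + ... + wn*xn).
public class Transfer {
    // Heaviside step function.
    static double heaviside(double z) {
        return z >= 0 ? 1.0 : 0.0;
    }

    // Logistic function: a smooth, differentiable version of the step.
    static double logistic(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One unit with bias w0, weights w, and the Heaviside transfer function.
    static double unit(double w0, double[] w, double[] x) {
        double z = w0;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        return heaviside(z);
    }
}
```

For example, with w0 = −1.5 and unit weights this computes the logical AND of two binary inputs.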
The simple perceptron
With {-1,+1} representation
Traditionally (early 1960s) trained with perceptron learning.
y(x) = sgn[w^T x] = +1 if w^T x ≥ 0, −1 if w^T x < 0

where w^T x = w0 + w1x1 + w2x2.
Perceptron learning
Repeat until no errors are made anymore:

1. Pick a random example [x(n), f(n)].
2. If the classification is correct, i.e. if y(x(n)) = f(n), then do nothing.
3. If the classification is wrong, then do the following update to the parameters (η, the learning rate, is a small positive number):

w_i ← w_i + η f(n) x_i(n)

Desired output:
f(n) = +1 if x(n) belongs to class A
f(n) = −1 if x(n) belongs to class B
Example: Perceptron learning
The AND function
| x1 | x2 | f  |
|----|----|----|
| 0  | 0  | −1 |
| 0  | 1  | −1 |
| 1  | 0  | −1 |
| 1  | 1  | +1 |

Initial values: η = 0.3, w = (w0, w1, w2) = (−0.5, 1, 1).
With w = (−0.5, 1, 1), the example picked is correctly classified: no action.
The example (x1, x2) = (0, 1) is incorrectly classified: learning action.

w0 ← −0.5 + 0.3·(−1)·1 = −0.8
w1 ← 1 + 0.3·(−1)·0 = 1
w2 ← 1 + 0.3·(−1)·1 = 0.7
After the update, w = (−0.8, 1, 0.7).
With w = (−0.8, 1, 0.7), the next example picked is correctly classified: no action.
The example (x1, x2) = (1, 0) is incorrectly classified: learning action.

w0 ← −0.8 + 0.3·(−1)·1 = −1.1
w1 ← 1 + 0.3·(−1)·1 = 0.7
w2 ← 0.7 + 0.3·(−1)·0 = 0.7
After the update, w = (−1.1, 0.7, 0.7).
Final solution: w = (−1.1, 0.7, 0.7). All four examples are now correctly classified.
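The hand-worked perceptron updates above can be reproduced in code. A sketch (it cycles through the examples in order instead of picking them at random, which the convergence guarantee also covers; class and method names are mine):

```java
// Perceptron learning on the AND example:
// on a misclassification, w_i <- w_i + eta * f(n) * x_i(n),
// repeated until no errors are made anymore.
public class PerceptronAnd {
    static int sgn(double z) {
        return z >= 0 ? 1 : -1;
    }

    // x includes the constant bias input x0 = 1.
    static int classify(double[] w, double[] x) {
        double z = 0;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        return sgn(z);
    }

    static double[] train(double[] w, double[][] xs, int[] fs, double eta) {
        boolean errors = true;
        while (errors) {
            errors = false;
            for (int n = 0; n < xs.length; n++) {
                if (classify(w, xs[n]) != fs[n]) {
                    errors = true;
                    for (int i = 0; i < w.length; i++)
                        w[i] += eta * fs[n] * xs[n][i];
                }
            }
        }
        return w;
    }
}
```

Starting from w = (−0.5, 1, 1) with η = 0.3 this ends with all four AND examples correctly classified.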
Perceptron learning
• Perceptron learning is guaranteed to find a solution in finite time, if a solution exists.
• Perceptron learning cannot be generalized to more complex networks.
• Better to use gradient descent – based on formulating an error and differentiable functions
E(W) = Σ_{n=1}^{N} [ f(n) − y(n, W) ]²
Gradient search
ΔW = −η ∇_W E(W)

"Go downhill" on the error surface E(W).

The "learning rate" (η) is set heuristically.

W(k+1) = W(k) + ΔW(k)
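As an illustration of the rule W(k+1) = W(k) − η∇E(W(k)), here is a sketch on a hypothetical one-dimensional error surface E(W) = (W − 3)², which is not from the slides:

```java
// Gradient descent on E(W) = (W - 3)^2, whose gradient is dE/dW = 2(W - 3).
public class GradientSearch {
    static double gradient(double w) {
        return 2.0 * (w - 3.0);
    }

    // "Go downhill": repeat W <- W - eta * dE/dW.
    static double descend(double w, double eta, int steps) {
        for (int k = 0; k < steps; k++) w -= eta * gradient(w);
        return w;
    }
}
```

With η = 0.1 the iterate contracts toward the minimum at W = 3; for this surface any η > 1 diverges, which illustrates why the learning rate must be chosen with care.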
The Multilayer Perceptron (MLP)
• Combine several single layer perceptrons.
• Each single layer perceptron uses a sigmoid transfer function, e.g.

σ(z) = tanh(z)  or  σ(z) = [1 + exp(−z)]^(−1)

Inputs x_k feed hidden units h_j and h_i, which feed the outputs y_l.
Can be trained using gradient descent
Example: One hidden layer
• Can approximate any continuous function
The output transfer function φ(z) is sigmoid or linear; the hidden-layer transfer function σ(z) is sigmoid.

h_j(x) = σ( w_j0 + Σ_{k=1}^{D} w_jk x_k )

y_i(x) = φ( v_i0 + Σ_{j=1}^{J} v_ij h_j(x) )
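The two equations above can be sketched as a forward pass. A hypothetical minimal implementation (names are mine), using σ = tanh for the hidden layer and a linear output φ, one of the stated choices:

```java
// One-hidden-layer network:
// h_j(x) = sigma(w_j0 + sum_k w_jk x_k),  y(x) = v_0 + sum_j v_j h_j(x).
public class OneHiddenLayer {
    // w[j][0] is the bias w_j0; w[j][k+1] multiplies x_k.
    static double[] hidden(double[][] w, double[] x) {
        double[] h = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double z = w[j][0];
            for (int k = 0; k < x.length; k++) z += w[j][k + 1] * x[k];
            h[j] = Math.tanh(z);
        }
        return h;
    }

    // v[0] is the bias v_0; the output transfer function is linear here.
    static double output(double[] v, double[] h) {
        double y = v[0];
        for (int j = 0; j < h.length; j++) y += v[j + 1] * h[j];
        return y;
    }
}
```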
Example of computing the gradient
ΔW = −η ∇_W E(W)

E(W) = MSE = (1/N) Σ_{n=1}^{N} ( ŷ(W, x(n)) − y(n) )² = (1/N) Σ_{n=1}^{N} e(n)²

∇_W E(W) = (1/N) Σ_{n=1}^{N} ∇_W e(n)² = (2/N) Σ_{n=1}^{N} e(n) ∇_W e(n) = (2/N) Σ_{n=1}^{N} e(n) ∇_W ŷ(n)

What we need to do is to compute ∇_W ŷ. We have the complete equation for the network:

ŷ(x) = Σ_{j=1}^{J} v_j h( w_j0 + Σ_{k=1}^{K} w_jk x_k )
∂ŷ/∂v_j = h( w_j0 + Σ_k w_jk x_k )

∂ŷ/∂w_jk = v_j h′( w_j0 + Σ_k w_jk x_k ) x_k

∂ŷ/∂w_j0 = v_j h′( w_j0 + Σ_k w_jk x_k )

With h(z) = tanh(z), the derivative is h′(z) = 1 − h²(z).
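The chain-rule pieces above can be checked numerically. A sketch for a single hidden unit, y = v·h(w·x) with h = tanh (the names are illustrative, not from the slides):

```java
// dy/dv = h(wx), dy/dw = v * h'(wx) * x, with h'(z) = 1 - h(z)^2 for h = tanh.
public class MlpGradient {
    static double h(double z) {
        return Math.tanh(z);
    }

    static double hPrime(double z) {
        double t = Math.tanh(z);
        return 1.0 - t * t;
    }

    static double y(double v, double w, double x) {
        return v * h(w * x);
    }

    // Analytic derivative of y with respect to w.
    static double dydw(double v, double w, double x) {
        return v * hPrime(w * x) * x;
    }
}
```

A central finite difference on y agrees with the analytic derivative, which is a standard sanity check when implementing backpropagation.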
When should you stop learning?
• After a set number of learning epochs
• When the change in the gradient becomes smaller than a certain threshold
• Validation data: "early stopping"

(Figure: classification error vs. training epochs; the training error keeps decreasing while the validation (test) error reaches its minimum at the preferred model.)
RPROP (Resilient PROPagation)
Parameter update rule:

ΔW_i(t) = −γ_i(t) · sign( ∂E(W)/∂W_i )

Learning rate update rule:

γ_i(t) = 1.2 · γ_i(t−1)  if  ∂E/∂W_i(t) · ∂E/∂W_i(t−1) > 0
γ_i(t) = 0.5 · γ_i(t−1)  if  ∂E/∂W_i(t) · ∂E/∂W_i(t−1) < 0

No parameter tuning, unlike standard backpropagation!
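A sketch of one RPROP step for a single weight (hypothetical method names; the per-weight step size γ_i adapts to the sign history of the gradient):

```java
// RPROP: the step size grows by factor 1.2 while the gradient keeps its sign,
// shrinks by factor 0.5 when the sign flips; the weight moves by
// -gamma * sign(gradient), independent of the gradient's magnitude.
public class Rprop {
    static double updateStepSize(double gamma, double gradNow, double gradPrev) {
        double product = gradNow * gradPrev;
        if (product > 0) return 1.2 * gamma;
        if (product < 0) return 0.5 * gamma;
        return gamma;
    }

    static double updateWeight(double w, double gamma, double grad) {
        return w - gamma * Math.signum(grad);
    }
}
```

Because only the sign of the gradient is used, no global learning rate needs tuning, which is the point made on the slide.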
Model selection
(Figure: errorbar plots of classification error [%] for model type A and model type B.)

Use this to determine:
• Number of hidden nodes
• Which input signals to use
• Whether a pre-processing strategy is good or not
• Etc.

Variability is typically induced by:
• Varying train and test data sets
• Random initial model parameters
Support vector machines
Linear classifier on a linearly separable problem
There are infinitely many lines that have zero training error.
Which line should we choose?
Choose the line with the largest margin: the "large margin classifier".

The data points that lie on the margin are called "support vectors".
Computing the margin

The plane separating the two classes is defined by

w^T x = a

The dashed planes are given by

w^T x = a + b
w^T x = a − b
Divide by b:

w^T x / b = a/b + 1
w^T x / b = a/b − 1

Define new w = w/b and a = a/b:

w^T x = a + 1
w^T x = a − 1

We have now defined a scale for w and a.
We have

w^T x = a − 1
w^T ( x + λw ) = a + 1

which gives

λ w^T w = λ‖w‖² = 2,  so  margin = λ‖w‖ = 2 / ‖w‖
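The result margin = 2/‖w‖ is easy to compute directly (a trivial sketch with a name of my own choosing):

```java
// margin = 2 / ||w||, following the derivation above.
public class Margin {
    static double margin(double[] w) {
        double normSquared = 0;
        for (double wi : w) normSquared += wi * wi;
        return 2.0 / Math.sqrt(normSquared);
    }
}
```

Minimizing ‖w‖ therefore maximizes the margin, which is exactly the optimization set up next.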
Maximizing the margin is equal to minimizing ‖w‖, subject to the constraints

w^T x(n) − a ≥ +1 for all x(n) in the positive class
w^T x(n) − a ≤ −1 for all x(n) in the negative class

This is a quadratic programming problem; the constraints can be included with Lagrange multipliers.
Minimize the cost (Lagrangian)

L_p = ½ ‖w‖² − Σ_{n=1}^{N} λ_n [ y(n)( w^T x(n) − a ) − 1 ]

The minimum of L_p occurs at the maximum of the Wolfe dual

L_D = Σ_{n=1}^{N} λ_n − ½ Σ_{n=1}^{N} Σ_{m=1}^{N} λ_n λ_m y(n) y(m) x(n)^T x(m)

This is a quadratic programming problem. Only scalar products appear in the cost. IMPORTANT!
Linear Support Vector Machine

Test phase, the predicted output:

ŷ(x) = sgn( w^T x − a ) = sgn( Σ_{n∈SV} λ_n y(n) x(n)^T x − a )

where the sum runs over the support vectors. Still only scalar products in the expression.
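The test-phase expression can be sketched directly. A hypothetical implementation (the names are mine; λ_n are the multipliers attached to the support vectors):

```java
// y_hat(x) = sgn( sum_{n in SV} lambda_n * y_n * x_n^T x - a ):
// only scalar products between x and the support vectors are needed.
public class SvmDecision {
    static int predict(double[][] sv, int[] y, double[] lambda, double a, double[] x) {
        double s = -a;
        for (int n = 0; n < sv.length; n++) {
            double dot = 0;
            for (int k = 0; k < x.length; k++) dot += sv[n][k] * x[k];
            s += lambda[n] * y[n] * dot;
        }
        return s >= 0 ? 1 : -1;
    }
}
```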
Example: Robot color vision (Competition 1999)
Classify the Lego pieces into red, blue, and yellow. Classify white balls, black sideboard, and green carpet.
What the camera sees (RGB space)
(Figure: scatter plot of the pixel data with clusters labelled Yellow, Green, and Red.)
Mapping RGB (3D) to rgb (2D)
r = R / (R + G + B)
g = G / (R + G + B)
b = B / (R + G + B)
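The mapping can be sketched as (class and method names are mine):

```java
// Chromaticity normalization: r = R/(R+G+B), g = G/(R+G+B), b = B/(R+G+B).
// Since r + g + b = 1, only two components are independent: the 3-D RGB
// input becomes effectively 2-D, discarding overall intensity.
public class Chromaticity {
    static double[] normalize(double r, double g, double b) {
        double sum = r + g + b;
        return new double[]{ r / sum, g / sum, b / sum };
    }
}
```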
Lego in normalized rgb space
Input is 2-D: x = (x1, x2) ∈ X ⊂ ℝ².
Output is 6-D: c ∈ C = {red, blue, yellow, green, black, white}.
MLP classifier
E_train = 0.21%, E_test = 0.24%
2-3-1 MLP, Levenberg-Marquardt
Training time (150 epochs): 51 seconds
SVM classifier
E_train = 0.19%, E_test = 0.20%
SVM with a Gaussian kernel, K(x, y) = exp(−‖x − y‖²), and a parameter set to 1000
Training time: 22 seconds
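A sketch of the radial basis kernel on the slide, here written without any width parameter:

```java
// RBF kernel K(x, y) = exp(-||x - y||^2). Substituting a kernel for the
// scalar product x^T y is what makes the SVM classifier non-linear.
public class RbfKernel {
    static double k(double[] x, double[] y) {
        double dist2 = 0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            dist2 += d * d;
        }
        return Math.exp(-dist2);
    }
}
```

Identical inputs give K = 1, and the kernel value decays with the squared distance between the points.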
Lab 4: Digit recognition
• Inputs (digits) are provided as 32×32 bitmaps. The task is to investigate how well these handwritten digits can be recognized by neural networks.
• Assignment includes changing the program code to answer:
  1. How good is the generalization performance? (test data error)
  2. Can pre-processing improve performance?
  3. What is the best configuration of the network?
```java
public AppTrain() {
    // create a new network of given size
    nn = new NN(32*32, 10, seed);

    // each row contains 32*32+1 integers
    // create the matrix holding the data and read data into it
    file = new TFile("digits.dat");
    System.out.println(file.rows() + " digits have been loaded");

    double[] input = new double[32*32];
    double[] target = new double[10];

    // the training session (below) is iterative
    for (int e = 0; e < nEpochs; e++) {
        // reset the error accumulated over each training epoch
        double err = 0;
        // in each epoch, go through all examples/tuples/digits
        // note: all examples are here used for training, consequently no systematic testing
        // you may consider dividing the data set into training, testing and validation sets
        for (int p = 0; p < file.rows(); p++) {
            for (int i = 0; i < 32*32; i++)
                input[i] = file.values[p][i];
            // the last value on each row contains the target (0-9);
            // convert it to a double[] target vector
            for (int i = 0; i < 10; i++) {
                if (file.values[p][32*32] == i) target[i] = 1;
                else target[i] = 0;
            }
            // present a sample, calculate errors and adjust weights
            err += nn.train(input, target, eta);
        }
        System.out.println("Epoch " + e + " finished with error " + err / file.rows());
    }

    // save network weights in a file for later use, e.g. in AppDigits
    nn.save("network.m");
}
```
```java
/**
 * classify
 * @param map the bitmap on the screen
 * @return int the most likely digit (0-9) according to the network
 */
public int classify(boolean[][] map) {
    double[] input = new double[32*32];
    for (int c = 0; c < map.length; c++) {
        for (int r = 0; r < map[c].length; r++) {
            if (map[c][r]) // bit set
                input[r * map[r].length + c] = 1;
            else
                input[r * map[r].length + c] = 0;
        }
    }
    // activate the network, produce output vector
    double[] output = nn.feedforward(input);
    // alternative version assumes that the network has been trained on an 8x8 map
    // double[] output = nn.feedforward(to8x8(input));
    double highscore = 0;
    int highscoreIndex = 0;
    // print out each output value (gives an idea of the network's support for each digit)
    System.out.println("--------------");
    for (int k = 0; k < 10; k++) {
        System.out.println(k + ":" + (double)((int)(output[k]*1000)/1000.0));
        if (output[k] > highscore) {
            highscore = output[k];
            highscoreIndex = k;
        }
    }
    System.out.println("--------------");
    return highscoreIndex;
}
```