Indian Institute of Technology Bombay MACHINE LEARNING.
[Image slides: paintings captioned "Marc Chagall" and "(Vincent van Gogh)", then the question "Marc Chagall? Or Vincent van Gogh?", followed by paintings captioned "(Paul Gauguin)" and "(Vincent van Gogh)".]
Induction vs Deduction
• Deductive reasoning is the process of reasoning from one or more general statements (premises) to reach a logically certain conclusion.
• Inductive reasoning is the process of reasoning in which the premises seek to supply strong evidence for (not absolute proof of) the truth of the conclusion.
• The human mind is the best pattern recognizer and classifier; it can recognize patterns in spite of noise and vagueness.
1. The human mind learns by induction.
2. The human mind recognizes patterns by looking at the whole and not at individual parts.
Learning is a fundamental and essential characteristic of biological neural networks.
The ease with which they can learn led to attempts to emulate a biological neural network in a computer.
The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.
How does the human mind learn?
[Figure: two biological neurons, labelled with soma, dendrites, axon and synapses.]
A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.
Human Learning: Key features
1. Human beings learn patterns by induction (seeing examples).
2. The knowledge acquired remains in their memory.
3. The knowledge is recalled when required to recognize a pattern not seen before.
Machine Learning: Key features
• Show the computer several examples of a pattern repeatedly.
• Hope that it will learn the “diagnostic” characteristics of the pattern.
• Make sure that the computer has learnt adequately (how?).
• The knowledge acquired by the computer remains in its “memory” (how?).
• The computer will recall the knowledge when asked to classify an unseen pattern.
MACHINE LEARNING
• The human mind is much better than a computer at recognizing vague/noisy patterns.
• A well-trained computer can process a larger amount of information!
• Non-linear model: the same feature gets different weights in different combinations.
• Downside – the computer will not tell you why it has classified a particular pattern in a particular way.
• A black box!
• Like the human mind!
Problems with Probabilistic/Fuzzy methods
• Weights of Evidence: correlation between maps
• Fuzzy Logic: subjective judgment, hence difficult to reproduce
MACHINE LEARNING
• Neural networks
• Hybrid neuro-fuzzy systems
• Bayesian classifiers
• Genetic algorithms
• Self-organizing maps (SOM)
• Resource potential modeling can be viewed as a pattern recognition problem.
• It involves predictive classification of each spatial unit, characterized by a unique combination of spatially coincident predictor patterns (or unique conditions), as mineralized or barren with respect to the target mineral deposit-type. In machine learning jargon, this combination is called a feature vector.
[Figure: unique conditions grid, in which each cell carries a unique condition number (1 to 5).]
[Table: one row per unique condition. Columns: Unique Condition No; predictor patterns (Slope, Drainage density, Soil permeability, Distance from permeable structure); Class: potential (1) or not potential (0). The Class column is shown as "??" for every unique condition.]
Attribute1(i), Attribute2(i), …………., Attribute6(i) 0
Attribute1(i), Attribute2(iii), …………., Attribute6(iv) 1
Attribute1(ii), Attribute2(i), …………., Attribute6(v) 1
Attribute1(v), Attribute2(i), …………., Attribute6(i) 0
…
…
Attribute1(iii), Attribute2(ii), …………., Attribute6(vi) 1
UNIQUE CONDITIONS GRID
Converting GIS layers to feature vectors
[Figure: a stack of GIS raster layers (SiO2 content, Rock type, Fe content, Distance to Fault) above a Deposits layer whose cells are 0 or 1. Reading one cell through the stack gives the input feature vector [3, 8, 33, 800] with target output 1.]
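The conversion of raster layers into feature vectors can be sketched with NumPy. This is a minimal illustration, not the method used in the slides: the 2x2 rasters, their values, the layer order and the function name `rasters_to_feature_vectors` are all assumptions, chosen only so that the first cell reproduces the vector [3, 8, 33, 800] with target 1.

```python
import numpy as np

# Hypothetical 2x2 rasters; values are illustrative, not taken from the slides.
rock_type = np.array([[3, 1], [2, 4]])
sio2 = np.array([[8, 12], [5, 9]])
fe = np.array([[33, 20], [41, 15]])
dist_fault = np.array([[800, 150], [2300, 60]])
deposits = np.array([[1, 0], [0, 0]])  # 1 = known deposit, 0 = none


def rasters_to_feature_vectors(layers, target):
    """Stack co-registered rasters so that each cell becomes one feature vector."""
    stacked = np.stack(layers, axis=-1)           # shape: (rows, cols, n_layers)
    X = stacked.reshape(-1, stacked.shape[-1])    # one row per cell
    y = target.reshape(-1)                        # one target per cell
    return X, y


X, y = rasters_to_feature_vectors([rock_type, sio2, fe, dist_fault], deposits)
print(X[0], y[0])
```

All layers must share the same grid (extent and cell size) for the cell-by-cell stacking to be meaningful.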
[Figure: an input vector (values 40, 1120 and 600 are shown) read from the raster layers (SiO2 content, MgO content, Fe content, Distance to Fault) is fed forward through the neural network (NN). Targeted output (d) = 1; actual output (y) = 0.36; error = (d – y) = 0.64. The error is then sent back through the network by backpropagation.]
Inside the black box?
A neuron (node) is a processing unit.
A layer of input neurons (input layer, I)
A layer of hidden neurons (hidden layer, H)
A layer of output neurons (output layer, O)
"Neuron" explains "neural", but what is the "network"? Connect all the neurons.
[Figure: a network with inputs x11 and x21, four input neurons (fi), two hidden neurons (fh) and one output neuron (fo); the connections carry the weights w11, w12, w21, w22, w31, w32, w41 and w42.]
An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.
The neurons are connected by weighted links passing signals from one neuron to another.
The output signal is transmitted through the neuron’s outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.
Properties of architecture
• No connections within a layer
• No direct connections between input and output layers
• Fully connected between layers
• Often more than 3 layers
• Number of output units need not equal number of input units
• Number of hidden units per layer can be more or less than input or output units
Neuron functions (also called activation functions)
The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ. If the net input is less than the threshold, the neuron output is 0 (or –1). But if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains a value +1.
The neuron uses the following transfer or activation function:
$$X = \sum_{i=1}^{n} x_i w_i, \qquad Y = f(X) = \begin{cases} +1, & \text{if } X \ge \theta \\ 0 \text{ (or } -1\text{)}, & \text{if } X < \theta \end{cases}$$
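The threshold neuron described above can be sketched in a few lines of Python. The function name `threshold_neuron`, the parameter `low` (the inactive output, 0 or -1) and the example numbers are assumptions made for illustration.

```python
def threshold_neuron(inputs, weights, theta, low=0):
    """Return +1 if the weighted input sum reaches the threshold, else `low`."""
    x_sum = sum(x * w for x, w in zip(inputs, weights))  # X = sum of x_i * w_i
    return 1 if x_sum >= theta else low


# Weighted sum is 0.5 + 0.3 = 0.8
print(threshold_neuron([1, 0, 1], [0.5, 0.9, 0.3], theta=0.7))          # 1
print(threshold_neuron([1, 0, 1], [0.5, 0.9, 0.3], theta=1.0))          # 0
print(threshold_neuron([1, 0, 1], [0.5, 0.9, 0.3], theta=1.0, low=-1))  # -1
```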
Activation functions of a neuron
[Figure: four plots of output Y against net input X, for the step, sign, sigmoid and linear functions.]
Step function: $Y^{step} = \begin{cases} 1, & \text{if } X \ge 0 \\ 0, & \text{if } X < 0 \end{cases}$
Sign function: $Y^{sign} = \begin{cases} +1, & \text{if } X \ge 0 \\ -1, & \text{if } X < 0 \end{cases}$
Sigmoid function: $Y^{sigmoid} = \dfrac{1}{1 + e^{-X}}$
Linear function: $Y^{linear} = X$
Radial basis function: $Y^{RBF} = e^{-\frac{(X - c)^2}{2\sigma^2}}$
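As a rough sketch, the five activation functions can be written as plain Python functions. The radial basis function is assumed here to be the usual Gaussian form with centre `c` and width `sigma`; both default values are illustrative.

```python
import math


def step(x):
    return 1.0 if x >= 0 else 0.0


def sign(x):
    return 1.0 if x >= 0 else -1.0


def sigmoid(x):
    # Smooth, S-shaped squashing of x into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def linear(x):
    return x


def rbf(x, c=0.0, sigma=1.0):
    # Gaussian bump: equals 1 at x == c and decays with distance from c.
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


for f in (step, sign, sigmoid, linear, rbf):
    print(f.__name__, [round(f(x), 3) for x in (-2.0, 0.0, 2.0)])
```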
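[Figure: a network with an INPUT LAYER, a HIDDEN LAYER and an OUTPUT LAYER. The inputs p1, p2, …, pn reach the hidden layer through weights w11 … w1n, and the hidden layer reaches the output layer through weights w21 … w2n. In each neuron, Σ is the transfer function and f (drawn as ∫) is the activation function.]
Hidden layer: $u = \sum_{i=1}^{n} w_{1i} x_i + b_1, \qquad z = f(u)$
Output layer: $v = \sum_{i=1}^{n} w_{2i} z_i + b_2, \qquad y = f(v)$
Network output: $y$
Target output: $t$
Error: $\text{error} = t - y$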
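A forward pass through such a network can be sketched with NumPy. The matrix form below generalizes the per-neuron sums to whole layers; the sigmoid activation, the layer sizes and all numeric values are assumptions for illustration.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def forward(x, W1, b1, W2, b2):
    """Feed an input vector through one hidden layer and one output layer."""
    u = W1 @ x + b1   # net input to the hidden neurons (the sum, Σ)
    z = sigmoid(u)    # hidden-layer output, z = f(u)
    v = W2 @ z + b2   # net input to the output neuron
    y = sigmoid(v)    # network output, y = f(v)
    return z, y


x = np.array([0.2, 0.7, 0.1])     # hypothetical input vector
W1 = np.full((2, 3), 0.5)         # 3 inputs -> 2 hidden neurons
b1 = np.zeros(2)
W2 = np.full((1, 2), 0.5)         # 2 hidden neurons -> 1 output neuron
b2 = np.zeros(1)

z, y = forward(x, W1, b1, W2, b2)
t = 1.0                           # target output
error = t - y[0]
print(z, y, error)
```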
NETWORK PARAMETERS
• Weights
• Number of neurons
• Function parameters
NETWORK TRAINING
Iterative modification of network parameters to minimize error
TRAINING SAMPLES (VALIDATION SAMPLES): feature vectors whose class is known
TRAINING ALGORITHM
• Problem of assigning ‘credit’ or ‘blame’ to the individual elements (hidden units) involved in forming the overall response of a learning system.
• In neural networks, the problem relates to deciding which weights should be altered, by how much and in which direction.
This is analogous to deciding how much a weight in an early layer contributes to the output and thus to the error.
We therefore want to find out how weight $w_{ij}$ affects the error, i.e. we want:
$$\frac{\partial E(t)}{\partial w_{ij}(t)}$$
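One way to get a feel for this quantity is to estimate it numerically: nudge a weight slightly and watch how the error changes. The sketch below does this for a toy single-weight sigmoid neuron and compares the estimate with the analytic derivative. The squared-error function, the input value 1.5 and the helper names are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def error(w, x=1.5, target=1.0):
    """Half squared error of a single sigmoid neuron with one weight (toy setup)."""
    return 0.5 * (target - sigmoid(w * x)) ** 2


def numerical_gradient(f, w, h=1e-6):
    # Central difference: change in error per unit change in the weight.
    return (f(w + h) - f(w - h)) / (2.0 * h)


w = 0.2
grad = numerical_gradient(error, w)

# Analytic derivative for comparison: dE/dw = -(target - y) * y * (1 - y) * x
y = sigmoid(w * 1.5)
analytic = -(1.0 - y) * y * (1.0 - y) * 1.5
print(grad, analytic)
```

A negative derivative means that increasing the weight lowers the error, which tells the training algorithm the direction in which to move.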
Backpropagation learning algorithm ‘BP’ (Rumelhart, Hinton and Williams, 1986)
BP has two phases:
Forward pass phase: computes ‘functional signal’, feedforward propagation of input pattern signals through network
Backward pass phase: computes ‘error signal’, propagates the error backwards through network starting at output units (where the error is the difference between actual and desired output values)
BP uses gradient descent (steepest descent) and the Delta Rule for minimizing error.
Any given combination of weights will be associated with a particular error measure. The Delta Rule uses gradient descent learning to iteratively change network weights to minimize error (ideally locating the global minimum of the error surface, although gradient descent can also settle in a local minimum).
To find a minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a maximum of that function; the procedure is then known as gradient ascent.
Step size: learning rate
• Steps too small: slow convergence of the error, but the minimum is approached reliably.
• Steps too big: faster progress, but the minimum may be overshot or missed.
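The effect of the step size can be seen on a toy error function. The sketch below runs gradient descent on E(w) = (w - 3)^2, whose minimum is at w = 3; the function, the starting point and the three learning rates are assumptions picked to show the slow, well-behaved and overshooting cases.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


def grad(w):
    # Gradient of E(w) = (w - 3)^2, which has its minimum at w = 3.
    return 2.0 * (w - 3.0)


small = gradient_descent(grad, w0=0.0, learning_rate=0.01, steps=50)
good = gradient_descent(grad, w0=0.0, learning_rate=0.3, steps=50)
big = gradient_descent(grad, w0=0.0, learning_rate=1.1, steps=50)

print(small)  # still well short of 3: convergence is slow
print(good)   # very close to 3
print(big)    # far from 3: each step overshoots and the error grows
```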
Derivative: how a function changes as its input changes, or how much one quantity changes in response to a change in some other quantity. For example, the derivative of the position of a moving object with respect to time is the object's instantaneous velocity. The derivative corresponds to the slope/gradient.
[Figure: black, the graph of a function; red, the tangent line to that function. The slope/gradient of the tangent line is equal to the derivative of the function at the marked point.]
[Figure: black, maximum value; white, minimum value. The gradient points towards higher values.]
Partial derivative: suppose a function has several variables. The partial derivative of the function with respect to one of the variables is how the function changes as that variable changes (the other variables being held constant).
In the context of neural networks:
Function: error
Variables: weights/function parameters
Conceptual basis of weight adjustment:
1. Determine the partial derivative of the error with respect to each of the weights/parameters.
2. Adjust each weight in the direction opposite to the gradient (the direction of steepest descent).
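Written as a formula, the two steps above amount to the standard gradient-descent update. The symbol η for the learning rate (step size) follows the usual convention; reading the update as taking the weight from iteration t to iteration t + 1 is an assumption about notation.

```latex
w_{ij}(t+1) \;=\; w_{ij}(t) \;-\; \eta \, \frac{\partial E(t)}{\partial w_{ij}(t)}
```

The minus sign is what moves each weight against the gradient: if increasing a weight increases the error, the weight is decreased, and vice versa.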
[Figure: the input feature vector X = (X1, X2, X3, X4) enters the input layer I, which feeds the hidden layer J, which in turn feeds the output layer K.]
Input to I: $X_I$
Output of I: $O_I = X_I$
Input to J: $I_J = \sum_{I} w_{IJ} X_I + b_J$
Output of J: $O_J = \dfrac{1}{1 + e^{-I_J}}$
Input to K: $I_K = \sum_{J} w_{JK} O_J + b_K$
Output of K: $O_K = \dfrac{1}{1 + e^{-I_K}}$
Target: $T$ (1 if resource bearing, 0 if barren)
1. Calculate errors of output neurons:
δK = OK (1 - OK) (Target - OK)
2. Change output layer weights
WJ_K= W J_K + η*δK *OJ
3. Calculate (back-propagate) hidden layer errors
δJ = OJ (1 – OJ) (δK *WJ_K )
4. Change hidden layer weights
WI1_J = WI1_J + η*δJ*x1
WI2_J = WI2_J + η*δJ*x2
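[Figure: a network with inputs X1 and X2 entering the input neurons I1 and I2 (input layer), one hidden neuron J (hidden layer) and one output neuron K (output layer), connected by the weights WI1_J, WI2_J and WJ_K.]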
The constant η (called the learning rate, and nominally equal to one) is put in to speed up or slow down the learning if required.
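The four steps can be sketched for the small network in the figure (two inputs, one hidden neuron J, one output neuron K). This is a minimal illustration under stated assumptions: biases are left out because the listed steps do not update them, the hidden error is computed from the value of WJ_K before that weight is changed (a common convention, not something the slide specifies), and the starting weights and inputs are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bp_step(x1, x2, target, w_i1_j, w_i2_j, w_j_k, eta=1.0):
    """One forward pass followed by one backpropagation weight update."""
    # Forward pass
    o_j = sigmoid(w_i1_j * x1 + w_i2_j * x2)   # hidden neuron output
    o_k = sigmoid(w_j_k * o_j)                 # output neuron output
    # 1. Error of the output neuron
    delta_k = o_k * (1 - o_k) * (target - o_k)
    # 3. Back-propagated error of the hidden neuron (uses the old w_j_k)
    delta_j = o_j * (1 - o_j) * (delta_k * w_j_k)
    # 2. Change the output layer weight
    w_j_k = w_j_k + eta * delta_k * o_j
    # 4. Change the hidden layer weights
    w_i1_j = w_i1_j + eta * delta_j * x1
    w_i2_j = w_i2_j + eta * delta_j * x2
    return w_i1_j, w_i2_j, w_j_k, o_k


# Hypothetical starting weights and a single training pattern with target 1.
w1, w2, w3, out1 = bp_step(1.0, 0.5, 1.0, 0.4, 0.6, 0.5)
_, _, _, out2 = bp_step(1.0, 0.5, 1.0, w1, w2, w3)
print(out1, out2)  # the second output is closer to the target of 1
```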
Data
Input1: 60
Input2: 25
Input3: 120
Input4: 5
• 2 hidden neurons
• 1 output neuron
• Learning rate 0.5
• Sigmoid function
• Start with random weights between 0 and 1.
• Run the algorithm and see whether the error is reduced in the next iteration.
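The exercise can be tried out with a short NumPy script. Two things here are assumptions not stated on the slide: the target output is taken to be 1, and the inputs are divided by their maximum so that the sigmoid does not saturate (with raw values such as 120 the hidden outputs would be almost exactly 1). Biases are omitted and the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)              # arbitrary seed for repeatability

x = np.array([60.0, 25.0, 120.0, 5.0])      # the four inputs from the slide
x = x / x.max()                             # assumption: scale inputs to [0, 1]
target = 1.0                                # assumption: target class is 1
eta = 0.5                                   # learning rate from the slide

W_ij = rng.uniform(0.0, 1.0, size=(2, 4))   # input -> 2 hidden neurons
W_jk = rng.uniform(0.0, 1.0, size=2)        # hidden -> 1 output neuron


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


errors = []
for _ in range(5):
    o_j = sigmoid(W_ij @ x)                 # hidden outputs
    o_k = sigmoid(W_jk @ o_j)               # network output
    errors.append(float(target - o_k))
    delta_k = o_k * (1 - o_k) * (target - o_k)
    delta_j = o_j * (1 - o_j) * delta_k * W_jk
    W_jk = W_jk + eta * delta_k * o_j
    W_ij = W_ij + eta * np.outer(delta_j, x)

print(errors)  # each error is smaller than the one before
```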
Practical considerations: Neural Network training
• Collect all possible examples of the pattern.
• Encode and format the data.
• Split the samples into three subsets:
• Training set (70%)
• Validation set (20%)
• Testing set (10%)
• Or use k-fold (n-fold) cross-validation; leave-one-out cross-validation, which is closely related to jack-knifing, is the special case in which each fold holds a single sample.
• GOLDEN RULE: the number of training set samples should be at least 3 times the number of parameters to be estimated.
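A 70/20/10 split and a check of the golden rule can be sketched as follows. The function names, the shuffling seed and the choice to count one bias per neuron as a parameter are assumptions for illustration.

```python
import random


def split_samples(samples, train=0.7, validation=0.2, seed=0):
    """Shuffle the samples and cut them into training, validation and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train * len(shuffled))
    n_val = round(validation * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])          # the remainder is the testing set


def n_parameters(n_inputs, n_hidden, n_outputs):
    """Weights plus one bias per neuron for a network with one hidden layer."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


train_set, val_set, test_set = split_samples(range(100))
print(len(train_set), len(val_set), len(test_set))   # 70 20 10

params = n_parameters(n_inputs=9, n_hidden=2, n_outputs=1)
print(params, 3 * params)                            # 23 69
```

Under these assumptions a network with 9 inputs, 2 hidden neurons and 1 output has 23 parameters, so the golden rule asks for at least 69 training samples.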
Practical considerations: Neural Network training:
Input data encoding and formatting
VALUE  COUNT  AREA (sq km)  Rock type  Distance to Fault (km)  Soil type  Slope (degrees)  Resource
1      62487  62487         4          1                       1          10               1
2      446    446           3          2                       1          11               1
3      383    383           3          1                       3          10               1
4      91831  91831         3          1                       2          12               0
5      2892   2892          2          2                       2          14               0
6      1227   1227          3          3                       3          14               1
7      934    934           1          4                       1          11               0
8      102    102           2          2                       1          9                1
9      601    601           1          1                       2          9                0
10     2742   2742          2          7                       3          9                1
11     2320   2320          1          7                       2          8                1
12     289    289           2          7                       1          8                0
13     1      1             3          9                       1          6                0
14     21050  21050         1          10                      2          6                1
15     2984   2984          4          2                       1          8                1
16     69     69            3          2                       1          9                1
17     174    174           2          2                       2          7                0
18     21     21            1          2                       1          6                0
19     379    379           1          3                       3          10               0
20     23     23            1          4                       2          11               0
Rock type codes: 1 = Granite, 2 = Sandstone, 3 = Shale, 4 = Basalt
Soil type codes: 1 = Sandy, 2 = Clayey, 3 = Silty
Practical considerations: Neural Network training:
Input data encoding and formatting
Rock type is encoded in the four columns Granite, SSt, Shale and Basalt; soil type in the three columns Sandy, Clayey and Silty.
VALUE  COUNT  AREA (sq km)  Granite  SSt  Shale  Basalt  Distance to Fault (km)  Sandy  Clayey  Silty  Slope (degrees)  Resource
1      62487  62487         0        0    0      1       1                       1      0       0      10               1
2      446    446           0        0    1      0       2                       1      0       0      11               1
3      383    383           0        0    1      0       1                       0      0       1      10               1
4      91831  91831         0        0    1      0       1                       0      1       0      12               0
5      2892   2892          0        1    0      0       2                       0      1       0      14               0
6      1227   1227          0        0    1      0       3                       0      0       1      14               1
7      934    934           1        0    0      0       4                       1      0       0      11               0
8      102    102           0        1    0      0       2                       1      0       0      9                1
9      601    601           1        0    0      0       1                       0      1       0      9                0
10     2742   2742          0        1    0      0       7                       0      0       1      9                1
11     2320   2320          1        0    0      0       7                       0      1       0      8                1
12     289    289           0        1    0      0       7                       1      0       0      8                0
13     1      1             0        0    1      0       9                       1      0       0      6                0
14     21050  21050         1        0    0      0       10                      0      1       0      6                1
15     2984   2984          0        0    0      1       2                       1      0       0      8                1
16     69     69            0        0    1      0       2                       1      0       0      9                1
17     174    174           0        1    0      0       2                       0      1       0      7                0
18     21     21            1        0    0      0       2                       1      0       0      6                0
19     379    379           1        0    0      0       3                       0      0       1      10               0
20     23     23            1        0    0      0       4                       0      1       0      11               0
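The re-encoding of the rock and soil codes into separate 0/1 columns (often called one-hot encoding) can be sketched as below. The helper names `one_hot` and `encode_row` are assumptions; the code tables are the ones given with the data.

```python
ROCK_TYPES = {1: "Granite", 2: "Sandstone", 3: "Shale", 4: "Basalt"}
SOIL_TYPES = {1: "Sandy", 2: "Clayey", 3: "Silty"}


def one_hot(code, n_classes):
    """Turn a class code 1..n_classes into a list with a single 1."""
    return [1 if code == k else 0 for k in range(1, n_classes + 1)]


def encode_row(rock, distance_km, soil, slope_deg):
    """Build one input feature vector from a row of the attribute table."""
    return (one_hot(rock, len(ROCK_TYPES)) + [distance_km]
            + one_hot(soil, len(SOIL_TYPES)) + [slope_deg])


# Row 1 of the table: Basalt (4), 1 km from fault, Sandy soil (1), slope 10
print(encode_row(4, 1, 1, 10))  # [0, 0, 0, 1, 1, 1, 0, 0, 10]
```

Separate columns are used because the codes are only labels: Basalt (4) is not "four times" Granite (1), and a single numeric column would suggest such an ordering to the network.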
Practical considerations: Neural Network training:
Input data encoding and formatting
The encoded data without the VALUE, COUNT and AREA columns; the last number in each row is the Resource class:
0 0 0 1 1 1 0 0 10 1
0 0 1 0 2 1 0 0 11 1
0 0 1 0 1 0 0 1 10 1
0 0 1 0 1 0 1 0 12 0
0 1 0 0 2 0 1 0 14 0
0 0 1 0 3 0 0 1 14 1
1 0 0 0 4 1 0 0 11 0
0 1 0 0 2 1 0 0 9 1
1 0 0 0 1 0 1 0 9 0
0 1 0 0 7 0 0 1 9 1
1 0 0 0 7 0 1 0 8 1
0 1 0 0 7 1 0 0 8 0
0 0 1 0 9 1 0 0 6 0
1 0 0 0 10 0 1 0 6 1
0 0 0 1 2 1 0 0 8 1
0 0 1 0 2 1 0 0 9 1
0 1 0 0 2 0 1 0 7 0
1 0 0 0 2 1 0 0 6 0
1 0 0 0 3 0 0 1 10 0
1 0 0 0 4 0 1 0 11 0
Practical considerations: Neural Network training:
Input data encoding and formatting
Training data (the last number in each row is the Resource class):
0 0 0 1 1 1 0 0 10 1
0 0 1 0 2 1 0 0 11 1
0 0 1 0 1 0 0 1 10 1
0 0 1 0 1 0 1 0 12 0
0 1 0 0 2 0 1 0 14 0
0 0 1 0 3 0 0 1 14 1
1 0 0 0 4 1 0 0 11 0
0 1 0 0 2 1 0 0 9 1
1 0 0 0 1 0 1 0 9 0
0 1 0 0 7 0 0 1 9 1
Validation data (Resource classes held separately: 1 0 0 1 1 1):
1 0 0 0 7 0 1 0 8
0 1 0 0 7 1 0 0 8
0 0 1 0 9 1 0 0 6
1 0 0 0 10 0 1 0 6
0 0 0 1 2 1 0 0 8
0 0 1 0 2 1 0 0 9
Testing data (Resource classes held separately: 0 0 0 0):
0 1 0 0 2 0 1 0 7
1 0 0 0 2 1 0 0 6
1 0 0 0 3 0 0 1 10
1 0 0 0 4 0 1 0 11
Practical considerations: Neural Network training:
Training
1. Choose a subset of training samples.
2. Compute the error for the subset.
3. Update the weights so as to reduce the error (e.g., using gradient descent).
4. Calculate the error for the validation samples.
The above 4 steps comprise one pass through the subset of training samples along with an updating of weights, called here a “training epoch”. The number of training samples in the subset is the epoch size. You can use an epoch size of 1, or an epoch size of n (= number of training samples), or any size between 1 and n. (Elsewhere the subset is usually called a mini-batch and its size the batch size, with “epoch” reserved for one pass through all n training samples.)
Save the weights/parameters after every training epoch.
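The four steps, plus the saving of weights, can be sketched as a loop. To keep the example short the "network" is a single sigmoid neuron, and the data are synthetic; the function names, the squared-error measure and all numeric settings are assumptions for illustration.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def mean_squared_error(w, X, y):
    return float(np.mean((y - sigmoid(X @ w)) ** 2))


def train(X_train, y_train, X_val, y_val, subset_size, n_updates, eta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X_train.shape[1])
    history, checkpoints = [], []
    for _ in range(n_updates):
        # 1. Choose a subset of the training samples
        idx = rng.choice(len(X_train), size=subset_size, replace=False)
        Xb, yb = X_train[idx], y_train[idx]
        # 2. Compute the error for the subset (gradient of half the mean squared error)
        out = sigmoid(Xb @ w)
        grad = -((yb - out) * out * (1 - out)) @ Xb / subset_size
        # 3. Update the weights by gradient descent
        w = w - eta * grad
        # 4. Calculate the error for the validation samples
        history.append(mean_squared_error(w, X_val, y_val))
        # Save the weights after every update
        checkpoints.append(w.copy())
    return history, checkpoints


# Synthetic data: the class depends only on the sign of the first feature.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(60, 3))
y = (X[:, 0] > 0).astype(float)
history, checkpoints = train(X[:40], y[:40], X[40:], y[40:], subset_size=8, n_updates=100)
print(history[0], history[-1])
```

Keeping every set of weights makes it possible to go back later to the one with the lowest validation error.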
Plot the training and validation errors against the number of training epochs.
In the plotted example, the validation error reaches its minimum at 70 epochs, beyond which it begins to rise.
=> The weights/parameters saved after the 70th epoch comprise the trained network.
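Picking the trained network then comes down to finding the epoch with the lowest validation error and loading the weights saved at that point. In the sketch below the error values and the saved "weights" are made-up placeholders, and `select_best_epoch` is a hypothetical helper name.

```python
def select_best_epoch(validation_errors, checkpoints):
    """Return the 1-based epoch with the lowest validation error and its weights."""
    best = min(range(len(validation_errors)), key=validation_errors.__getitem__)
    return best + 1, checkpoints[best]


# Made-up validation errors: they fall, reach a minimum, then rise again.
val_errors = [0.40, 0.31, 0.25, 0.22, 0.24, 0.29]
saved = ["w1", "w2", "w3", "w4", "w5", "w6"]   # stand-ins for saved weight sets

epoch, weights = select_best_epoch(val_errors, saved)
print(epoch, weights)  # 4 w4
```

The rise in validation error after the minimum is the sign that the network has started to fit the training samples too closely.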
Before jumping to processing the samples to be classified, test your trained network with the testing samples (the third subset)
Optimization of the number of hidden neurons
[Figure: percent error (0 to 40) plotted against the number of hidden units (2 to 10), with one curve for the validation set error and one for the training set error.]