Transcript of Indian Institute of Technology Bombay MACHINE LEARNING.

Page 1:

Indian Institute of Technology Bombay

MACHINE LEARNING

Page 2:

Marc Chagall

Page 3:

(Vincent van Gogh)

Page 4:

Marc Chagall? Or Vincent van Gogh?

Page 5:

(Paul Gauguin)

Page 6:

(Vincent van Gogh)

Page 7:

Page 8:

Page 9:

Induction vs Deduction

• Deductive reasoning is the process of reasoning from one or more general statements (premises) to reach a logically certain conclusion.

• Inductive reasoning is reasoning in which the premises seek to supply strong evidence for (not absolute proof of) the truth of the conclusion.

Page 10:

• The human mind is the best pattern recognizer and classifier; it can recognize patterns in spite of noise and vagueness.

1. The human mind learns by induction.

2. The human mind recognizes by looking at the whole and not at individual parts.

MACHINE LEARNING

Page 11:

Learning is a fundamental and essential characteristic of biological neural networks.

The ease with which they can learn led to attempts to emulate a biological neural network in a computer.

Page 12:

The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.

How does human mind learn?

[Diagram: a biological neuron, showing the soma (cell body), dendrites, axon, and synapses.]

A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.

Page 13:

Human Learning: Key features

1. Human beings learn patterns by induction (seeing examples).

2. The knowledge acquired remains in their memory.

3. The knowledge is recalled when required to recognize a pattern not seen before.

Page 14:

Machine Learning: Key features

• Show the computer several examples of a pattern repeatedly.

• Hope that it learns the “diagnostic” characteristics of the pattern.

• We make sure that the computer has learnt adequately (how?).

• The knowledge acquired by the computer will remain in its “memory” (how?).

• The computer will recall the knowledge when asked to classify an unseen pattern.

Page 15:

• The human mind is much better than a computer at recognizing vague/noisy patterns.

• A well-trained computer can process a larger amount of information!

• Non-linear model – the same feature gets different weights in different combinations.

MACHINE LEARNING

Page 16:

• Downside – the computer will not tell you why it has classified a particular pattern in a particular way.

• A black box!!

• Like the human mind!!

MACHINE LEARNING

Page 17:

Problems with Probabilistic/Fuzzy methods

• Weights of Evidence – correlation between maps

• Fuzzy Logic – subjective judgment -> difficult to reproduce

Page 18:

MACHINE LEARNING

• Neural networks
• Hybrid neuro-fuzzy systems
• Bayesian classifiers
• Genetic algorithms
• SOM (self-organizing maps)

Page 19:

• Resource potential modeling can be viewed as a pattern recognition problem.

• It involves predictive classification of each spatial unit, characterized by a unique combination of spatially coincident predictor patterns (or unique conditions), as mineralized or barren with respect to the target mineral deposit-type. In machine learning jargon, this combination is called a feature vector.

MACHINE LEARNING

[Figure: a grid of unique condition numbers over the study area, with a table listing, for each unique condition, its predictor patterns (distance from permeable structure, soil permeability, drainage density, slope) and the class to be predicted: potential (1) or not potential (0).]

Page 20:

Attribute1(i), Attribute2(i), …………., Attribute6(i) 0

Attribute1(i), Attribute2(iii), …………., Attribute6(iv) 1

Attribute1(ii), Attribute2(i), …………., Attribute6(v) 1

Attribute1(v), Attribute2(i), …………., Attribute6(i) 0

Attribute1(iii), Attribute2(ii), …………., Attribute6(vi) 1

MACHINE LEARNING

Page 21:

UNIQUE CONDITIONS GRID

Page 22:

Converting GIS layers to feature vectors

[Figure: GIS raster layers (rock type, SiO2 content, Fe content, distance to fault) and a deposits layer of 0s and 1s. Each raster cell gives an input feature vector, e.g. [3, 8, 33, 800], and the target output (1 where a deposit is present, 0 elsewhere) is taken from the deposits layer.]
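As a rough illustration of this conversion, here is a small Python sketch; the raster values below are made up, and real layers would be read with a GIS library rather than typed in:

```python
import numpy as np

# Made-up 3x3 rasters standing in for the GIS layers (real layers would be
# read from files, e.g. with rasterio or GDAL).
rock_type  = np.array([[3, 1, 2], [3, 3, 1], [2, 4, 1]])
sio2       = np.array([[8, 5, 6], [7, 8, 4], [6, 9, 5]])
fe_content = np.array([[33, 10, 12], [25, 33, 9], [14, 40, 11]])
dist_fault = np.array([[800, 1500, 1200], [900, 800, 2000], [1100, 600, 1800]])
deposits   = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0]])   # target layer

# Stack the predictor layers and flatten: one feature vector per raster cell.
X = np.stack([rock_type, sio2, fe_content, dist_fault], axis=-1).reshape(-1, 4)
y = deposits.reshape(-1)

print(X[0], y[0])   # [  3   8  33 800] 1, the example vector on this slide
```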

Page 23:

[Figure: feed-forward and backpropagation. An input vector (40, 1120, 600), drawn from the GIS layers (SiO2 content, MgO content, Fe content, distance to fault), is fed forward through the network (NN) to give the actual output y = 0.36. The targeted output d = 1 comes from the deposits layer, so the error (d – y) = 0.64 is propagated backwards.]

Page 24:

Inside the black-box………….???

[Figure: neurons (nodes, processing units) arranged in a layer of input neurons (input layer, I), a layer of hidden neurons (hidden layer, H), and a layer of output neurons (output layer, O); the connections carry weights w11, w12, w21, w22, w31, w32, w41, w42, and the nodes apply the functions fi, fh, fo.]

Neuron – neural; but what is the network? Connect all the neurons…

Page 25:

An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.

The neurons are connected by weighted links passing signals from one neuron to another.

The output signal is transmitted through the neuron’s outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.

Page 26:

Properties of architecture

• No connections within a layer
• No direct connections between input and output layers
• Fully connected between layers
• Often more than 3 layers
• Number of output units need not equal number of input units
• Number of hidden units per layer can be more or less than input or output units

Page 27:

The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ. If the net input is less than the threshold, the neuron output is 0/–1. But if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains the value +1.

The neuron uses the following transfer or activation function:

$$X = \sum_{i=1}^{n} x_i w_i$$

$$Y = f(X) = \begin{cases} +1, & \text{if } X \ge \theta \\ 0/{-1}, & \text{if } X < \theta \end{cases}$$

Neuron functions (Also called Activation functions)
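A minimal sketch of such a threshold neuron in Python (the inputs, weights, and threshold below are made up for illustration):

```python
import numpy as np

def threshold_neuron(x, w, theta):
    """Weighted sum of inputs compared against a threshold (step activation).
    Returns 1 when activated, 0 otherwise (the sign variant would return -1)."""
    X = np.dot(x, w)            # X = sum_i x_i * w_i
    return 1 if X >= theta else 0

# Hypothetical inputs, weights, and threshold.
x = np.array([0.5, 0.2, 0.9])
w = np.array([0.4, 0.3, 0.6])
print(threshold_neuron(x, w, theta=0.5))   # -> 1, since X = 0.8 >= 0.5
```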

Page 28:

Activation functions of a neuron

[Figure: graphs of the step, sign, sigmoid, and linear activation functions, each plotting Y (between –1 and +1) against X.]

$$Y^{step} = \begin{cases} 1, & \text{if } X \ge 0 \\ 0, & \text{if } X < 0 \end{cases} \qquad Y^{sign} = \begin{cases} +1, & \text{if } X \ge 0 \\ -1, & \text{if } X < 0 \end{cases}$$

$$Y^{sigmoid} = \frac{1}{1 + e^{-X}} \qquad Y^{linear} = X$$

Radial basis function:

$$Y^{RBF} = e^{-\frac{(X - c)^2}{2\sigma^2}}$$
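The same functions, written as a short Python sketch (the centre c and width sigma of the RBF are free parameters):

```python
import numpy as np

def step(X):     return np.where(X >= 0, 1, 0)
def sign_fn(X):  return np.where(X >= 0, 1, -1)
def sigmoid(X):  return 1.0 / (1.0 + np.exp(-X))
def linear(X):   return X
def rbf(X, c=0.0, sigma=1.0):
    # Gaussian radial basis function centred at c with width sigma
    return np.exp(-((X - c) ** 2) / (2 * sigma ** 2))

X = np.linspace(-3, 3, 7)
print(sigmoid(X))   # smooth values between 0 and 1
```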

Page 29:

[Figure: a network with an INPUT LAYER taking inputs p1, p2, …, pn, a HIDDEN LAYER, and an OUTPUT LAYER; Σ denotes a node's transfer function (the weighted sum) and f its activation function.]

At a hidden node, with weights w11, …, w1n and bias b1:

$$u = \sum_{i=1}^{n} w_{1i}\, p_i + b_1, \qquad z = f(u)$$

At the output node, with weights w21, …, w2n and bias b2:

$$v = \sum_{i=1}^{n} w_{2i}\, z_i + b_2, \qquad y = f(v)$$

f – activation function; Σ – transfer function.

Comparing the network output y against the target output t:

$$error = t - y$$
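A minimal forward-pass sketch in Python, using the sigmoid for f; the weights, biases, input, and target below are made up:

```python
import numpy as np

def sigmoid(X):
    return 1.0 / (1.0 + np.exp(-X))

p = np.array([0.4, 0.7, 0.1])       # input vector p1..pn (made up)
W1 = np.array([[0.2, 0.8, 0.5],     # hidden-layer weights, one row
               [0.6, 0.1, 0.3]])    # per hidden neuron
b1 = np.array([0.1, 0.2])
W2 = np.array([0.7, 0.9])           # output-layer weights
b2 = 0.05

u = W1 @ p + b1     # transfer function at the hidden layer
z = sigmoid(u)      # hidden activations
v = W2 @ z + b2     # transfer function at the output layer
y = sigmoid(v)      # network output

t = 1.0             # target output
print("y =", y, "error =", t - y)
```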

Page 30:

NETWORK PARAMETERS

• Weights
• Number of neurons
• Function parameters

NETWORK TRAINING

Iterative modifications of network parameters to minimize error

TRAINING SAMPLES (VALIDATION SAMPLES): feature vectors whose class is known

Page 31:

TRAINING ALGORITHM

• The problem of assigning ‘credit’ or ‘blame’ to the individual elements (hidden units) involved in forming the overall response of a learning system.

• In neural networks, the problem amounts to deciding which weights should be altered, by how much, and in which direction.

This is analogous to deciding how much a weight in an early layer contributes to the output and thus to the error.

We therefore want to find out how the weight wij affects the error, i.e. we want:

$$\frac{\partial E(t)}{\partial w_{ij}(t)}$$

Page 32:

Backpropagation learning algorithm ‘BP’ (Rumelhart, Hinton and Williams, 1986)

BP has two phases:

Forward pass phase: computes ‘functional signal’, feedforward propagation of input pattern signals through network

Backward pass phase: computes ‘error signal’, propagates the error backwards through network starting at output units (where the error is the difference between actual and desired output values)

Page 33:

Backpropagation learning algorithm ‘BP’ (Rumelhart, Hinton and Williams, 1986)

Uses gradient descent (steepest descent) and the Delta Rule for minimizing error.

Any given combination of weights will be associated with a particular error measure. The Delta Rule uses gradient descent learning to iteratively change network weights to minimize error (i.e., to locate the global minimum in the error surface).

Page 34:

Backpropagation learning algorithm ‘BP’ (Rumelhart, Hinton and Williams, 1986)

To find a minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a maximum of that function; the procedure is then known as gradient ascent.
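A toy illustration in Python, minimizing the (made-up) one-variable function f(w) = (w − 3)² by stepping against its gradient:

```python
def f(w):          # toy error surface with its minimum at w = 3
    return (w - 3.0) ** 2

def grad_f(w):     # analytic gradient of f
    return 2.0 * (w - 3.0)

w = 0.0            # arbitrary starting point
eta = 0.1          # step size (learning rate)
for _ in range(50):
    w -= eta * grad_f(w)   # step proportional to the NEGATIVE gradient

print(w, f(w))     # w approaches 3, f(w) approaches 0
```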

Page 35:

Backpropagation learning algorithm ‘BP’ (Rumelhart, Hinton and Williams, 1986)

Step size: the learning rate.

• Too small a step: slow convergence of the error, but convergence to a minimum is assured.

• Too big a step: fast convergence, but the minimum may be missed.

Page 36:

Backpropagation learning algorithm ‘BP’ (Rumelhart, Hinton and Williams, 1986)

Derivative: how a function changes as its input changes; that is, how much one quantity changes in response to a change in some other quantity. For example, the derivative of the position of a moving object with respect to time is the object's instantaneous velocity. ≈ slope/gradient.

[Figure: black – the graph of a function; red – the tangent line to that function. The slope/gradient of the tangent line is equal to the derivative of the function at the marked point.]

[Figure: black – maximum value; white – minimum value. The gradient points towards higher values.]

Page 37:

Backpropagation learning algorithm ‘BP’ (Rumelhart, Hinton and Williams, 1986)

Partial derivative: suppose a function has several variables. The partial derivative of the function with respect to one of the variables is how the function changes as that variable changes (the other variables held constant).
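A one-line worked example (mine, not the slide's): for f(x, y) = x²y,

$$\frac{\partial f}{\partial x} = 2xy \qquad \text{and} \qquad \frac{\partial f}{\partial y} = x^2,$$

each computed while holding the other variable constant.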

Page 38:

Backpropagation learning algorithm ‘BP’ (Rumelhart, Hinton and Williams, 1986)

In the context of neural networks:

• Function: the error
• Variables: the weights/function parameters

Conceptual basis of weight adjustment:

1. Determine the partial derivative of the error with respect to each of the weights/parameters.

2. Adjust each weight in the direction opposite to the steepest gradient.

Page 39:

Input feature vector X = (X1, X2, X3, X4)

[Figure: the input feature vector X feeding an input layer I, a hidden layer J, and an output layer K.]

Input to I: X. Output of I: X (the input layer passes its inputs through unchanged).

Input to J: $I_J = \sum_I w_{IJ}\, X_I + b_J$. Output of J: $O_J = \dfrac{1}{1 + e^{-I_J}}$

Input to K: $I_K = \sum_J w_{JK}\, O_J + b_K$. Output of K: $O_K = \dfrac{1}{1 + e^{-I_K}}$

Target T = 1 if resource-bearing, 0 if barren.

Page 40:

Backpropagation learning algorithm ‘BP’

[Figure: input neurons I1, I2 (inputs x1, x2), a hidden neuron J, and an output neuron K, connected by the weights WI1_J, WI2_J, and WJ_K.]

1. Calculate the errors of the output neurons:

δK = OK (1 – OK) (Target – OK)

2. Change the output layer weights:

WJ_K = WJ_K + η·δK·OJ

3. Calculate (back-propagate) the hidden layer errors:

δJ = OJ (1 – OJ) (δK·WJ_K)

4. Change the hidden layer weights:

WI1_J = WI1_J + η·δJ·x1
WI2_J = WI2_J + η·δJ·x2

The constant η (called the learning rate, nominally equal to one) is put in to speed up or slow down the learning if required.

Page 41:

Data

Input1: 60
Input2: 25
Input3: 120
Input4: 5

2 hidden neurons, 1 output neuron, learning rate 0.5, sigmoid function. Start with random weights between 0 and 1 and run the algorithm. See if the error is reduced in the next iteration. (A sketch follows below.)
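A minimal Python sketch of this exercise, implementing the four update steps from Page 40; scaling the inputs and omitting the bias terms are my simplifications, not the slide's:

```python
import numpy as np

rng = np.random.default_rng(42)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

x = np.array([60.0, 25.0, 120.0, 5.0]) / 120.0   # inputs (scaled; my addition)
target, eta = 1.0, 0.5                           # assumed target, learning rate

W1 = rng.uniform(0, 1, (2, 4))   # input -> 2 hidden neurons (random 0..1)
W2 = rng.uniform(0, 1, 2)        # hidden -> 1 output neuron

for it in range(2):                        # two iterations: compare the errors
    OJ = sigmoid(W1 @ x)                   # hidden outputs
    OK = sigmoid(W2 @ OJ)                  # actual output
    print(f"iteration {it}: error = {target - OK:.4f}")
    dK = OK * (1 - OK) * (target - OK)     # 1. output error
    dJ = OJ * (1 - OJ) * (dK * W2)         # 3. hidden errors (pre-update W2)
    W2 = W2 + eta * dK * OJ                # 2. output-layer weight update
    W1 = W1 + eta * np.outer(dJ, x)        # 4. hidden-layer weight updates
```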

Page 42:

Practical considerations: Neural Network training

• Collect all possible examples of the pattern.

• Encode and format the data.

• Classify them into three subsets (see the sketch below):

  • Training set (70%)

  • Validation set (20%)

  • Testing set (10%)

• Or use n-fold (k-fold) validation (also called jack-knifing).

• GOLDEN RULE: the number of training samples should be at least 3 times the number of parameters to be estimated.
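A simple sketch of the 70/20/10 split in Python (the shuffle and the sample count are my choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                           # hypothetical number of samples
idx = rng.permutation(n)         # shuffle before splitting

train = idx[: int(0.7 * n)]                # 70% -> 14 samples
valid = idx[int(0.7 * n): int(0.9 * n)]    # 20% -> 4 samples
test  = idx[int(0.9 * n):]                 # 10% -> 2 samples
print(len(train), len(valid), len(test))   # 14 4 2
```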

Page 43:

Practical considerations: Neural Network training

Input data encoding and formatting

VALUE  COUNT  AREA SQKM  Rock type  Distance to Fault (km)  Soil type  Slope (Degree)  Resource
1      62487  62487      4          1                       1          10              1
2      446    446        3          2                       1          11              1
3      383    383        3          1                       3          10              1
4      91831  91831      3          1                       2          12              0
5      2892   2892       2          2                       2          14              0
6      1227   1227       3          3                       3          14              1
7      934    934        1          4                       1          11              0
8      102    102        2          2                       1          9               1
9      601    601        1          1                       2          9               0
10     2742   2742       2          7                       3          9               1
11     2320   2320       1          7                       2          8               1
12     289    289        2          7                       1          8               0
13     1      1          3          9                       1          6               0
14     21050  21050      1          10                      2          6               1
15     2984   2984       4          2                       1          8               1
16     69     69         3          2                       1          9               1
17     174    174        2          2                       2          7               0
18     21     21         1          2                       1          6               0
19     379    379        1          3                       3          10              0
20     23     23         1          4                       2          11              0

Rock type: 1 Granite, 2 Sandstone, 3 Shale, 4 Basalt. Soil type: 1 Sandy, 2 Clayey, 3 Silty.

Page 44:

Practical considerations: Neural Network training

Input data encoding and formatting

VALUE  COUNT  AREA SQKM  Granite  SSt  Shale  Basalt  Distance to Fault (km)  Sandy  Clayey  Silty  Slope (Degree)  Resource
1      62487  62487      0        0    0      1       1                       1      0       0      10              1
2      446    446        0        0    1      0       2                       1      0       0      11              1
3      383    383        0        0    1      0       1                       0      0       1      10              1
4      91831  91831      0        0    1      0       1                       0      1       0      12              0
5      2892   2892       0        1    0      0       2                       0      1       0      14              0
6      1227   1227       0        0    1      0       3                       0      0       1      14              1
7      934    934        1        0    0      0       4                       1      0       0      11              0
8      102    102        0        1    0      0       2                       1      0       0      9               1
9      601    601        1        0    0      0       1                       0      1       0      9               0
10     2742   2742       0        1    0      0       7                       0      0       1      9               1
11     2320   2320       1        0    0      0       7                       0      1       0      8               1
12     289    289        0        1    0      0       7                       1      0       0      8               0
13     1      1          0        0    1      0       9                       1      0       0      6               0
14     21050  21050      1        0    0      0       10                      0      1       0      6               1
15     2984   2984       0        0    0      1       2                       1      0       0      8               1
16     69     69         0        0    1      0       2                       1      0       0      9               1
17     174    174        0        1    0      0       2                       0      1       0      7               0
18     21     21         1        0    0      0       2                       1      0       0      6               0
19     379    379        1        0    0      0       3                       0      0       1      10              0
20     23     23         1        0    0      0       4                       0      1       0      11              0
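A sketch of how the one-hot encoding of a categorical column might be done in Python (the variable names are mine):

```python
import numpy as np

rock = np.array([4, 3, 3, 3, 2])    # first five rock-type codes from the table
n_classes = 4                       # Granite, Sandstone, Shale, Basalt

# One-hot: each row gets a 1 in the column of its class (codes are 1-based).
onehot = np.eye(n_classes, dtype=int)[rock - 1]
print(onehot)
# [[0 0 0 1]    <- Basalt
#  [0 0 1 0]    <- Shale
#  ...
```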

Page 45:

Practical considerations: Neural Network training

Input data encoding and formatting

0 0 0 1   1  1 0 0  10  1
0 0 1 0   2  1 0 0  11  1
0 0 1 0   1  0 0 1  10  1
0 0 1 0   1  0 1 0  12  0
0 1 0 0   2  0 1 0  14  0
0 0 1 0   3  0 0 1  14  1
1 0 0 0   4  1 0 0  11  0
0 1 0 0   2  1 0 0   9  1
1 0 0 0   1  0 1 0   9  0
0 1 0 0   7  0 0 1   9  1
1 0 0 0   7  0 1 0   8  1
0 1 0 0   7  1 0 0   8  0
0 0 1 0   9  1 0 0   6  0
1 0 0 0  10  0 1 0   6  1
0 0 0 1   2  1 0 0   8  1
0 0 1 0   2  1 0 0   9  1
0 1 0 0   2  0 1 0   7  0
1 0 0 0   2  1 0 0   6  0
1 0 0 0   3  0 0 1  10  0
1 0 0 0   4  0 1 0  11  0

Page 46:

Practical considerations: Neural Network training

Input data encoding and formatting

Training data (rows 1–10, with class labels):

0 0 0 1   1  1 0 0  10  1
0 0 1 0   2  1 0 0  11  1
0 0 1 0   1  0 0 1  10  1
0 0 1 0   1  0 1 0  12  0
0 1 0 0   2  0 1 0  14  0
0 0 1 0   3  0 0 1  14  1
1 0 0 0   4  1 0 0  11  0
0 1 0 0   2  1 0 0   9  1
1 0 0 0   1  0 1 0   9  0
0 1 0 0   7  0 0 1   9  1

Validation data (rows 11–16; class labels held aside: 1 0 0 1 1 1):

1 0 0 0   7  0 1 0   8
0 1 0 0   7  1 0 0   8
0 0 1 0   9  1 0 0   6
1 0 0 0  10  0 1 0   6
0 0 0 1   2  1 0 0   8
0 0 1 0   2  1 0 0   9

Testing data (rows 17–20; class labels held aside: 0 0 0 0):

0 1 0 0   2  0 1 0   7
1 0 0 0   2  1 0 0   6
1 0 0 0   3  0 0 1  10
1 0 0 0   4  0 1 0  11

Page 47:

Practical considerations: Neural Network training

Training

1. Choose a subset of the training samples.
2. Compute the error for the subset.
3. Update the weights so as to reduce the error (e.g., using gradient descent).
4. Calculate the error for the validation samples.

The above 4 steps comprise one pass through the subset of training samples along with an update of the weights, called a “training epoch”. The number of training samples in the subset is the epoch size. You can use an epoch size of 1, an epoch size of n (= the number of training samples), or any size in between.

Save the weights/parameters after every training epoch (a sketch follows below).
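A schematic Python sketch of this loop for a single-layer logistic unit (the dataset is random and the network is simplified; the point is the epoch structure, the validation error, and saving the weights after every epoch):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Tiny made-up dataset: 10 training and 4 validation samples, 3 features each.
Xtr, ytr = rng.uniform(0, 1, (10, 3)), rng.integers(0, 2, 10)
Xva, yva = rng.uniform(0, 1, (4, 3)), rng.integers(0, 2, 4)

w, eta = rng.uniform(0, 1, 3), 0.5
saved = []                                   # weights saved after every epoch

for epoch in range(100):
    y = sigmoid(Xtr @ w)                     # 1. forward pass on the subset
    err = ytr - y                            # 2. error for the subset
    w += eta * Xtr.T @ (err * y * (1 - y))   # 3. delta-rule weight update
    val_err = np.mean((yva - sigmoid(Xva @ w)) ** 2)   # 4. validation error
    saved.append((val_err, w.copy()))        # save weights after the epoch

best = min(range(100), key=lambda e: saved[e][0])
print("best epoch:", best)   # keep the weights where validation error is lowest
```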

Page 48:

Practical considerations: Neural Network training

Training

Plot training and validation errors against number of training epochs

The validation error reaches a minimum at 70 epochs, beyond which it begins to rise.

=> The weights/parameters saved after the 70th epoch comprise the trained network.

Page 49:

Practical considerations: Neural Network training

Training

Before jumping to processing the samples to be classified, test your trained network with the testing samples (the third subset)

Page 50:

Optimization of the number of hidden neurons

[Plot: percent error against the number of hidden units (2–10), showing the training-set error and validation-set error curves.]