DATA-MINING Artificial Neural Networks Alexey Minin, Jass 2006.

Transcript of DATA-MINING Artificial Neural Networks Alexey Minin, Jass 2006.

Page 1:

DATA-MINING

Artificial Neural Networks

Alexey Minin, Jass 2006

Page 2:

The ANN forms its output by itself, according to the information presented at its input. The main task is to find a suitable functional and then minimize it; according to this functional, the network's encoding of the input vector is adjusted.

In practice, adaptive networks code the input information in the most compact way possible, subject to some predefined requirements.

Learning without a teacher (unsupervised learning): introduction

Page 3:

Reducing the dimension of the data with minimal loss

Learning without a teacher: the redundancy of data

The length of the data description is $D = d \cdot b$, where

- $d$ is the dimension of the data: the number of components of the input vector $x$;
- $b$ is the capacity of the data: the number of bits defining the possible variety of all values.

Two ways of coding (reducing) the information:

- reducing the dimension: finding independent features;
- reducing the variety of data by detecting prototypes: clustering and quantizing.

Page 4:

Two ways to reduce the data

Reducing the dimension allows us to describe the data with fewer components.

Clustering allows us to reduce the variety of the data, reducing the number of bits we need to describe them.

NB: we can combine both types of algorithms, for example in Kohonen maps, where the prototypes are arranged in a space of low dimension. The input data can be mapped onto a two-dimensional grid of prototypes in such a way that the data can be visualized.

Page 5:

Main idea: the neuron as an indicator

A neuron has one output and is trained on d-dimensional data $x = (x_1, \dots, x_d)$. Let the activation function be linear; the output is then a linear combination of its inputs:

$$y = \sum_{j=1}^{d} w_j x_j$$

After training is finished, the amplitude of the output can serve as an indicator for the data, showing whether or not the data correspond to the training patterns.
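As a minimal sketch of this setup (function names are my own), a linear neuron is just a dot product:

```python
import numpy as np

def neuron_output(w, x):
    """Linear neuron: y = sum_j w_j * x_j."""
    return float(np.dot(w, x))
```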

Page 6:

The Hebb training algorithm

According to Hebb:

$$\Delta w_j = \eta\, y\, x_j$$

If we reformulate the task as an optimization task, we obtain the property of such a neuron and a rule for defining the functional we have to minimize:

$$E(\mathbf{w}) = -\tfrac{1}{2}\, y^2 = -\tfrac{1}{2}\, (\mathbf{w}, \mathbf{x})^2$$

NB! If we want E to reach its minimum, the output amplitude must grow to infinity: unconstrained Hebbian learning drives the weights without bound.
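The update as code (a sketch; the learning-rate name is my own):

```python
import numpy as np

def hebb_step(w, x, lr=0.01):
    """Hebbian update dw_j = lr * y * x_j; note that |w| grows without bound."""
    y = np.dot(w, x)
    return w + lr * y * x
```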

Page 7:

The Oja training rule


A decay term is added to stop the unlimited growth of the weights:

$$\Delta w_j = \eta\, y\,(x_j - y\, w_j)$$

The Oja rule maximizes the sensitivity of the neuron's output while keeping the amplitude of the weights limited. It is easy to verify this: set the average change of the weights to zero and multiply the right-hand side of the equality by w. We find that in equilibrium

$$y^2\,(1 - |\mathbf{w}|^2) = 0,$$

so the weights of the trained neuron lie on the hypersphere

$$|\mathbf{w}| = 1.$$

Under Oja training the weight vector settles on this hypersphere, in the direction that maximizes the projection of the input vectors.
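The same update as code (a sketch; a small fixed learning rate is assumed):

```python
import numpy as np

def oja_step(w, x, lr=0.01):
    """Oja update dw_j = lr * y * (x_j - y * w_j).
    The decay term -y^2 * w keeps the weights bounded, so |w| -> 1."""
    y = np.dot(w, x)
    return w + lr * y * (x - y * w)
```

Iterating this over many samples drives w toward the direction of the first principal component of the inputs.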

Page 8:

SUMMARY: the neuron tries to reproduce the values of its inputs from its known output. This means it tries to maximize the sensitivity of its output; such neuron-indicators of multidimensional input information perform compression this way.

Oja training rule

NB! The output of the Oja output layer is a linear combination of the principal components. If you want to obtain the principal components themselves, you should change the sum over all outputs into a sum over the first $i$ outputs (Sanger's rule):

$$\Delta w_{ij} = \eta\, y_i \left( x_j - \sum_{k=1}^{i} y_k\, w_{kj} \right)$$
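A sketch of one such update (the lower-triangular mask implements the sum over $k \le i$; the function name is my own):

```python
import numpy as np

def sanger_step(W, x, lr=0.01):
    """Generalized Hebbian (Sanger) update:
    dW_ij = lr * y_i * (x_j - sum_{k<=i} y_k * W_kj).
    The rows of W converge to the leading principal components."""
    y = W @ x                                   # outputs y_i = sum_j W_ij x_j
    mask = np.tril(np.ones((len(y), len(y))))   # row i sums over k <= i
    recon = mask @ (y[:, None] * W)
    return W + lr * y[:, None] * (x[None, :] - recon)
```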

Page 9:

Principal component analysis

Let us say that we have d-dimensional data $x = (x_1, \dots, x_d)$, and we are training m linear neurons:

$$y_i = \sum_{j=1}^{d} w_{ij}\, x_j, \qquad i = 1, \dots, m.$$

THE TASK IS: we want the amplitudes of all output neurons to be independent indicators that fully reflect the information about the multidimensional data.
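As a sanity check of this behavior, here is a sketch that reuses the sanger_step helper from the previous page on synthetic data (all values are illustrative); the trained rows of W should align with the leading eigenvectors of the data covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic 3-D data with one dominant direction
X = rng.normal(size=(5000, 3)) * np.array([3.0, 1.0, 0.3])

W = rng.normal(scale=0.1, size=(2, 3))   # m = 2 neurons, d = 3 inputs
for x in X:
    W = sanger_step(W, x, lr=0.005)

# rows of W should be close to the top-2 covariance eigenvectors
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
print(np.abs(W @ eigvecs[:, ::-1][:, :2]))   # roughly the identity, up to sign
```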

Page 10:

The requirement: the neurons must interact somehow (if we train them independently, we will obtain the same result for all of them).

In the simple case:

Let us take a perceptron with a linear hidden layer, in which the numbers of inputs and outputs are equal and the weights with the same indices in both layers coincide. Let us try to teach the ANN to reproduce its input at its output. The training rule is then:

$$\Delta w_j = \eta\,(x_j - \tilde{x}_j)\, y, \qquad \tilde{x}_j = y\, w_j,$$

where the network maps the inputs $x_1, \dots, x_d$ to the reconstructed outputs $\tilde{x}_1, \dots, \tilde{x}_d$.

It looks like the Oja training rule!
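A sketch of this "reproduce the input" neuron with tied weights (names are my own); expanding the reconstruction term recovers the Oja update:

```python
import numpy as np

def tied_autoencoder_step(w, x, lr=0.01):
    """Linear bottleneck with tied weights: y = w.x, x_rec = y * w.
    An approximate gradient step on |x - x_rec|^2 (exact when |w| = 1)
    gives dw_j ~ y * (x_j - y * w_j), i.e. Oja's rule."""
    y = np.dot(w, x)
    x_rec = y * w                 # reconstruction of the input
    return w + lr * y * (x - x_rec)
```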

Page 11:

Self-training layer: in our formulation, a separately trained neuron tries to reproduce its inputs from its output. Generalizing this observation, it is logical to suggest a rule in which the values of the inputs are restored from the whole output vector. Doing this, we obtain the Oja training rule for a one-layer network:

$$\Delta w_{ij} = \eta\, y_i\,(x_j - \tilde{x}_j), \qquad \tilde{x}_j = \sum_k y_k\, w_{kj}.$$

The hidden layer of such an ANN (mapping $x_1, \dots, x_d$ to $\tilde{x}_1, \dots, \tilde{x}_d$), like an Oja layer, performs an optimal coding of the input data and contains the maximum variety of the data under the existing restrictions.
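The layer version in code (a sketch, vectorized with an outer product; names are my own):

```python
import numpy as np

def oja_layer_step(W, x, lr=0.01):
    """One-layer Oja rule: each input is reconstructed from ALL outputs.
    dW_ij = lr * y_i * (x_j - sum_k y_k * W_kj)."""
    y = W @ x
    x_rec = W.T @ y               # x~_j = sum_k y_k W_kj
    return W + lr * np.outer(y, x - x_rec)
```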

Page 12:

Example: let us change the activation function to a sigmoid in the training rule:

$$\Delta w_{ij} = \eta\, f(y_i) \left( x_j - \sum_k f(y_k)\, w_{kj} \right)$$

This brings a new property (Oja et al., 1991). Such an algorithm was used, in particular, for the decomposition of signals mixed in an unknown way (i.e., blind signal separation). We face this task, for example, when we want to separate a human voice from noise.
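A sketch of the nonlinear update (tanh is one common sigmoid choice here; that choice is my assumption, not fixed by the slide):

```python
import numpy as np

def nonlinear_oja_step(W, x, lr=0.01):
    """Nonlinear (sigmoid) generalization of the layer rule, as used
    for blind signal separation:
    dW_ij = lr * f(y_i) * (x_j - sum_k f(y_k) * W_kj)."""
    f = np.tanh                    # assumed sigmoid
    fy = f(W @ x)
    return W + lr * np.outer(fy, x - W.T @ fy)
```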

Page 13:

Competition of neurons: the winner takes all

Basic algorithm. The output of the competitive layer is computed as before:

$$y_i = \sum_{j=1}^{d} w_{ij}\, x_j$$

The winner is the neuron with the maximum response:

$$i^*: \quad (\mathbf{w}_{i^*}, \mathbf{x}) \ge (\mathbf{w}_i, \mathbf{x}) \quad \forall i$$

The outputs of the layer are then

$$y_{i^*} = 1, \qquad y_i = 0 \ \ \text{for } i \ne i^*.$$

Training of the winner (only the winning neuron updates its weights):

$$\Delta \mathbf{w}_{i^*} = \eta\,(\mathbf{x} - \mathbf{w}_{i^*})$$
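The whole winner-take-all step in code (a sketch; names are my own):

```python
import numpy as np

def wta_step(W, x, lr=0.1):
    """Winner-take-all update: only the neuron with the maximum
    response moves its weight vector toward the input."""
    winner = np.argmax(W @ x)      # index of the winning neuron
    W = W.copy()
    W[winner] += lr * (x - W[winner])
    return W
```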

Page 14:

The winner does not take all

One variant of modifying the basic training rule of a competitive layer consists in training not only the winner neuron but also its "neighbors", though at a lower rate. Such an approach, "pulling up" the neurons nearest to the winner, is applied in topographic Kohonen maps.

The winner is the prototype closest to the input:

$$i^*: \quad |\mathbf{x} - \mathbf{w}_{i^*}| = \min_i\, |\mathbf{x} - \mathbf{w}_i|$$

All neurons are updated, weighted by the neighborhood function:

$$\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \eta\, \Lambda(i, i^*)\,\big(\mathbf{x}(t) - \mathbf{w}_i(t)\big)$$

The neighborhood function $\Lambda(i, i^*)$ equals one for the winner neuron with index $i^*$ and gradually falls off with the distance from the winner, for example as a Gaussian:

$$\Lambda(i, i^*) = \exp\!\left( -\frac{|i - i^*|^2}{2\sigma^2} \right)$$

Kohonen training resembles stretching an elastic grid of prototypes over the data set of the training sample.
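A sketch of the Kohonen update (here grid holds each neuron's coordinates on the map; all names are my own):

```python
import numpy as np

def som_step(W, grid, x, lr=0.1, sigma=1.0):
    """Kohonen update: every prototype moves toward x, weighted by a
    Gaussian neighborhood around the winner on the map grid."""
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    dist2 = np.sum((grid - grid[winner]) ** 2, axis=1)
    h = np.exp(-dist2 / (2 * sigma ** 2))      # neighborhood function
    return W + lr * h[:, None] * (x - W)
```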

Page 15:

Schematic representation of a self-organizing network

Methodology of self-organizing maps

Neurons in the target layer are ordered and correspond to the cells of a two-dimensional map, which can be colored according to the affinity of attributes.

Kohonen training resembles stretching an elastic grid of prototypes over the data set of the training sample.

Page 16:

A convenient tool for visualizing data is the coloring of topographic maps, similar to what is done with ordinary geographic maps. Each attribute $x_i$ of the data generates its own coloring of the map's cells: by the average value of that attribute over the data points that fall into the given cell.

Visualization of a topographic map induced by the i-th component of the input data

Collecting together the maps of all the attributes that interest us, we obtain a topographic atlas that gives an integrated representation of the structure of the multivariate data.

Page 17:

Classified SOM for the NASDAQ100 index for the period from 10-Nov-1997 to 27-Aug-2001

Methodology of self-organizing maps

Page 18:

Complexity of the algorithm

When is it better to reduce the dimension, and when to quantize the input information?

Reducing the dimension. The number of operations is

$$C_1 \sim P\, W,$$

where $P$ is the number of training patterns and $W = d\,m$ is the number of synaptic weights of a one-layer ANN with $d$ inputs and $m$ output neurons. The compression coefficient is $K = d/m$, so

$$C_1 \sim P\, d^2 / K.$$

Quantizing. The number of operations is again $C_2 \sim P\, W$, but here the compression coefficient is $K = d\,b / \log_2 m$ ($b$ is the capacity of the data): a vector of $d \cdot b$ bits is coded by the index of its prototype. Hence $m = 2^{db/K}$ prototypes are required and

$$C_2 \sim P\, d\, 2^{db/K}.$$

With the same compression coefficient:

$$\frac{C_2}{C_1} \sim \frac{K}{d}\, 2^{db/K}.$$
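Plugging illustrative numbers into the two estimates (a sketch; the values of P, d, b are arbitrary):

```python
def ops_dim_reduction(P, d, K):
    """C1 ~ P * d^2 / K (one-layer ANN with m = d/K output neurons)."""
    return P * d * d / K

def ops_quantization(P, d, b, K):
    """C2 ~ P * d * m, with m = 2**(d*b/K) prototypes."""
    return P * d * 2 ** (d * b / K)

P, d, b = 10_000, 64, 8
for K in (8, 16, 32, 64):
    print(K, ops_dim_reduction(P, d, K), ops_quantization(P, d, b, K))
```

The printout shows the trade-off: the cost of quantization explodes for small compression coefficients, since the required number of prototypes grows exponentially in db/K.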

Page 19:

JPEG example

The image is divided into blocks of 8×8 pixels, which serve as the input vectors we want to compress. In our case

$$d = 8 \times 8 = 64.$$

Let us suppose the image contains $2^8 = 256$ gradations of gray, so $b = 8$ bits define the accuracy of the represented data.

But if $d = 64 \times 64$, then $K > 10^3$: the number of prototypes $m = 2^{db/K}$ stays manageable only for such large compression coefficients.
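Reading the $d = 64 \times 64$ claim back through the formula $m = 2^{db/K}$, under an assumed codebook budget of $m \le 2^{32}$ (the budget is my assumption, not from the slide):

```python
# How much compression must quantization achieve for d = 64x64 blocks?
d, b = 64 * 64, 8
# number of prototypes m = 2**(d*b / K); demand a manageable codebook,
# e.g. m <= 2**32 (an assumed budget)
K_min = d * b / 32
print(K_min)   # 1024.0 -> K must exceed ~10^3
```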

Page 20:

Any questions?