DATA-MINING Artificial Neural Networks Alexey Minin, Jass 2006.

Transcript of DATA-MINING Artificial Neural Networks Alexey Minin, Jass 2006.

Page 1:

DATA-MINING

Artificial Neural Networks

Alexey Minin, Jass 2006

Page 2:

The ANN forms its output by itself, according to the information presented at its input. The main task is to find a suitable functional and then minimize it; according to this functional, the network's encoding of the input vector is adjusted.

In practice, adaptive networks code the input information in the most compact way possible, subject to some predefined requirements.

Learning without a teacher (unsupervised learning): introduction

Page 3:

Reducing the dimension of the data with minimal loss

Learning without a teacher: the redundancy of data

The length of the data description is $D = d \cdot b$, where

- $d$ is the dimension of the data: the number of components of the input vector $x$;
- $b$ is the capacity of the data: the number of bits defining the possible variety of all values.

Two ways of coding (reducing) the information:

- reducing the dimension: finding independent features;
- reducing the variety of data by detecting prototypes: clustering and quantizing.

Page 4:

Two ways to reduce the data

Reducing the dimension allows us to describe the data with fewer components.

Clustering allows us to reduce the variety of the data, reducing the number of bits we need to describe them.

NB: we can combine both types of algorithms, for example in Kohonen maps, where the prototypes are arranged in a space of low dimension. The input data can be mapped onto a two-dimensional grid of prototypes in such a way that the data can be visualized.

Page 5:

Main idea: the neuron as an indicator

A neuron has one output and is trained on d-dimensional data $x = (x_1, \dots, x_d)$. Let the activation function be linear; the output is then a linear combination of its inputs:

$$y = \sum_{j=1}^{d} w_j x_j$$

After training is finished, the amplitude of the output can serve as an indicator for the data, showing whether or not the data correspond to the training patterns.
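As a minimal sketch of this setup (function names are my own), a linear neuron is just a dot product:

```python
import numpy as np

def neuron_output(w, x):
    """Linear neuron: y = sum_j w_j * x_j."""
    return float(np.dot(w, x))
```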

Page 6:

The Hebb training algorithm

According to Hebb:

$$\Delta w_j = \eta\, y\, x_j$$

If we reformulate the task as an optimization task, we obtain the property of such a neuron and a rule for defining the functional we have to minimize:

$$E(\mathbf{w}) = -\tfrac{1}{2}\, y^2 = -\tfrac{1}{2}\, (\mathbf{w}, \mathbf{x})^2$$

NB! If we want E to reach its minimum, the output amplitude must grow to infinity: unconstrained Hebbian learning drives the weights without bound.
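The update as code (a sketch; the learning-rate name is my own):

```python
import numpy as np

def hebb_step(w, x, lr=0.01):
    """Hebbian update dw_j = lr * y * x_j; note that |w| grows without bound."""
    y = np.dot(w, x)
    return w + lr * y * x
```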

Page 7:

The Oja training rule


A decay term is added to stop the unlimited growth of the weights:

$$\Delta w_j = \eta\, y\,(x_j - y\, w_j)$$

The Oja rule maximizes the sensitivity of the neuron's output while keeping the amplitude of the weights limited. It is easy to verify this: set the average change of the weights to zero and multiply the right-hand side of the equality by w. We find that in equilibrium

$$y^2\,(1 - |\mathbf{w}|^2) = 0,$$

so the weights of the trained neuron lie on the hypersphere

$$|\mathbf{w}| = 1.$$

Under Oja training the weight vector settles on this hypersphere, in the direction that maximizes the projection of the input vectors.
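The same update as code (a sketch; a small fixed learning rate is assumed):

```python
import numpy as np

def oja_step(w, x, lr=0.01):
    """Oja update dw_j = lr * y * (x_j - y * w_j).
    The decay term -y^2 * w keeps the weights bounded, so |w| -> 1."""
    y = np.dot(w, x)
    return w + lr * y * (x - y * w)
```

Iterating this over many samples drives w toward the direction of the first principal component of the inputs.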

Page 8:

SUMMARY: the neuron tries to reproduce the values of its inputs from its known output. This means it tries to maximize the sensitivity of its output; such neuron-indicators of multidimensional input information perform compression this way.

Oja training rule

NB! The output of the Oja output layer is a linear combination of the principal components. If you want to obtain the principal components themselves, you should change the sum over all outputs into a sum over the first $i$ outputs (Sanger's rule):

$$\Delta w_{ij} = \eta\, y_i \left( x_j - \sum_{k=1}^{i} y_k\, w_{kj} \right)$$
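A sketch of one such update (the lower-triangular mask implements the sum over $k \le i$; the function name is my own):

```python
import numpy as np

def sanger_step(W, x, lr=0.01):
    """Generalized Hebbian (Sanger) update:
    dW_ij = lr * y_i * (x_j - sum_{k<=i} y_k * W_kj).
    The rows of W converge to the leading principal components."""
    y = W @ x                                   # outputs y_i = sum_j W_ij x_j
    mask = np.tril(np.ones((len(y), len(y))))   # row i sums over k <= i
    recon = mask @ (y[:, None] * W)
    return W + lr * y[:, None] * (x[None, :] - recon)
```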

Page 9:

Principal component analysis

Let us say that we have d-dimensional data $x = (x_1, \dots, x_d)$, and we are training m linear neurons:

$$y_i = \sum_{j=1}^{d} w_{ij}\, x_j, \qquad i = 1, \dots, m.$$

THE TASK IS: we want the amplitudes of all output neurons to be independent indicators that fully reflect the information about the multidimensional data.
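As a sanity check of this behavior, here is a sketch that reuses the sanger_step helper from the previous page on synthetic data (all values are illustrative); the trained rows of W should align with the leading eigenvectors of the data covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic 3-D data with one dominant direction
X = rng.normal(size=(5000, 3)) * np.array([3.0, 1.0, 0.3])

W = rng.normal(scale=0.1, size=(2, 3))   # m = 2 neurons, d = 3 inputs
for x in X:
    W = sanger_step(W, x, lr=0.005)

# rows of W should be close to the top-2 covariance eigenvectors
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
print(np.abs(W @ eigvecs[:, ::-1][:, :2]))   # roughly the identity, up to sign
```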

Page 10:

The requirement: the neurons must interact somehow (if we train them independently, we will obtain the same result for all of them).

In the simple case:

Let us take a perceptron with a linear hidden layer, in which the numbers of inputs and outputs are equal and the weights with the same indices in both layers coincide. Let us try to teach the ANN to reproduce its input at its output. The training rule is then:

$$\Delta w_j = \eta\,(x_j - \tilde{x}_j)\, y, \qquad \tilde{x}_j = y\, w_j,$$

where the network maps the inputs $x_1, \dots, x_d$ to the reconstructed outputs $\tilde{x}_1, \dots, \tilde{x}_d$.

It looks like the Oja training rule!
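A sketch of this "reproduce the input" neuron with tied weights (names are my own); expanding the reconstruction term recovers the Oja update:

```python
import numpy as np

def tied_autoencoder_step(w, x, lr=0.01):
    """Linear bottleneck with tied weights: y = w.x, x_rec = y * w.
    An approximate gradient step on |x - x_rec|^2 (exact when |w| = 1)
    gives dw_j ~ y * (x_j - y * w_j), i.e. Oja's rule."""
    y = np.dot(w, x)
    x_rec = y * w                 # reconstruction of the input
    return w + lr * y * (x - x_rec)
```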

Page 11:

Self-training layer: in our formulation, a separately trained neuron tries to reproduce its inputs from its output. Generalizing this observation, it is logical to suggest a rule in which the values of the inputs are restored from the whole output vector. Doing this, we obtain the Oja training rule for a one-layer network:

$$\Delta w_{ij} = \eta\, y_i\,(x_j - \tilde{x}_j), \qquad \tilde{x}_j = \sum_k y_k\, w_{kj}.$$

The hidden layer of such an ANN (mapping $x_1, \dots, x_d$ to $\tilde{x}_1, \dots, \tilde{x}_d$), like an Oja layer, performs an optimal coding of the input data and contains the maximum variety of the data under the existing restrictions.
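The layer version in code (a sketch, vectorized with an outer product; names are my own):

```python
import numpy as np

def oja_layer_step(W, x, lr=0.01):
    """One-layer Oja rule: each input is reconstructed from ALL outputs.
    dW_ij = lr * y_i * (x_j - sum_k y_k * W_kj)."""
    y = W @ x
    x_rec = W.T @ y               # x~_j = sum_k y_k W_kj
    return W + lr * np.outer(y, x - x_rec)
```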

Page 12:

Example: let us change the activation function to a sigmoid in the training rule:

$$\Delta w_{ij} = \eta\, f(y_i) \left( x_j - \sum_k f(y_k)\, w_{kj} \right)$$

This brings a new property (Oja et al., 1991). Such an algorithm was used, in particular, for the decomposition of signals mixed in an unknown way (i.e., blind signal separation). We face this task, for example, when we want to separate a human voice from noise.
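A sketch of the nonlinear update (tanh is one common sigmoid choice here; that choice is my assumption, not fixed by the slide):

```python
import numpy as np

def nonlinear_oja_step(W, x, lr=0.01):
    """Nonlinear (sigmoid) generalization of the layer rule, as used
    for blind signal separation:
    dW_ij = lr * f(y_i) * (x_j - sum_k f(y_k) * W_kj)."""
    f = np.tanh                    # assumed sigmoid
    fy = f(W @ x)
    return W + lr * np.outer(fy, x - W.T @ fy)
```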

Page 13:

Competition of neurons: the winner takes all

Basic algorithm. The output of the competitive layer is computed as before:

$$y_i = \sum_{j=1}^{d} w_{ij}\, x_j$$

The winner is the neuron with the maximum response:

$$i^*: \quad (\mathbf{w}_{i^*}, \mathbf{x}) \ge (\mathbf{w}_i, \mathbf{x}) \quad \forall i$$

The outputs of the layer are then

$$y_{i^*} = 1, \qquad y_i = 0 \ \ \text{for } i \ne i^*.$$

Training of the winner (only the winning neuron updates its weights):

$$\Delta \mathbf{w}_{i^*} = \eta\,(\mathbf{x} - \mathbf{w}_{i^*})$$
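The whole winner-take-all step in code (a sketch; names are my own):

```python
import numpy as np

def wta_step(W, x, lr=0.1):
    """Winner-take-all update: only the neuron with the maximum
    response moves its weight vector toward the input."""
    winner = np.argmax(W @ x)      # index of the winning neuron
    W = W.copy()
    W[winner] += lr * (x - W[winner])
    return W
```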

Page 14:

The winner does not take all

One variant of modifying the basic training rule of a competitive layer consists in training not only the winner neuron but also its "neighbors", though at a lower rate. Such an approach, "pulling up" the neurons nearest to the winner, is applied in topographic Kohonen maps.

The winner is the prototype closest to the input:

$$i^*: \quad |\mathbf{x} - \mathbf{w}_{i^*}| = \min_i\, |\mathbf{x} - \mathbf{w}_i|$$

All neurons are updated, weighted by the neighborhood function:

$$\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \eta\, \Lambda(i, i^*)\,\big(\mathbf{x}(t) - \mathbf{w}_i(t)\big)$$

The neighborhood function $\Lambda(i, i^*)$ equals one for the winner neuron with index $i^*$ and gradually falls off with the distance from the winner, for example as a Gaussian:

$$\Lambda(i, i^*) = \exp\!\left( -\frac{|i - i^*|^2}{2\sigma^2} \right)$$

Kohonen training resembles stretching an elastic grid of prototypes over the data set of the training sample.
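A sketch of the Kohonen update (here grid holds each neuron's coordinates on the map; all names are my own):

```python
import numpy as np

def som_step(W, grid, x, lr=0.1, sigma=1.0):
    """Kohonen update: every prototype moves toward x, weighted by a
    Gaussian neighborhood around the winner on the map grid."""
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    dist2 = np.sum((grid - grid[winner]) ** 2, axis=1)
    h = np.exp(-dist2 / (2 * sigma ** 2))      # neighborhood function
    return W + lr * h[:, None] * (x - W)
```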

Page 15:

Schematic representation of a self-organizing network

Methodology of self-organizing maps

Neurons in the target layer are ordered and correspond to the cells of a two-dimensional map, which can be colored according to the affinity of attributes.

Kohonen training resembles stretching an elastic grid of prototypes over the data set of the training sample.

Page 16:

A convenient tool for visualizing data is the coloring of topographic maps, similar to what is done with ordinary geographic maps. Each attribute $x_i$ of the data generates its own coloring of the map's cells: by the average value of that attribute over the data points that fall into the given cell.

Visualization of a topographic map induced by the i-th component of the input data

Collecting together the maps of all the attributes that interest us, we obtain a topographic atlas that gives an integrated representation of the structure of the multivariate data.

Page 17:

Classified SOM for the NASDAQ100 index for the period from 10-Nov-1997 to 27-Aug-2001

Methodology of self-organizing maps

Page 18:

Complexity of the algorithm

When is it better to reduce the dimension, and when to quantize the input information?

Reducing the dimension. The number of operations is

$$C_1 \sim P\, W,$$

where $P$ is the number of training patterns and $W = d\,m$ is the number of synaptic weights of a one-layer ANN with $d$ inputs and $m$ output neurons. The compression coefficient is $K = d/m$, so

$$C_1 \sim P\, d^2 / K.$$

Quantizing. The number of operations is again $C_2 \sim P\, W$, but here the compression coefficient is $K = d\,b / \log_2 m$ ($b$ is the capacity of the data): a vector of $d \cdot b$ bits is coded by the index of its prototype. Hence $m = 2^{db/K}$ prototypes are required and

$$C_2 \sim P\, d\, 2^{db/K}.$$

With the same compression coefficient:

$$\frac{C_2}{C_1} \sim \frac{K}{d}\, 2^{db/K}.$$
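Plugging illustrative numbers into the two estimates (a sketch; the values of P, d, b are arbitrary):

```python
def ops_dim_reduction(P, d, K):
    """C1 ~ P * d^2 / K (one-layer ANN with m = d/K output neurons)."""
    return P * d * d / K

def ops_quantization(P, d, b, K):
    """C2 ~ P * d * m, with m = 2**(d*b/K) prototypes."""
    return P * d * 2 ** (d * b / K)

P, d, b = 10_000, 64, 8
for K in (8, 16, 32, 64):
    print(K, ops_dim_reduction(P, d, K), ops_quantization(P, d, b, K))
```

The printout shows the trade-off: the cost of quantization explodes for small compression coefficients, since the required number of prototypes grows exponentially in db/K.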

Page 19:

JPEG example

The image is divided into blocks of 8×8 pixels, which serve as the input vectors we want to compress. In our case

$$d = 8 \times 8 = 64.$$

Let us suppose the image contains $2^8 = 256$ gradations of gray, so $b = 8$ bits define the accuracy of the represented data.

But if $d = 64 \times 64$, then $K > 10^3$: the number of prototypes $m = 2^{db/K}$ stays manageable only for such large compression coefficients.
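Reading the $d = 64 \times 64$ claim back through the formula $m = 2^{db/K}$, under an assumed codebook budget of $m \le 2^{32}$ (the budget is my assumption, not from the slide):

```python
# How much compression must quantization achieve for d = 64x64 blocks?
d, b = 64 * 64, 8
# number of prototypes m = 2**(d*b / K); demand a manageable codebook,
# e.g. m <= 2**32 (an assumed budget)
K_min = d * b / 32
print(K_min)   # 1024.0 -> K must exceed ~10^3
```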

Page 20:

Any questions?