2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Page 1: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

2806 Neural Computation
Self-Organizing Maps

Lecture 9

2005 Ari Visa

Page 2: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Agenda

Some historical notes
Some theory
Self-Organizing Map
Learning Vector Quantization
Conclusions

Page 3: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Historical Notes

Local ordering (von der Malsburg, 1973); (Amari, 1980): a mathematical analysis elucidates the dynamic stability of a cortical map.

Self-organizing feature map (SOM) (Kohonen, 1982); (Erwin, 1992): a convex neighbourhood function should be used (~ Gaussian).

The relationship between the SOM and principal curves is discussed (Ritter, 1992; Cherkassky and Mulier, 1995).

Page 4: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Historical Notes

Vector quantization: the Lloyd algorithm, 1957, for scalar quantization (= Max quantizer)

The generalized Lloyd algorithm for vector quantization (= k-means algorithm, MacQueen 1969 = LBG algorithm, Linde et al. 1980)

The idea of learning vector quantization (LVQ) (Kohonen, 1986)

Convergence properties of the LVQ algorithm using the ordinary differential equation (ODE) (Baras and LaVigna, 1990)

Page 5: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Theory

The spatial location of an output neuron in a topographic map corresponds to a particular domain or feature of data drawn from the input space. (Kohonen 1990)

Two approaches: Willshaw-von der Malsburg (explains neurobiological details) and Kohonen (more general than the first model in the sense that it is capable of performing data reduction).

Page 6: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Theory

The principal goal of the self-organizing map is to transform an incoming signal pattern of arbitrary dimension into a one- or two-dimensional discrete map, and to perform this transformation adaptively in a topologically ordered fashion.

Page 7: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

SOM

The formation of the self-organizing map

1. Competition
2. Cooperation
3. Synaptic Adaptation

Competitive Process: x = [x1, x2, ..., xm]T

wj = [wj1, wj2, ..., wjm]T

where j = 1, 2, ..., l (= total number of neurons in the network).

Select the neuron with the largest wjTx; for weight vectors of equal norm this is equivalent to selecting the winner by minimum Euclidean distance:

i(x) = arg minj ||x – wj||

A continuous input space of activation patterns is mapped onto a discrete output space of neurons by a process of competition among the neurons in the network.
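A minimal sketch of this competitive step, assuming NumPy arrays for the input vector and the weight matrix (the function name best_matching_unit is illustrative, not from the lecture):

```python
import numpy as np

def best_matching_unit(x, W):
    """Competition: return the index i(x) of the neuron whose weight
    vector w_j is closest to the input x (for normalized weights this is
    the neuron with the largest w_j^T x)."""
    # W has shape (l, m): one m-dimensional weight vector per neuron.
    distances = np.linalg.norm(W - x, axis=1)  # ||x - w_j|| for every j
    return int(np.argmin(distances))           # i(x) = arg min_j ||x - w_j||
```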

Page 8: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

SOM

Cooperation: How to define a topological neighborhood that is neurobiologically correct?

Let hj,i denote the topological neighborhood centered on winning neuron i, and encompassing a set of excited neurons denoted by j.

The topological neighborhood is symmetric about the maximum point.

The amplitude of the topological neighborhood decreases monotonically with increasing lateral distance dj,i:

hj,i(x)(n) = exp(-d²j,i / (2σ²(n)))

σ(n) = σ0 exp(-n/τ1), n = 0, 1, 2, ...
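A sketch of the Gaussian neighborhood and its shrinking width σ(n), assuming the lattice is given as an array of 2-D neuron coordinates; the default parameter values are illustrative:

```python
import numpy as np

def sigma(n, sigma0=5.0, tau1=1000.0):
    """Neighborhood width sigma(n) = sigma0 * exp(-n / tau1)."""
    return sigma0 * np.exp(-n / tau1)

def neighborhood(grid_pos, winner_pos, n):
    """Gaussian topological neighborhood h_{j,i(x)}(n) for every neuron j.

    grid_pos   : (l, 2) lattice coordinates of all neurons
    winner_pos : (2,) lattice coordinates of the winning neuron i(x)
    """
    d2 = np.sum((grid_pos - winner_pos) ** 2, axis=1)  # squared lateral distances d_{j,i}^2
    return np.exp(-d2 / (2.0 * sigma(n) ** 2))         # exp(-d^2 / (2 sigma^2(n)))
```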

Page 9: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

SOM

Adaptation: wj(n+1) = wj(n) + η(n) hj,i(x)(n)[x - wj(n)]; note the relation to Hebbian learning.

The synaptic weight vector wj of winning neuron i moves toward the input vector x. Upon repeated presentations of the training data, the synaptic weight vectors tend to follow the distribution of the input vectors due to the neighborhood updating → topological ordering.

The learning-rate parameter η(n) should be time varying: η(n) = η0 exp(-n/τ2), n = 0, 1, 2, ...
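One adaptation step, reusing the best_matching_unit and neighborhood sketches above (the η0 and τ2 defaults are placeholders, not prescribed values):

```python
import numpy as np

def eta(n, eta0=0.1, tau2=1000.0):
    """Learning rate eta(n) = eta0 * exp(-n / tau2)."""
    return eta0 * np.exp(-n / tau2)

def som_step(x, W, grid_pos, n):
    """Single SOM update: w_j(n+1) = w_j(n) + eta(n) h_{j,i(x)}(n) [x - w_j(n)]."""
    winner = best_matching_unit(x, W)                # 1. competition
    h = neighborhood(grid_pos, grid_pos[winner], n)  # 2. cooperation
    W += eta(n) * h[:, None] * (x - W)               # 3. synaptic adaptation
    return W
```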

Page 10: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

SOM

Ordering and Convergence

Self-organizing or ordering phase (at most ~1000 iterations): η(n) decreases from about 0.1 down to about 0.01, with τ2 = 1000; hj,i(n) shrinks from a "radius" covering most of the lattice down to the winning neuron and a couple of neighboring neurons around it, with τ1 = 1000 / log σ0.

Convergence phase (fine-tunes the feature map; about 500 × the number of neurons in the network iterations): η(n) ≈ 0.01, hj,i(n) covers the winning neuron and one or zero neighboring neurons. (Summary of the SOM.)
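A compact training loop following this two-phase schedule, built from the sketches above; the grid size, weight initialization, and random sampling are assumptions, while the 0.01 convergence-phase rate and the 500 × l iteration count follow the rough guidelines in the slide:

```python
import numpy as np

def train_som(X, grid_pos, n_ordering=1000, seed=0):
    """Train a SOM: an ordering phase followed by a convergence phase."""
    rng = np.random.default_rng(seed)
    l, m = len(grid_pos), X.shape[1]
    W = rng.standard_normal((l, m)) * 0.1     # small random initial weights
    n_convergence = 500 * l                   # rule of thumb: 500 x number of neurons
    for n in range(n_ordering + n_convergence):
        x = X[rng.integers(len(X))]           # draw a training sample at random
        if n < n_ordering:
            W = som_step(x, W, grid_pos, n)   # shrinking eta(n) and sigma(n)
        else:
            # Convergence phase: small fixed rate, neighborhood ~ winner only.
            winner = best_matching_unit(x, W)
            W[winner] += 0.01 * (x - W[winner])
    return W
```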

Page 11: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Theory

Property 1. Approximation of the Input Space. The feature map Φ, represented by the set of synaptic weight vectors {wj} in the output space A, provides a good approximation to the input space H.

The theoretical basis of the idea is rooted in vector quantization theory.

Page 12: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Theory

c(x) acts as an encoder of the input vector x and x'(c) acts as a decoder of c(x). The vector x is selected at random from the training sample, subject to an underlying probability density function fX(x). The optimum encoding-decoding scheme is determined by varying the functions c(x) and x'(c), so as to minimize the expected distortion, i.e. the average of d(x, x'(c(x))) over the density fX(x).

Page 13: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Theory

d(x, x') = ||x - x'||² = (x - x')T(x - x')

Generalized Lloyd algorithm:

Condition 1. Given the input vector x, choose the code c = c(x) to minimize the squared error distortion ||x - x'(c)||².

Condition 2. Given the code c, compute the reconstruction vector x'(c) as the centroid of the input vectors x that satisfy Condition 1.

The generalized Lloyd algorithm is closely related to the SOM.
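A sketch of the two conditions iterated as a batch, k-means-style procedure (function and variable names are illustrative):

```python
import numpy as np

def generalized_lloyd(X, codebook, n_iter=20):
    """Alternate Condition 1 and Condition 2 until the codebook settles."""
    codebook = codebook.copy()
    for _ in range(n_iter):
        # Condition 1: encode each x with the code c minimizing ||x - x'(c)||^2.
        d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        codes = np.argmin(d2, axis=1)
        # Condition 2: set each reconstruction vector x'(c) to the centroid
        # of the input vectors assigned to code c.
        for c in range(len(codebook)):
            members = X[codes == c]
            if len(members) > 0:
                codebook[c] = members.mean(axis=0)
    return codebook
```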

Page 14: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Theory

Condition 1. Given the input vector x, choose the code c = c(x) to minimize the distortion measure D2.

Condition 2. Given the code c, compute the reconstruction vector x'(c) to satisfy the conditions.

The stochastic update corresponding to these conditions is

x'new(c) ← x'old(c) + η π(c - c(x))[x - x'old(c)]

Page 15: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Theory

Property 2. Topological Ordering. The feature map computed by the SOM algorithm is topologically ordered in the sense that the spatial location of a neuron in the lattice corresponds to a particular domain or feature of input patterns.

Page 16: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Theory

Property 3. Density Matching. The feature map reflects variations in the statistics of the input distributions: regions in the input space H from which the sample vectors x are drawn with a high probability of occurrence are mapped onto larger domains of the output space A, and therefore with better resolution than regions in H from which sample vectors x are drawn with a low probability of occurrence.

Minimum-distortion encoding, according to which the curvature terms and all higher-order terms in the distortion measure due to the noise model π are retained. This yields a magnification factor m(x) ∝ fX(x)^(1/3).

Nearest-neighbor encoding, which emerges if the curvature terms are ignored, as in the standard form of the SOM algorithm. This yields m(x) ∝ fX(x)^(2/3).

Page 17: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Theory

Property 4. Feature Selection. Given data from an input space with a nonlinear distribution, the self-organizing map is able to select a set of best features for approximating the underlying distribution.

Page 18: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Learning Vector Quantizer

Vector quantization: an input space is divided into a number of distinct regions, and for each region a reconstruction vector is defined.

A vector quantizer with minimum encoding distortion is called a Voronoi or nearest-neighbor quantizer.

The collection of possible reproduction vectors is called the code book of the quantizer, and its members are called code vectors.

Page 19: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Learning Vector Quantizer

The SOM algorithm provides an approximate method for computing the Voronoi vectors in an unsupervised manner.

Learning vector quantization (LVQ) is a supervised learning technique that uses class information to move the Voronoi vectors slightly, so as to improve the quality of the classifier decision regions.

Page 20: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Learning Vector Quantizer

An input vector x is picked at random from the input space. If the class labels of the input vector x and a Voronoi vector w agree, the Voronoi vector w is moved in the direction of the input vector x. If the class labels of the input vector x and the Voronoi vector w disagree, the Voronoi vector w is moved away from the input vector x.

Let {wi}, i = 1, ..., l, denote the set of Voronoi vectors, and {xi}, i = 1, ..., N, denote the set of input vectors.

LVQ:

I. Suppose that the Voronoi vector wc is the closest to the input vector xi. Let Lwc denote the class associated with the Voronoi vector wc and Lxi denote the class label of the input vector xi. The Voronoi vector wc is adjusted as follows:

If Lwc = Lxi, then wc(n+1) = wc(n) + αn[xi - wc(n)], where 0 < αn < 1.

If Lwc ≠ Lxi, then wc(n+1) = wc(n) - αn[xi - wc(n)], where 0 < αn < 1.

II. The other Voronoi vectors are not modified.
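A sketch of this LVQ1 update (array names and the fixed learning rate are assumptions):

```python
import numpy as np

def lvq1_step(x, x_label, W, W_labels, alpha=0.05):
    """Move the closest Voronoi vector toward x if their class labels agree,
    away from x if they disagree; all other Voronoi vectors stay unchanged."""
    c = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # closest Voronoi vector w_c
    if W_labels[c] == x_label:
        W[c] += alpha * (x - W[c])                     # same class: attract
    else:
        W[c] -= alpha * (x - W[c])                     # different class: repel
    return W
```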

Page 21: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Learning Vector Quantizer

Vector quantization is a form of lossy data compression.

Rate distortion theory (Gray, 1984): better data compression performance can always be achieved by coding vectors instead of scalars, even if the source of data is memoryless, or if the data compression system has memory.

A multistage hierarchical vector quantizer (Luttrell 1989)

Page 22: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Learning Vector Quantizer

First-order autoregressive (AR) model: x(n+1) = αx(n) + ε(n), where α is the AR coefficient and the ε(n) are independent and identically distributed (i.i.d.) Gaussian random variables of zero mean and unit variance.
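A short sketch generating data from this AR(1) model (the coefficient value is illustrative):

```python
import numpy as np

def ar1_samples(n, alpha=0.5, seed=0):
    """x(t+1) = alpha * x(t) + eps(t), with eps(t) ~ i.i.d. N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(n - 1):
        x[t + 1] = alpha * x[t] + rng.standard_normal()
    return x
```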

Page 23: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Some Theory

Attribute code xa

Symbol code xs

x =[xs, 0]T + [0, xa]T

Contextual map or semantic map
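A sketch of how the symbol and attribute parts combine into one input vector for such a contextual map, as in x = [xs, 0]T + [0, xa]T; the dimensions and values are purely illustrative:

```python
import numpy as np

x_s = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # symbol code x_s (e.g. a one-hot word identity)
x_a = np.array([0.2, 0.7, 0.1])            # attribute code x_a (e.g. feature values)

# x = [x_s, 0]^T + [0, x_a]^T: the two parts occupy disjoint components of x.
x = np.concatenate([x_s, np.zeros_like(x_a)]) + np.concatenate([np.zeros_like(x_s), x_a])
```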

Page 24: 2806 Neural Computation Self-Organizing Maps Lecture 9 2005 Ari Visa.

Summary

The SOM algorithm is neurobiologically inspired, incorporating all the mechanisms that are basic to self-organization: competition, cooperation, and self-amplification.

Kohonen's SOM algorithm is simple to implement, yet mathematically difficult to analyze in a general setting.

The self-organizing map may be viewed as a vector quantizer.